1
00:00:00,000 --> 00:00:02,400
ANNOUNCER: Open content is
provided under a Creative

2
00:00:02,400 --> 00:00:03,830
Commons license.

3
00:00:03,830 --> 00:00:06,840
Your support will help MIT
OpenCourseWare continue to

4
00:00:06,840 --> 00:00:10,520
offer high-quality educational
resources for free.

5
00:00:10,520 --> 00:00:13,380
To make a donation, or view
additional materials from

6
00:00:13,380 --> 00:00:17,490
hundreds of MIT courses, visit
MIT OpenCourseWare at

7
00:00:17,490 --> 00:00:19,930
ocw.mit.edu.

8
00:00:19,930 --> 00:00:23,820
PROFESSOR ERIC GRIMSON: Let's
recap where we were.

9
00:00:23,820 --> 00:00:26,640
Last lecture, we talked about,
or started to talk about,

10
00:00:26,640 --> 00:00:27,770
efficiency.

11
00:00:27,770 --> 00:00:28,640
Orders of growth.

12
00:00:28,640 --> 00:00:30,050
Complexity.

13
00:00:30,050 --> 00:00:33,450
And I'll remind you, we saw a
set of algorithms, and part of

14
00:00:33,450 --> 00:00:35,740
my goal was to get you
to begin to recognize

15
00:00:35,740 --> 00:00:38,380
characteristics of algorithms
that map into

16
00:00:38,380 --> 00:00:40,200
a particular class.

17
00:00:40,200 --> 00:00:40,820
So what did we see?

18
00:00:40,820 --> 00:00:44,480
We saw linear algorithms.
Typical characterization, not

19
00:00:44,480 --> 00:00:47,100
all the time, but typical
characterization, is an

20
00:00:47,100 --> 00:00:51,270
algorithm that reduces the size
of a problem by one, or

21
00:00:51,270 --> 00:00:55,310
by some constant amount each
time, is typically an example

22
00:00:55,310 --> 00:00:57,330
of a linear algorithm.

23
00:00:57,330 --> 00:01:00,140
And we saw a couple of examples
of linear algorithms.

24
00:01:00,140 --> 00:01:03,620
We also saw a logarithmic
algorithm, and we like log

25
00:01:03,620 --> 00:01:06,510
algorithms, because they're
really fast. A typical

26
00:01:06,510 --> 00:01:09,490
characteristic of a log
algorithm is a pro-- or sorry,

27
00:01:09,490 --> 00:01:13,850
an algorithm where it reduces
the size of the problem by a

28
00:01:13,850 --> 00:01:14,820
constant factor.

29
00:01:14,820 --> 00:01:16,372
Obviously-- and that's a bad
way of saying it, I said

30
00:01:16,372 --> 00:01:18,120
constant the previous time--
in the linear case, it's

31
00:01:18,120 --> 00:01:19,600
subtract by a certain amount.

32
00:01:19,600 --> 00:01:21,840
In the log case, it's
divide by an amount.

33
00:01:21,840 --> 00:01:23,240
Cut the problem in half.

34
00:01:23,240 --> 00:01:25,300
Cut the problem in half again.

35
00:01:25,300 --> 00:01:26,110
And that's a typical

36
00:01:26,110 --> 00:01:28,310
characterization of a log algorithm.
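[Editor's sketch, not from the lecture handout: a minimal Python illustration of the two characterizations, with hypothetical helper names. It counts the steps needed to shrink a problem of size n down to 1, first by subtracting a constant, then by dividing by a constant factor.]
```python
def steps_subtracting(n, amount=1):
    """Count steps when each step removes a constant amount (linear)."""
    steps = 0
    while n > 1:
        n -= amount
        steps += 1
    return steps
#
def steps_dividing(n, factor=2):
    """Count steps when each step divides by a constant factor (logarithmic)."""
    steps = 0
    while n > 1:
        n //= factor
        steps += 1
    return steps
```
[For a problem of size 1024, subtracting one at a time takes 1023 steps, while halving takes 10.]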

37
00:01:28,310 --> 00:01:31,140
We saw some quadratic
algorithms, typically those

38
00:01:31,140 --> 00:01:33,910
are things with multiple nested
loops, or iterative or

39
00:01:33,910 --> 00:01:36,390
recursive calls, where you're
doing, say, a linear amount of

40
00:01:36,390 --> 00:01:39,820
time but you're doing it a
linear number of times, and so

41
00:01:39,820 --> 00:01:42,010
it becomes quadratic, and you'll
see other polynomial

42
00:01:42,010 --> 00:01:43,100
kinds of algorithms.

43
00:01:43,100 --> 00:01:46,340
And finally, we saw an example
of an exponential algorithm,

44
00:01:46,340 --> 00:01:48,510
those Towers of Hanoi.

45
00:01:48,510 --> 00:01:51,200
We don't like exponential
algorithms, or at least you

46
00:01:51,200 --> 00:01:53,410
shouldn't like them, because
they blow up quickly.

47
00:01:53,410 --> 00:01:55,490
And we saw some examples
of that.

48
00:01:55,490 --> 00:01:58,210
And unfortunately, some problems
are inherently

49
00:01:58,210 --> 00:02:00,470
exponential, you're sort of
stuck with that, and then you

50
00:02:00,470 --> 00:02:03,450
just have to try to be as
clever as you can.

51
00:02:03,450 --> 00:02:04,750
OK.

52
00:02:04,750 --> 00:02:07,180
At the end of the lecture last
time, I also showed you an

53
00:02:07,180 --> 00:02:09,940
example of binary search.

54
00:02:09,940 --> 00:02:12,440
And I want to redo that in a
little more detail today,

55
00:02:12,440 --> 00:02:15,170
because I felt like I did that
a little more quickly than I

56
00:02:15,170 --> 00:02:18,780
wanted to, so, if you really got
binary search, fall asleep

57
00:02:18,780 --> 00:02:21,060
for about ten minutes, just
don't snore, your neighbors

58
00:02:21,060 --> 00:02:22,680
may not appreciate it, but
we're going to go over it

59
00:02:22,680 --> 00:02:24,730
again, because it's a problem
and an idea that we're going

60
00:02:24,730 --> 00:02:26,690
to come back to, and I really
want to make sure that I do

61
00:02:26,690 --> 00:02:30,970
this in a way that makes
real good sense to you.

62
00:02:30,970 --> 00:02:31,570
Again.

63
00:02:31,570 --> 00:02:33,820
Basic premise of binary search,
or at least we set it

64
00:02:33,820 --> 00:02:37,720
up was, imagine I have a sorted
list of elements.

65
00:02:37,720 --> 00:02:39,150
We'll get, in a second, to how
we're going to get them

66
00:02:39,150 --> 00:02:42,330
sorted, and I want to know,
is a particular

67
00:02:42,330 --> 00:02:44,390
element in that list?

68
00:02:44,390 --> 00:02:47,510
And the basic idea of binary
search is to start with the

69
00:02:47,510 --> 00:02:50,790
full range of the list,
pick the midpoint,

70
00:02:50,790 --> 00:02:53,160
and test that point.

71
00:02:53,160 --> 00:02:54,970
If it's the thing I'm looking
for, I'm golden.

72
00:02:54,970 --> 00:02:58,350
If not, because the list is
sorted, I can use the

73
00:02:58,350 --> 00:03:01,150
difference between what I'm
looking for and that midpoint

74
00:03:01,150 --> 00:03:04,040
to decide, should I look in the
top half of the list, or

75
00:03:04,040 --> 00:03:05,350
the bottom half of the list?

76
00:03:05,350 --> 00:03:06,960
And I keep chopping it down.

77
00:03:06,960 --> 00:03:09,230
And I want to show you a little
bit more detail of

78
00:03:09,230 --> 00:03:11,490
that, so let's create a simple
little list here.

79
00:03:11,490 --> 00:03:22,880
All right?

80
00:03:22,880 --> 00:03:24,530
I don't care what's in there,
but just assume that's my

81
00:03:24,530 --> 00:03:28,510
list. And just to remind you, on
your handout, and there it

82
00:03:28,510 --> 00:03:30,930
is on the screen, I'm going to
bring it back up, there's the

83
00:03:30,930 --> 00:03:32,140
little binary search
algorithm.

84
00:03:32,140 --> 00:03:33,350
We're going to call
search, which just

85
00:03:33,350 --> 00:03:34,550
calls binary search.

86
00:03:34,550 --> 00:03:39,140
And you can look at it, and
let's in fact take a look at

87
00:03:39,140 --> 00:03:40,640
it to see what it does.

88
00:03:40,640 --> 00:03:42,670
We're going to call binary
search, it's going to take the

89
00:03:42,670 --> 00:03:44,640
list to search and the element,
but it's also going

90
00:03:44,640 --> 00:03:52,390
to say, here's the first part
of the list, and there's the

91
00:03:52,390 --> 00:03:54,760
last part of the list,
and what does it

92
00:03:54,760 --> 00:03:55,830
do inside that code?

93
00:03:55,830 --> 00:03:57,770
Well, it checks to see,
is it bigger than two?

94
00:03:57,770 --> 00:03:59,280
Are there more than two
elements there?

95
00:03:59,280 --> 00:04:01,920
If there are less than two
elements there, I just check

96
00:04:01,920 --> 00:04:03,420
one or both of those to
see if I'm looking

97
00:04:03,420 --> 00:04:04,530
for the right thing.

98
00:04:04,530 --> 00:04:06,790
Otherwise, what does that
code say to do?

99
00:04:06,790 --> 00:04:10,370
It says find the midpoint, which
says, take the start,

100
00:04:10,370 --> 00:04:15,270
which is pointing to that place
right there, take last

101
00:04:15,270 --> 00:04:17,850
minus first, divide it by
2, and add it to start.

102
00:04:17,850 --> 00:04:21,230
And that basically, somewhere
about here,

103
00:04:21,230 --> 00:04:23,740
gives me the midpoint.

104
00:04:23,740 --> 00:04:25,590
Now I look at that element.

105
00:04:25,590 --> 00:04:26,830
Is it the thing I'm
looking for?

106
00:04:26,830 --> 00:04:29,020
If I'm really lucky, it is.

107
00:04:29,020 --> 00:04:33,030
If not, I look at the value of
that point here and the thing

108
00:04:33,030 --> 00:04:34,210
I'm looking for.

109
00:04:34,210 --> 00:04:36,450
And for sake of argument, let's
assume that the thing

110
00:04:36,450 --> 00:04:39,380
I'm looking for is smaller
than the value here.

111
00:04:39,380 --> 00:04:41,180
Here's what I do.

112
00:04:41,180 --> 00:04:42,470
I change-- oops!

113
00:04:42,470 --> 00:04:43,320
Let me do that this way--

114
00:04:43,320 --> 00:04:51,570
I change last to here, and keep
first there, and I throw

115
00:04:51,570 --> 00:04:54,330
away all of that.

116
00:04:54,330 --> 00:04:56,480
All right?

117
00:04:56,480 --> 00:04:59,730
That's just the those-- let me
use my pointer-- that's just

118
00:04:59,730 --> 00:05:01,580
these two lines here.

119
00:05:01,580 --> 00:05:05,020
I checked the value, and in
one case, I'm changing the

120
00:05:05,020 --> 00:05:09,350
last to be mid minus 1, which is
the case I'm in here, and I

121
00:05:09,350 --> 00:05:10,050
just call again.

122
00:05:10,050 --> 00:05:12,060
All right?

123
00:05:12,060 --> 00:05:13,450
I'm going to call exactly
the same thing.

124
00:05:13,450 --> 00:05:16,860
Now, first is pointing here,
last is pointing there, again,

125
00:05:16,860 --> 00:05:18,880
I check to see, are there more
than two things left?

126
00:05:18,880 --> 00:05:20,280
There are, in this case.

127
00:05:20,280 --> 00:05:21,000
So what do I do?

128
00:05:21,000 --> 00:05:24,060
I find the midpoint by taking
last minus first, divide by 2,

129
00:05:24,060 --> 00:05:25,800
and add to start.

130
00:05:25,800 --> 00:05:29,790
Just for sake of argument, we'll
assume it's about there,

131
00:05:29,790 --> 00:05:31,330
and I do the same thing.

132
00:05:31,330 --> 00:05:33,950
Is this value what
I'm looking for?

133
00:05:33,950 --> 00:05:35,970
Again, for sake of argument,
let's assume it's not.

134
00:05:35,970 --> 00:05:38,130
Let's assume, for sake of
argument, the thing I'm

135
00:05:38,130 --> 00:05:40,440
looking for is bigger
than this.

136
00:05:40,440 --> 00:05:43,630
In that case, I'm going to throw
away all of this, I'm

137
00:05:43,630 --> 00:05:46,790
going to hit that bottom
line of that code.

138
00:05:46,790 --> 00:05:47,200
Ah.

139
00:05:47,200 --> 00:05:48,310
What does that do?

140
00:05:48,310 --> 00:05:49,440
It changes the call.

141
00:05:49,440 --> 00:05:54,530
So in this case, first
now points

142
00:05:54,530 --> 00:06:00,310
there, last points there.

143
00:06:00,310 --> 00:06:01,110
And I cut around.

144
00:06:01,110 --> 00:06:07,660
And again, notice
what I've done.

145
00:06:07,660 --> 00:06:09,980
I've thrown away most of the
array-- most of the list, I

146
00:06:09,980 --> 00:06:12,780
shouldn't say array-- most
of the list. All right?

147
00:06:12,780 --> 00:06:16,420
So it cuts it down quickly
as we go along.

148
00:06:16,420 --> 00:06:18,170
OK.

149
00:06:18,170 --> 00:06:20,730
That's the basic idea
of binary search.
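[Editor's sketch: the handout code is not reproduced in this transcript, so the following is a reconstruction from the spoken description, written for Python 3. The names bsearch and search, the parameter order, and the default for calls are assumptions. It assumes a non-empty sorted list.]
```python
def bsearch(s, e, first, last, calls=1):
    """Return True if e is in the sorted list s between indices first and last."""
    print(first, last, calls)  # the range being searched and the call count
    if last - first < 2:
        # At most two candidates left: check them directly.
        return s[first] == e or s[last] == e
    mid = first + (last - first) // 2  # last minus first, divided by 2, added to first
    if s[mid] == e:
        return True
    if s[mid] > e:
        # Target is smaller than the midpoint: keep first, move last to mid - 1.
        return bsearch(s, e, first, mid - 1, calls + 1)
    # Target is bigger than the midpoint: move first to mid + 1, keep last.
    return bsearch(s, e, mid + 1, last, calls + 1)
#
def search(s, e):
    """Search the whole of a non-empty sorted list s for e."""
    return bsearch(s, e, 0, len(s) - 1)
```
[Calling search(list(range(100)), 37) prints the shrinking range at each call and returns True.]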

150
00:06:20,730 --> 00:06:23,210
And let's just run a couple of
examples to remind you of what

151
00:06:23,210 --> 00:06:27,470
happens if we do this.

152
00:06:27,470 --> 00:06:29,960
So if I call, let's
[UNINTELLIGIBLE], let's set up

153
00:06:29,960 --> 00:06:37,950
s to be, I don't know, some
big long list. OK.

154
00:06:37,950 --> 00:06:41,890
And I'm going to look to see, is
a particular element inside

155
00:06:41,890 --> 00:06:47,400
of that list, and again, I'll
remind you, that's just giving

156
00:06:47,400 --> 00:06:50,750
me the integers from zero up
to 9999 something or other.

157
00:06:50,750 --> 00:06:56,740
If I look for, say, minus 1,
you might go, gee, wait a

158
00:06:56,740 --> 00:06:58,680
minute, if I was just doing
linear search, I would've

159
00:06:58,680 --> 00:07:01,030
known right away that minus
one wasn't in this list,

160
00:07:01,030 --> 00:07:02,250
because it's sorted
and it's smaller

161
00:07:02,250 --> 00:07:03,840
than the first element.

162
00:07:03,840 --> 00:07:06,070
So this looks like it's doing
a little bit of extra work,

163
00:07:06,070 --> 00:07:09,340
but you can see, if you look at
that, how it cuts it down

164
00:07:09,340 --> 00:07:09,990
at each stage.

165
00:07:09,990 --> 00:07:12,580
And I'll remind you, what I'm
printing out there is, first

166
00:07:12,580 --> 00:07:16,920
and last, with the range I'm
looking over, and then just

167
00:07:16,920 --> 00:07:20,230
how many times the
iteration called.

168
00:07:20,230 --> 00:07:22,590
So in this case, it just keeps
chopping down from the back

169
00:07:22,590 --> 00:07:24,830
end, which kind of makes
sense, all right?

170
00:07:24,830 --> 00:07:28,040
But in a fixed number, in fact,
twenty-three calls, it

171
00:07:28,040 --> 00:07:29,440
gets down to the point
of being able to say

172
00:07:29,440 --> 00:07:30,100
whether it's there.

173
00:07:30,100 --> 00:07:33,870
Let's go the other direction.

174
00:07:33,870 --> 00:07:39,490
And yes, I guess I'd better say
s not 2, or we're going to

175
00:07:39,490 --> 00:07:40,620
get an error here.

176
00:07:40,620 --> 00:07:48,320
Again, in twenty-three checks.

177
00:07:48,320 --> 00:07:50,650
In this case, it's cutting up
from the bottom end, which

178
00:07:50,650 --> 00:07:52,610
makes sense because the thing
I'm looking for is always

179
00:07:52,610 --> 00:07:55,950
bigger than the midpoint, and
then, I don't know, let's pick

180
00:07:55,950 --> 00:07:58,940
something in between.

181
00:07:58,940 --> 00:08:03,820
Somebody want-- ah, I keep doing
that-- somebody like to

182
00:08:03,820 --> 00:08:05,740
give me a number?

183
00:08:05,740 --> 00:08:07,620
I know you'd like to give
me other things, other

184
00:08:07,620 --> 00:08:10,810
expression, somebody
give me a number.

185
00:08:10,810 --> 00:08:11,650
Anybody?

186
00:08:11,650 --> 00:08:12,750
No?

187
00:08:12,750 --> 00:08:13,940
Sorry.

188
00:08:13,940 --> 00:08:14,880
Thank you.

189
00:08:14,880 --> 00:08:15,430
Good number.

190
00:08:15,430 --> 00:08:23,480
OK, walks in very quickly.

191
00:08:23,480 --> 00:08:24,620
OK?

192
00:08:24,620 --> 00:08:26,780
And if you just look at the
numbers, you can see how it

193
00:08:26,780 --> 00:08:29,290
cuts in from one side and then
the other side as it keeps

194
00:08:29,290 --> 00:08:31,800
narrowing that range, until it
gets down to the place where

195
00:08:31,800 --> 00:08:34,320
there are at most two things
left, and then it just has to

196
00:08:34,320 --> 00:08:37,030
check those two to say whether
it's there or not.

197
00:08:37,030 --> 00:08:39,350
Think about this compared
to a linear search.

198
00:08:39,350 --> 00:08:39,440
All right?

199
00:08:39,440 --> 00:08:41,300
A linear search, I start at the
beginning of the list and

200
00:08:41,300 --> 00:08:42,490
walk all the way through it.

201
00:08:42,490 --> 00:08:45,970
All right, if I'm lucky and it's
at the low end, I'll find

202
00:08:45,970 --> 00:08:46,770
it pretty quickly.

203
00:08:46,770 --> 00:08:48,980
If it's not, if it's at the
far end, I've got to go

204
00:08:48,980 --> 00:08:51,380
forever, and you saw that last
time where this thing paused

205
00:08:51,380 --> 00:08:52,470
for a little while
while it actually

206
00:08:52,470 --> 00:08:55,210
searched a list this big.

207
00:08:55,210 --> 00:08:55,470
OK.

208
00:08:55,470 --> 00:08:58,750
So, what do I want you to
take away from this?

209
00:08:58,750 --> 00:09:00,950
This idea of binary search
is going to be a

210
00:09:00,950 --> 00:09:02,720
really powerful tool.

211
00:09:02,720 --> 00:09:04,500
And it has this property,
again, of

212
00:09:04,500 --> 00:09:06,440
chopping things into pieces.

213
00:09:06,440 --> 00:09:09,040
So in fact, what does that
suggest about the order of

214
00:09:09,040 --> 00:09:09,400
growth here?

215
00:09:09,400 --> 00:09:12,860
What is the complexity of this?

216
00:09:12,860 --> 00:09:14,100
Yeah.

217
00:09:14,100 --> 00:09:14,750
Logarithmic.

218
00:09:14,750 --> 00:09:15,200
Why?

219
00:09:15,200 --> 00:09:17,810
STUDENT: [UNINTELLIGIBLE]

220
00:09:17,810 --> 00:09:18,450
PROFESSOR ERIC GRIMSON: Yeah.

221
00:09:18,450 --> 00:09:18,840
Thank you.

222
00:09:18,840 --> 00:09:20,830
I mean, I know I sort of said
it to you, but you're right.

223
00:09:20,830 --> 00:09:21,750
It's logarithmic, right?

224
00:09:21,750 --> 00:09:24,460
It's got that property of,
it cuts things in half.

225
00:09:24,460 --> 00:09:26,860
Here's another way to think
about why is this log.

226
00:09:26,860 --> 00:09:28,460
Actually, let me ask a slightly
different question.

227
00:09:28,460 --> 00:09:30,050
How do we know this
always stops?

228
00:09:30,050 --> 00:09:33,190
I mean, I ran three trials
here, and it did.

229
00:09:33,190 --> 00:09:36,560
But how would I reason about,
does this always stop?

230
00:09:36,560 --> 00:09:37,190
Well let's see.

231
00:09:37,190 --> 00:09:40,180
Where's the end test
on this thing?

232
00:09:40,180 --> 00:09:43,470
The end test-- and I've got
the wrong glasses on-- but

233
00:09:43,470 --> 00:09:46,450
it's up here, where I'm looking
to see, is last minus

234
00:09:46,450 --> 00:09:49,470
first less than or equal to 2?

235
00:09:49,470 --> 00:09:49,620
OK.

236
00:09:49,620 --> 00:09:52,330
So, soon as I get down to a list
that has no more than two

237
00:09:52,330 --> 00:09:55,060
elements in it, I'm done.

238
00:09:55,060 --> 00:09:55,600
Notice that.

239
00:09:55,600 --> 00:09:57,440
It's a less than or equal to.

240
00:09:57,440 --> 00:10:00,640
What if I just tested to see
if it was only, say, one?

241
00:10:00,640 --> 00:10:01,730
There was one element
in there.

242
00:10:01,730 --> 00:10:07,740
Would that have worked?

243
00:10:07,740 --> 00:10:09,480
I think it depends on
whether the list is

244
00:10:09,480 --> 00:10:11,870
odd or even in length.

245
00:10:11,870 --> 00:10:12,760
Actually, that's probably
not true.

246
00:10:12,760 --> 00:10:14,770
With one, it'll probably always
get it down there, but

247
00:10:14,770 --> 00:10:17,440
if I've made it just equal to
two, I might have lost.

248
00:10:17,440 --> 00:10:19,320
So first of all, I've got to be
careful about the end test.

249
00:10:19,320 --> 00:10:22,270
But the second thing is, OK, if
it stops whenever this is

250
00:10:22,270 --> 00:10:26,030
less than two, am I convinced
that this will always halt?

251
00:10:26,030 --> 00:10:26,820
And the answer is sure.

252
00:10:26,820 --> 00:10:27,670
Because what do I do?

253
00:10:27,670 --> 00:10:32,820
At each stage, no matter which
branch, here or here, I take,

254
00:10:32,820 --> 00:10:35,400
I'm cutting down the length
of the list that I'm

255
00:10:35,400 --> 00:10:36,850
searching in half.

256
00:10:36,850 --> 00:10:38,160
All right?

257
00:10:38,160 --> 00:10:41,020
So if I start off with a list
of length n, how many times

258
00:10:41,020 --> 00:10:43,610
can I divide it by 2, until
I get to something no

259
00:10:43,610 --> 00:10:45,450
more than two left?

260
00:10:45,450 --> 00:10:46,520
Log times, right?

261
00:10:46,520 --> 00:10:47,630
Exactly as the gentleman said.

262
00:10:47,630 --> 00:10:48,490
Oh, I'm sorry.

263
00:10:48,490 --> 00:10:50,280
You're patiently waiting
for me to reward.

264
00:10:50,280 --> 00:10:53,340
Or actually, maybe you're not.

265
00:10:53,340 --> 00:10:55,030
Thank you.

266
00:10:55,030 --> 00:10:56,080
OK.

267
00:10:56,080 --> 00:11:08,690
So this is, in fact, log.
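[Editor's sketch, under stated assumptions: a hypothetical helper that counts how many ranges a halving search examines. To avoid building a big list it searches the integers 0 to n - 1 implicitly, so the value at index mid is just mid. The end test mirrors the "fewer than two apart" check described above.]
```python
def count_calls(n, e):
    """Count the ranges examined when searching 0..n-1 for e by halving."""
    first, last, calls = 0, n - 1, 1
    while last - first >= 2:
        mid = first + (last - first) // 2
        if mid == e:      # found at the midpoint, no further call needed
            break
        if mid > e:
            last = mid - 1
        else:
            first = mid + 1
        calls += 1
    return calls
```
[For these inputs the count never exceeds n.bit_length(), that is, about log base 2 of n plus one, whichever end the search chops from.]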

268
00:11:08,690 --> 00:11:10,830
Now, having said that,
I actually snuck

269
00:11:10,830 --> 00:11:12,900
something by you.

270
00:11:12,900 --> 00:11:14,330
And I want to spend a
couple of minutes

271
00:11:14,330 --> 00:11:16,300
again reinforcing that.

272
00:11:16,300 --> 00:11:19,690
So if we look at that code,
and we were a little more

273
00:11:19,690 --> 00:11:21,480
careful about this, what
did we say to do?

274
00:11:21,480 --> 00:11:22,840
We said look an-- sorry.

275
00:11:22,840 --> 00:11:26,600
Count the number of primitive
operations in each step.

276
00:11:26,600 --> 00:11:26,860
OK.

277
00:11:26,860 --> 00:11:30,470
So if I look at this code,
first of all I'm calling

278
00:11:30,470 --> 00:11:35,420
search, it just has one call,
so looks like search is

279
00:11:35,420 --> 00:11:37,040
constant, except I
don't know what

280
00:11:37,040 --> 00:11:38,180
happens inside of b search.

281
00:11:38,180 --> 00:11:39,210
So I've got to look
at b search.

282
00:11:39,210 --> 00:11:39,900
So let's see.

283
00:11:39,900 --> 00:11:42,900
The first line, that
print thing, is

284
00:11:42,900 --> 00:11:44,910
obviously constant, right?

285
00:11:44,910 --> 00:11:47,510
Just take it as a constant
amount of operations.

286
00:11:47,510 --> 00:11:50,960
But let's look at the next one here,
or is that second line?

287
00:11:50,960 --> 00:11:51,310
OK.

288
00:11:51,310 --> 00:11:54,900
If last minus first is greater
than or equal to 2-- sorry,

289
00:11:54,900 --> 00:11:57,820
less than 2, then either
look at this thing or

290
00:11:57,820 --> 00:11:58,590
look at that thing.

291
00:11:58,590 --> 00:12:02,310
And that's where I said we've
got to be careful.

292
00:12:02,310 --> 00:12:05,900
That's accessing an element of
a list. We have to make sure

293
00:12:05,900 --> 00:12:08,490
that, in fact, that operation
is not linear.

294
00:12:08,490 --> 00:12:12,460
So let me expand on that very
slightly, and again, we did

295
00:12:12,460 --> 00:12:14,740
this last time but I want
to do one more time.

296
00:12:14,740 --> 00:12:22,470
I have to be careful about how
I'm actually implementing a

297
00:12:22,470 --> 00:12:22,840
list.

298
00:12:22,840 --> 00:12:38,410
So, for example: in
this case, my list

299
00:12:38,410 --> 00:12:40,530
is a bunch of integers.

300
00:12:40,530 --> 00:12:42,740
And one of the things I could
take advantage of, is I'm only

301
00:12:42,740 --> 00:12:44,540
going to need a finite
amount of space to

302
00:12:44,540 --> 00:12:46,180
represent an integer.

303
00:12:46,180 --> 00:12:49,730
So, for example, if I want to
allow for some fairly large

304
00:12:49,730 --> 00:12:52,880
range of integers, I might say,
I need four memory cells

305
00:12:52,880 --> 00:12:54,350
in a row to represent
an integer.

306
00:12:54,350 --> 00:12:56,660
All right, if it's a zero, it's
going to be a whole bunch

307
00:12:56,660 --> 00:12:59,200
of ones-- of zeroes, so one,
it may be a whole bunch of

308
00:12:59,200 --> 00:13:01,510
zeroes in the first three and
then a one at the end of this

309
00:13:01,510 --> 00:13:04,020
thing, but one of the ways to
think about this list in

310
00:13:04,020 --> 00:13:08,470
memory, is that I can decide in
constant time how to find

311
00:13:08,470 --> 00:13:09,640
the i'th element of a list.

312
00:13:09,640 --> 00:13:12,460
So in particular, here's where
the zero-th element of the

313
00:13:12,460 --> 00:13:15,320
list starts, there's where the
first element starts, here's

314
00:13:15,320 --> 00:13:17,560
where the second element starts,
these are just memory

315
00:13:17,560 --> 00:13:25,140
cells in a row, and to find the
zero-th element, if start

316
00:13:25,140 --> 00:13:29,300
is pointing to that memory
cell, it's just at start.

317
00:13:29,300 --> 00:13:33,730
To find the first element,
because I know I need four

318
00:13:33,730 --> 00:13:41,590
memory cells to represent an
integer, it's at start plus 4.

319
00:13:41,590 --> 00:13:46,080
To get to the second element,
I know that that's-- you get

320
00:13:46,080 --> 00:13:50,940
the idea-- at the start plus 2
times 4, and to get to the

321
00:13:50,940 --> 00:14:01,520
k'th element, I know that I
want to take whatever the

322
00:14:01,520 --> 00:14:04,790
start is which points to that
place in memory, take k,

323
00:14:04,790 --> 00:14:08,590
multiply by 4, and that tells me
exactly where to go to find

324
00:14:08,590 --> 00:14:09,200
that location.

325
00:14:09,200 --> 00:14:13,860
This may sound like a nuance,
but it's important.

326
00:14:13,860 --> 00:14:14,910
Why?

327
00:14:14,910 --> 00:14:16,900
Because that's a constant
access, right?

328
00:14:16,900 --> 00:14:19,890
To get any location in memory,
to get to any value of the

329
00:14:19,890 --> 00:14:23,120
list, I simply have to say which
element do I want to

330
00:14:23,120 --> 00:14:25,780
get, I know that these things
are stored in a particular

331
00:14:25,780 --> 00:14:29,210
size, multiply that index by 4,
add it to start, and then

332
00:14:29,210 --> 00:14:31,880
it's in a constant amount of
time I can go to that location

333
00:14:31,880 --> 00:14:33,870
and get out the cell.
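[Editor's sketch: a small Python model of the fixed-size layout just described. The names CELL_SIZE, address_of, pack_ints, and get_kth are hypothetical, and a bytes object stands in for the row of memory cells, with each integer packed into four bytes.]
```python
import struct
#
CELL_SIZE = 4  # memory cells per integer, as in the example above
#
def address_of(start, k, size=CELL_SIZE):
    """Address of the k'th element when every element occupies `size` cells."""
    return start + k * size
#
def pack_ints(values):
    """Lay the integers out back to back, four bytes each."""
    return struct.pack(f"<{len(values)}i", *values)
#
def get_kth(memory, start, k):
    """Read the k'th integer with one multiply, one add, and one fetch."""
    (value,) = struct.unpack_from("<i", memory, address_of(start, k))
    return value
```
[The work in get_kth does not depend on k or on the length of the list, which is what makes the access constant time.]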

334
00:14:33,870 --> 00:14:36,590
OK.

335
00:14:36,590 --> 00:14:42,670
That works nicely if I know that
I have things stored in

336
00:14:42,670 --> 00:14:44,510
constant size.

337
00:14:44,510 --> 00:14:47,320
But what if I have
a list of lists?

338
00:14:47,320 --> 00:14:49,640
What if I have a heterogeneous
list, a list of integers and

339
00:14:49,640 --> 00:14:52,438
strings and floats and lists and
lists of lists and lists

340
00:14:52,438 --> 00:14:53,860
of lists of lists and all
that sort of cool stuff?

341
00:14:53,860 --> 00:15:04,190
In that case, I've got to
be a lot more careful.

342
00:15:04,190 --> 00:15:07,450
So in this case, one of the
standard ways to do this, is

343
00:15:07,450 --> 00:15:13,130
to use what's called a linked
list. And I'm going to do it

344
00:15:13,130 --> 00:15:14,120
in the following way.

345
00:15:14,120 --> 00:15:21,570
Start again, we'll point to the
beginning of the list. But

346
00:15:21,570 --> 00:15:23,790
now, because my elements are
going to take different

347
00:15:23,790 --> 00:15:26,430
amounts of memory, I'm going
to do the following thing.

348
00:15:26,430 --> 00:15:31,440
In the first spot, I'm going to
store something that says,

349
00:15:31,440 --> 00:15:33,610
here's how far you
have to jump to

350
00:15:33,610 --> 00:15:35,320
get to the next element.

351
00:15:35,320 --> 00:15:38,910
And then, I'm going to use the
next sequence of things to

352
00:15:38,910 --> 00:15:40,750
represent the first
element, or

353
00:15:40,750 --> 00:15:42,230
the zero-th element,
if you like.

354
00:15:42,230 --> 00:15:44,120
In this case I might
need five.

355
00:15:44,120 --> 00:15:47,550
And then in the next spot, I'm
going to say how far you have

356
00:15:47,550 --> 00:15:49,070
to jump to get to the
next element.

357
00:15:49,070 --> 00:15:53,220
All right, followed by whatever
I need to represent

358
00:15:53,220 --> 00:15:54,880
it, which might only
be a blank one.

359
00:15:54,880 --> 00:15:59,100
And in the next spot, maybe I've
got a really long list,

360
00:15:59,100 --> 00:16:00,990
and I'm going to say
how to jump to

361
00:16:00,990 --> 00:16:02,060
get to the next element.

362
00:16:02,060 --> 00:16:05,020
All right, this is actually
kind of nice.

363
00:16:05,020 --> 00:16:07,800
This lets me have a way of
representing things that could

364
00:16:07,800 --> 00:16:08,900
be arbitrary in size.

365
00:16:08,900 --> 00:16:10,240
And some of these things
could be huge, if

366
00:16:10,240 --> 00:16:12,370
they're themselves lists.

367
00:16:12,370 --> 00:16:13,970
Here's the problem.

368
00:16:13,970 --> 00:16:16,690
How do I get to the nth-- er,
the k'th element in the list,

369
00:16:16,690 --> 00:16:17,860
in this case?

370
00:16:17,860 --> 00:16:21,380
Well I have to go to the zero-th
element, and say OK,

371
00:16:21,380 --> 00:16:23,520
gee, to get to the next
element, I've got

372
00:16:23,520 --> 00:16:24,700
to jump this here.

373
00:16:24,700 --> 00:16:27,490
And to get to the next element,
I've got to jump to

374
00:16:27,490 --> 00:16:29,760
here, and to get to the next
element, I've got to jump to

375
00:16:29,760 --> 00:16:32,990
here, until I get there.

376
00:16:32,990 --> 00:16:34,310
And so, I get some power.

377
00:16:34,310 --> 00:16:37,030
I get the ability to store
arbitrary things, but what

378
00:16:37,030 --> 00:16:39,830
just happened to
my complexity?

379
00:16:39,830 --> 00:16:41,270
How long does it take
me to find the

380
00:16:41,270 --> 00:16:43,310
k'th element?

381
00:16:43,310 --> 00:16:44,270
Linear.

382
00:16:44,270 --> 00:16:45,990
Because I've got to walk
my way down it.

383
00:16:45,990 --> 00:16:46,940
OK?

384
00:16:46,940 --> 00:16:56,730
So in this case, you
have linear access.
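[Editor's sketch: a toy Python model of the jump-based layout drawn on the board. The function names and the use of a flat Python list as "memory" are assumptions for illustration. Each element is stored as a jump distance followed by however many cells that element needs.]
```python
def build_memory(elements):
    """Lay out variable-sized elements as [jump, cells..., jump, cells..., ...]."""
    memory = []
    for cells in elements:
        memory.append(len(cells) + 1)  # distance from this jump cell to the next one
        memory.extend(cells)
    return memory
#
def kth_element(memory, start, k):
    """Walk k jumps from start, then return the element's cells and the hop count."""
    position, hops = start, 0
    for _ in range(k):
        position += memory[position]  # follow the stored jump distance
        hops += 1
    size = memory[position] - 1
    return memory[position + 1 : position + 1 + size], hops
```
[Reaching element k takes k hops, so the access cost grows linearly with the index.]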

385
00:16:56,730 --> 00:16:57,820
Oh fudge knuckle.

386
00:16:57,820 --> 00:16:58,780
Right?

387
00:16:58,780 --> 00:17:01,440
If that was the case in that
code, then my complexity is no

388
00:17:01,440 --> 00:17:04,690
longer log, because I need
linear access for each time

389
00:17:04,690 --> 00:17:06,260
I've got to go to the list,
and it's going to be much

390
00:17:06,260 --> 00:17:06,940
worse than that.

391
00:17:06,940 --> 00:17:08,430
All right.

392
00:17:08,430 --> 00:17:08,650
Now.

393
00:17:08,650 --> 00:17:13,730
Some programming languages,
primarily Lisp, actually store

394
00:17:13,730 --> 00:17:15,700
lists this way.

395
00:17:15,700 --> 00:17:17,470
You might say, why?

396
00:17:17,470 --> 00:17:19,850
Well it turns out there's
some trade-offs to it.

397
00:17:19,850 --> 00:17:22,340
It has some advantages in terms
of power of storing

398
00:17:22,340 --> 00:17:24,420
things, it has some
disadvantages, primarily in

399
00:17:24,420 --> 00:17:26,270
terms of access time.

400
00:17:26,270 --> 00:17:28,890
Fortunately for you, Python
decided, or the inventors of

401
00:17:28,890 --> 00:17:30,840
Python decided, to store
this a different way.

402
00:17:30,840 --> 00:17:34,720
And the different way is to say,
look, if I redraw this,

403
00:17:34,720 --> 00:17:48,150
it's called a box and pointer
diagram, what we really have

404
00:17:48,150 --> 00:17:49,800
for each element
is two things.

405
00:17:49,800 --> 00:17:52,070
And I've actually just reversed
the order here.

406
00:17:52,070 --> 00:17:55,070
We have a pointer to the
location in memory that

407
00:17:55,070 --> 00:17:58,510
contains the actual value, which
itself might be a bunch

408
00:17:58,510 --> 00:18:02,885
of pointers, and we have a
pointer to the actual-- sorry,

409
00:18:02,885 --> 00:18:05,390
a pointer the value and we have
a pointer to the next

410
00:18:05,390 --> 00:18:08,000
element in the list.
All right?

411
00:18:08,000 --> 00:18:10,090
And one of the things we could
do if we look at that is, we

412
00:18:10,090 --> 00:18:12,440
say, gee, we could reorganize
this in a pretty

413
00:18:12,440 --> 00:18:13,360
straightforward way.

414
00:18:13,360 --> 00:18:21,400
In particular, why don't we
just take all of the first

415
00:18:21,400 --> 00:18:35,520
cells and stick them together?

416
00:18:35,520 --> 00:18:40,070
Where now, my list is a list of
pointers, it's not a set of

417
00:18:40,070 --> 00:18:42,080
values but it's actually a
pointer off to some other

418
00:18:42,080 --> 00:18:44,480
piece of memory that
contains the value.

419
00:18:44,480 --> 00:18:46,380
Why is this nice?

420
00:18:46,380 --> 00:18:50,750
Well this is exactly like this.

421
00:18:50,750 --> 00:18:53,480
All right?

422
00:18:53,480 --> 00:18:57,160
It's now something that I can
access in constant time.

423
00:18:57,160 --> 00:18:58,670
And that's what's going to
allow me to keep this

424
00:18:58,670 --> 00:19:01,400
thing as being log.
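[Editor's sketch: a toy Python model of the "list of pointers" idea. The dictionary heap, and the names store and deref_kth, are hypothetical stand-ins for memory and are not how Python exposes its internals. The point is that the pointers are all the same size and sit side by side, while the values they refer to can be any size.]
```python
heap = {}  # hypothetical stand-in for memory: address -> stored value
#
def store(value):
    """Put a value of any size somewhere in the heap and return its address."""
    address = len(heap)
    heap[address] = value
    return address
#
# The list itself holds only fixed-size pointers, one per element.
pointers = [store(v) for v in (7, "seven", 7.0, [7, [7]])]
#
def deref_kth(pointers, k):
    """Index straight to the k'th pointer, then follow it to the value."""
    return heap[pointers[k]]
```
[Finding the k'th pointer is an index into a row of same-size cells, so it stays constant time even when the elements are strings or nested lists.]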

425
00:19:01,400 --> 00:19:03,440
OK.

426
00:19:03,440 --> 00:19:07,530
With that in mind, let's go
back to where we were.

427
00:19:07,530 --> 00:19:10,580
And where were we?

428
00:19:10,580 --> 00:19:15,040
We started off talking about
binary search, and I suggested

429
00:19:15,040 --> 00:19:18,870
that this was a log algorithm,
which it is, which is really

430
00:19:18,870 --> 00:19:21,060
kind of nice.

431
00:19:21,060 --> 00:19:32,970
Let's pull together what this
algorithm actually does.

432
00:19:32,970 --> 00:19:35,780
If I generalize binary search,
here's what I'm going to state

433
00:19:35,780 --> 00:19:37,070
that this thing does.

434
00:19:37,070 --> 00:19:45,580
It says one: pick
the midpoint.

435
00:19:45,580 --> 00:19:56,360
Two: check to see if this is
the answer, if this is the

436
00:19:56,360 --> 00:19:58,000
thing I'm looking for.

437
00:19:58,000 --> 00:20:05,910
And then, three: if not, reduce
to a smaller problem,

438
00:20:05,910 --> 00:20:16,220
and repeat.
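The three steps can be sketched in Python roughly as follows. This is an illustrative version, not the code from the handout; the name binary_search and its arguments are assumptions.

```python
def binary_search(sorted_list, target):
    """Return an index of target in sorted_list, or None if it is absent."""
    low, high = 0, len(sorted_list) - 1
    while low <= high:
        mid = (low + high) // 2           # 1: pick the midpoint
        if sorted_list[mid] == target:    # 2: check whether this is the answer
            return mid
        if sorted_list[mid] < target:     # 3: if not, reduce to a smaller problem
            low = mid + 1                 #    (keep only the upper half)
        else:
            high = mid - 1                #    (keep only the lower half)
    return None                           # the range is empty: target is not there


print(binary_search([1, 3, 4, 6, 8], 6))
```

Each trip through the loop discards half of the remaining range, which is where the log behavior comes from.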

439
00:20:16,220 --> 00:20:18,260
OK, you're going, yeah, come on,
that makes obvious sense.

440
00:20:18,260 --> 00:20:18,980
And it does.

441
00:20:18,980 --> 00:20:21,150
But I want you to keep that
template in mind, because

442
00:20:21,150 --> 00:20:22,930
we're going to come
back to that.

443
00:20:22,930 --> 00:20:25,310
It's an example of a very common
tool that's going to be

444
00:20:25,310 --> 00:20:28,180
really useful to us, not just
for doing search, but for

445
00:20:28,180 --> 00:20:30,900
doing a whole range of problems.
That is, in essence,

446
00:20:30,900 --> 00:20:34,030
the template that describes
a log style algorithm.

447
00:20:34,030 --> 00:20:37,030
And we're going to
come back to it.

448
00:20:37,030 --> 00:20:38,620
OK.

449
00:20:38,620 --> 00:20:41,560
With that in mind though,
didn't I cheat?

450
00:20:41,560 --> 00:20:45,340
I remind you, I know you're not
really listening to me,

451
00:20:45,340 --> 00:20:45,940
but that's OK.

452
00:20:45,940 --> 00:20:47,635
I reminded you at the beginning
of the lecture, I

453
00:20:47,635 --> 00:20:50,090
said, let's assume we have
a sorted list, and then

454
00:20:50,090 --> 00:20:52,210
let's go search it.

455
00:20:52,210 --> 00:20:52,830
Where in the world

456
00:20:52,830 --> 00:20:54,790
did that sorted list
come from?

457
00:20:54,790 --> 00:20:58,700
What if I just get a list of
elements, what do I do?

458
00:20:58,700 --> 00:20:59,250
Well let's see.

459
00:20:59,250 --> 00:21:02,210
My fallback is, I could just
do linear search, walk down

460
00:21:02,210 --> 00:21:04,400
the list one at a time, just
comparing those things.

461
00:21:04,400 --> 00:21:04,580
OK.

462
00:21:04,580 --> 00:21:06,250
So that's sort of my base.

463
00:21:06,250 --> 00:21:08,390
But what if I wanted, you know,
how do I want to get to

464
00:21:08,390 --> 00:21:09,300
that sorted list?

465
00:21:09,300 --> 00:21:12,250
All right?

466
00:21:12,250 --> 00:21:16,090
Now.

467
00:21:16,090 --> 00:21:18,160
One of the questions, before we
get to doing the sorting,

468
00:21:18,160 --> 00:21:20,810
is even to ask, what should I do
in a search case like that?

469
00:21:20,810 --> 00:21:26,380
All right, so in particular,
does it make sense, if I'm

470
00:21:26,380 --> 00:21:29,990
given an unsorted list,
to first sort it,

471
00:21:29,990 --> 00:21:31,450
and then search it?

472
00:21:31,450 --> 00:21:34,330
Or should I just use the
basically linear case?

473
00:21:34,330 --> 00:21:34,580
All right?

474
00:21:34,580 --> 00:21:39,900
So, here's the question.

475
00:21:39,900 --> 00:21:47,560
Should we sort before
we search?

476
00:21:47,560 --> 00:21:47,820
OK.

477
00:21:47,820 --> 00:21:51,050
So let's see, if I'm going to
do this, how fast could we

478
00:21:51,050 --> 00:21:53,560
sort a list?

479
00:21:53,560 --> 00:22:05,880
Can we sort a list in
sublinear time?

480
00:22:05,880 --> 00:22:07,150
Sublinear meaning something
like log,

481
00:22:07,150 --> 00:22:08,180
less than linear time?

482
00:22:08,180 --> 00:22:11,460
What do you think?

483
00:22:11,460 --> 00:22:17,630
It's possible?

484
00:22:17,630 --> 00:22:19,800
Any thoughts?

485
00:22:19,800 --> 00:22:22,150
Don't you hate professors who
stand here waiting for you to

486
00:22:22,150 --> 00:22:25,660
answer, even when
they have candy?

487
00:22:25,660 --> 00:22:28,370
Does it make sense to think we
could do this in less than

488
00:22:28,370 --> 00:22:28,940
linear time?

489
00:22:28,940 --> 00:22:31,060
You know, it takes a little
bit of thinking.

490
00:22:31,060 --> 00:22:31,760
What would it mean--

491
00:22:31,760 --> 00:22:34,240
[UNINTELLIGIBLE PHRASE] do I see
a hand, way at the back,

492
00:22:34,240 --> 00:22:37,500
yes please?

493
00:22:37,500 --> 00:22:39,770
Thank you.

494
00:22:39,770 --> 00:22:41,570
Man, you're going to really make
me work here, I have no

495
00:22:41,570 --> 00:22:43,720
idea if I can get it that
far, ah, your friend

496
00:22:43,720 --> 00:22:44,250
will help you out.

497
00:22:44,250 --> 00:22:45,090
Thank you.

498
00:22:45,090 --> 00:22:47,130
The gentleman has it
exactly right.

499
00:22:47,130 --> 00:22:49,910
How could I possibly do it in
sublinear time, I've got to

500
00:22:49,910 --> 00:22:52,700
look at every element
at least once.

501
00:22:52,700 --> 00:22:54,550
And that's the kind of instinct
I'd like you to get

502
00:22:54,550 --> 00:22:55,130
into thinking about.

503
00:22:55,130 --> 00:22:58,620
So the answer here is no.

504
00:22:58,620 --> 00:23:00,180
OK.

505
00:23:00,180 --> 00:23:07,380
Can we sort it in linear time?

506
00:23:07,380 --> 00:23:07,830
Hmmm.

507
00:23:07,830 --> 00:23:11,910
That one's not so obvious.

508
00:23:11,910 --> 00:23:13,440
So let's think about
this for a second.

509
00:23:13,440 --> 00:23:18,190
To sort a list in linear time
would say I have to look at

510
00:23:18,190 --> 00:23:20,530
each element in the
list at most a

511
00:23:20,530 --> 00:23:21,770
constant number of times.

512
00:23:21,770 --> 00:23:23,030
It doesn't have to be
just once, right?

513
00:23:23,030 --> 00:23:25,550
It could be two or three times.

514
00:23:25,550 --> 00:23:25,960
Hmm.

515
00:23:25,960 --> 00:23:26,590
Well, wait a minute.

516
00:23:26,590 --> 00:23:28,860
If I want to sort a list, I'll
take one element, I've got to

517
00:23:28,860 --> 00:23:33,590
look at probably a lot of the
other elements in the list in

518
00:23:33,590 --> 00:23:35,290
order to decide where it goes.

519
00:23:35,290 --> 00:23:37,370
And that suggests it's going
to depend on how

520
00:23:37,370 --> 00:23:38,200
long the list is.

521
00:23:38,200 --> 00:23:41,640
All right, so that's a weak
argument, but in fact, it's a

522
00:23:41,640 --> 00:23:49,010
way of suggesting,
probably not.

523
00:23:49,010 --> 00:23:50,490
All right.

524
00:23:50,490 --> 00:23:52,890
So how fast could
I sort a list?

525
00:23:52,890 --> 00:23:54,170
How fast can we sort it?

526
00:23:54,170 --> 00:24:03,020
And we're going to come back to
this, probably next time if

527
00:24:03,020 --> 00:24:12,670
I time this right, but the
answer is, we can do it in n

528
00:24:12,670 --> 00:24:14,830
log n time.

529
00:24:14,830 --> 00:24:15,810
We're going to come
back to that.

530
00:24:15,810 --> 00:24:16,110
All right?

531
00:24:16,110 --> 00:24:18,956
And I'm going to say-- sort of
set that stage here, so that--

532
00:24:18,956 --> 00:24:21,740
It turns out that that's
probably about the best we can

533
00:24:21,740 --> 00:24:25,560
do, where again n is the
length of the list.

534
00:24:25,560 --> 00:24:27,510
OK, so that still comes
back to my question.

535
00:24:27,510 --> 00:24:30,220
If I want to search a list,
should I sort it first and

536
00:24:30,220 --> 00:24:31,710
then search it?

537
00:24:31,710 --> 00:24:33,310
Hmmm.

538
00:24:33,310 --> 00:24:39,420
OK, so let's do the
comparison.

539
00:24:39,420 --> 00:24:42,550
I'm just going to take an
unsorted list and search it, I

540
00:24:42,550 --> 00:24:43,980
could do it in linear
time, right?

541
00:24:43,980 --> 00:24:44,600
One at a time.

542
00:24:44,600 --> 00:24:46,400
Walk down the elements
until I find it.

543
00:24:46,400 --> 00:24:48,480
That would be order n.

544
00:24:48,480 --> 00:24:53,020
On the other hand, if I want
to sort it first, OK, if I

545
00:24:53,020 --> 00:24:59,520
want to do sort and search, I
want to sort it, it's going to

546
00:24:59,520 --> 00:25:05,510
take n log n time to sort it,
and having done that, then I

547
00:25:05,510 --> 00:25:09,170
can search it in log n time.

548
00:25:09,170 --> 00:25:10,610
Ah.

549
00:25:10,610 --> 00:25:15,020
So which one's better?

550
00:25:15,020 --> 00:25:20,200
Yeah.

551
00:25:20,200 --> 00:25:20,760
Ah-ha.

552
00:25:20,760 --> 00:25:21,690
Thank you.

553
00:25:21,690 --> 00:25:23,205
Hold on to that thought for a
second, I'm going to

554
00:25:23,205 --> 00:25:23,890
come back to it.

555
00:25:23,890 --> 00:25:25,860
Let's just assume I'm running
a search once,

556
00:25:25,860 --> 00:25:29,000
which one's better?

557
00:25:29,000 --> 00:25:30,760
The unsorted, and you have
exactly the point I want to

558
00:25:30,760 --> 00:25:33,140
get to-- how come all the guys,
sorry, all the people

559
00:25:33,140 --> 00:25:36,780
answering questions are way,
way up in the back?

560
00:25:36,780 --> 00:25:40,430
Wow, that's a Tim Wakefield
pitch right there, all right.

561
00:25:40,430 --> 00:25:42,160
Thank you.

562
00:25:42,160 --> 00:25:43,910
He has it exactly right.

563
00:25:43,910 --> 00:25:45,170
OK?

564
00:25:45,170 --> 00:25:48,330
Is this smaller than that?

565
00:25:48,330 --> 00:25:49,190
No.

566
00:25:49,190 --> 00:25:50,340
Now that's a slight lie.

567
00:25:50,340 --> 00:25:52,730
Sorry, a slight misstatement,
OK?

568
00:25:52,730 --> 00:25:54,570
I could run for office, couldn't
I, if I can do that

569
00:25:54,570 --> 00:25:55,460
kind of talk.

570
00:25:55,460 --> 00:25:57,620
It's a slight misstatement in
the sense that these should

571
00:25:57,620 --> 00:25:58,630
really be orders of growth.

572
00:25:58,630 --> 00:26:00,640
There are some constants in
there, it depends on the size,

573
00:26:00,640 --> 00:26:05,800
but in general, n log n has
to be bigger than n.

574
00:26:05,800 --> 00:26:08,330
So, as the gentleman back there
said, if I'm searching

575
00:26:08,330 --> 00:26:11,920
it once, just use the
linear search.

576
00:26:11,920 --> 00:26:15,370
On the other hand, am I likely
to only search a list once?

577
00:26:15,370 --> 00:26:16,190
Probably not.

578
00:26:16,190 --> 00:26:17,710
There are going to be multiple
elements I'm going to be

579
00:26:17,710 --> 00:26:24,120
looking for, so that suggests
that in fact, I want to

580
00:26:24,120 --> 00:26:26,500
amortize the cost.

581
00:26:26,500 --> 00:26:30,970
And what does that say?

582
00:26:30,970 --> 00:26:33,000
It says, let's assume
I want to do k

583
00:26:33,000 --> 00:26:41,700
searches of a list. OK.

584
00:26:41,700 --> 00:26:44,720
In the linear case, meaning in
the unsorted case, what's the

585
00:26:44,720 --> 00:26:48,900
complexity of this?
k times n, right?

586
00:26:48,900 --> 00:26:51,170
Order n to do the search, and
I've got to do it k times, so

587
00:26:51,170 --> 00:26:55,830
this would be k times n.

588
00:26:55,830 --> 00:26:58,170
In the [GARBLED PHRASE]

589
00:26:58,170 --> 00:27:03,530
sort and search case,
what's my cost?

590
00:27:03,530 --> 00:27:05,690
I've got to sort it, and we
said, and we'll come back to

591
00:27:05,690 --> 00:27:10,520
that next time, that I can do
the sort in n log n, and then

592
00:27:10,520 --> 00:27:13,880
what's the search
in this case?

593
00:27:13,880 --> 00:27:17,730
It's log n to do one search, I
want to do k of them, that's

594
00:27:17,730 --> 00:27:26,090
k log n, ah-ha!

595
00:27:26,090 --> 00:27:28,780
Now I'm in better
shape, right?

596
00:27:28,780 --> 00:27:31,730
Especially for really large n or
for a lot of k, because now

597
00:27:31,730 --> 00:27:37,860
in general, this is going
to be smaller than that.

598
00:27:37,860 --> 00:27:40,110
So this is a place where
the amortized cost

599
00:27:40,110 --> 00:27:41,800
actually helps me out.

600
00:27:41,800 --> 00:27:43,900
And as the gentleman at the
back said, the question he

601
00:27:43,900 --> 00:27:46,210
asked is right, it depends
on what I'm trying to do.

602
00:27:46,210 --> 00:27:49,160
So when I do the analysis, I
want to think about what am I

603
00:27:49,160 --> 00:27:51,370
doing here, am I capturing
all the pieces of it?

604
00:27:51,370 --> 00:27:54,020
Here, the two variables that
matter are what's the length

605
00:27:54,020 --> 00:27:57,030
of the list, and how many times
I'm going to search it?

606
00:27:57,030 --> 00:28:04,010
So in this case, this one wins,
whereas in this case,

607
00:28:04,010 --> 00:28:07,000
that one wins.
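A rough way to see where the crossover falls is to plug numbers into the two expressions. This is only a sketch: the function names are hypothetical, constants are ignored, and log base 2 is an assumption.

```python
import math


def linear_cost(n, k):
    """k linear searches of an unsorted list of length n: k * n."""
    return k * n


def sort_then_search_cost(n, k):
    """One n log n sort, then k binary searches at log n each (needs n >= 1)."""
    return n * math.log2(n) + k * math.log2(n)


def cheaper_strategy(n, k):
    """Name the strategy with the smaller constant-free cost."""
    if sort_then_search_cost(n, k) < linear_cost(n, k):
        return "sort first"
    return "linear search"


for k in (1, 10, 11, 100):
    print(k, cheaper_strategy(1024, k))
```

With n = 1024, a single search favors the linear scan, and sorting first starts to pay off once k is a little over log n, around 11 searches under these assumptions.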

608
00:28:07,000 --> 00:28:08,960
OK.

609
00:28:08,960 --> 00:28:13,220
Having said that, let's look
at doing some sorts.

610
00:28:13,220 --> 00:28:16,290
And I'm going to start with
a couple of dumb sorting

611
00:28:16,290 --> 00:28:19,400
mechanisms. Actually, that's
the wrong way of saying it,

612
00:28:19,400 --> 00:28:21,510
they're simply brain-damaged,
they're not dumb, OK?

613
00:28:21,510 --> 00:28:23,650
They are computationally
challenged, meaning, at the

614
00:28:23,650 --> 00:28:25,700
time they were invented, they
were perfectly good sorting

615
00:28:25,700 --> 00:28:27,170
algorithms, there are better
ones, we're going to see a

616
00:28:27,170 --> 00:28:29,010
much better one next time
around, but this is a good way

617
00:28:29,010 --> 00:28:30,900
to just start thinking about
how to do the algorithm, or

618
00:28:30,900 --> 00:28:32,360
how to do the sort.

619
00:28:32,360 --> 00:28:33,060
Blah, try again.

620
00:28:33,060 --> 00:28:34,560
How to do this sort.

621
00:28:34,560 --> 00:28:38,640
So the first one I want to talk
about is what's called

622
00:28:38,640 --> 00:28:40,940
selection sort.

623
00:28:40,940 --> 00:28:50,330
And it's on your handout, and
I'm going to bring the code up

624
00:28:50,330 --> 00:28:53,310
here, you can see it, it's
called sel sort, just for

625
00:28:53,310 --> 00:28:54,160
selection sort.

626
00:28:54,160 --> 00:28:59,060
And let's take a look
at what this does.

627
00:28:59,060 --> 00:28:59,220
OK.

628
00:28:59,220 --> 00:29:01,010
And in fact I think the easy
way to look at what this

629
00:29:01,010 --> 00:29:02,690
does-- boy.

630
00:29:02,690 --> 00:29:03,700
My jokes are that bad.

631
00:29:03,700 --> 00:29:04,510
Wow--

632
00:29:04,510 --> 00:29:04,790
All right.

633
00:29:04,790 --> 00:29:07,535
I think the easiest way to look
at what this does, is

634
00:29:07,535 --> 00:29:10,790
let's take a really
simple example--

635
00:29:10,790 --> 00:29:20,690
I want to make sure I put
the right things out--

636
00:29:20,690 --> 00:29:23,040
I've got a simple little
list of values there.

637
00:29:23,040 --> 00:29:25,660
And if I look at this code, I'm
going to run over a loop,

638
00:29:25,660 --> 00:29:28,930
you can see that there, i is
going to go from zero up to

639
00:29:28,930 --> 00:29:34,720
the length minus 1, and I'm
going to keep track of a

640
00:29:34,720 --> 00:29:35,700
couple of variables.

641
00:29:35,700 --> 00:29:42,670
Min index, I think I
called it min val.

642
00:29:42,670 --> 00:29:42,810
OK.

643
00:29:42,810 --> 00:29:43,780
Let's simulate the code.

644
00:29:43,780 --> 00:29:44,960
Let's see what it's
doing here.

645
00:29:44,960 --> 00:29:47,780
All right, so we start off.

646
00:29:47,780 --> 00:29:53,110
Initially i-- ah, let me do it
this way, i is going to point

647
00:29:53,110 --> 00:29:58,780
there, and I want to make sure
I do it right, OK-- and min

648
00:29:58,780 --> 00:30:03,330
index is going to point to the
value of i, which is there,

649
00:30:03,330 --> 00:30:06,780
and min value is initially going
to have the value 1.

650
00:30:06,780 --> 00:30:09,490
So we're simply getting a hold
of what's the first value

651
00:30:09,490 --> 00:30:10,160
we've got there.

652
00:30:10,160 --> 00:30:12,000
And then what do we do?

653
00:30:12,000 --> 00:30:18,660
We start with j pointing here,
and we can see what this

654
00:30:18,660 --> 00:30:20,840
loop's going to do, right? j
is just going to move up.

655
00:30:20,840 --> 00:30:23,180
So it's going to look at the
rest of the list, walking

656
00:30:23,180 --> 00:30:25,990
along, and what does it do?

657
00:30:25,990 --> 00:30:27,780
It says, right.

658
00:30:27,780 --> 00:30:30,540
If j is-- well, it says while j
is less than the length

659
00:30:30,540 --> 00:30:37,050
of l-- it says, if min value is
bigger than the thing I'm

660
00:30:37,050 --> 00:30:39,690
looking at, I'm going to do
something, all right?

661
00:30:39,690 --> 00:30:40,850
So let's walk this.

662
00:30:40,850 --> 00:30:42,030
Min value is 1.

663
00:30:42,030 --> 00:30:43,020
Is 1 bigger than 8?

664
00:30:43,020 --> 00:30:43,410
No.

665
00:30:43,410 --> 00:30:44,080
I move j up.

666
00:30:44,080 --> 00:30:44,990
Is 1 bigger than 3?

667
00:30:44,990 --> 00:30:45,510
No.

668
00:30:45,510 --> 00:30:46,440
1 bigger than 6?

669
00:30:46,440 --> 00:30:46,580
No.

670
00:30:46,580 --> 00:30:47,550
1 bigger than 4?

671
00:30:47,550 --> 00:30:47,850
No.

672
00:30:47,850 --> 00:30:50,640
I get to the end of the loop,
and I actually do a little bit

673
00:30:50,640 --> 00:30:51,860
of wasted motion there.

674
00:30:51,860 --> 00:30:55,690
And the little bit of wasted
motion is, I take the value at

675
00:30:55,690 --> 00:31:00,405
i, store it away temporarily,
take the value where min index

676
00:31:00,405 --> 00:31:02,940
is pointing to, put it
back in there, and

677
00:31:02,940 --> 00:31:04,900
then swap it around.

678
00:31:04,900 --> 00:31:05,210
OK.

679
00:31:05,210 --> 00:31:11,540
Having done that, let's move i
up to here. i is now pointing

680
00:31:11,540 --> 00:31:12,070
at that thing.

681
00:31:12,070 --> 00:31:13,850
Go through the second
round of the loop.

682
00:31:13,850 --> 00:31:14,890
OK.

683
00:31:14,890 --> 00:31:15,790
What does that say?

684
00:31:15,790 --> 00:31:21,360
I'm going to change min index to
also point there, min value is

685
00:31:21,360 --> 00:31:26,720
8, j starts off here, and I
say, OK, is the thing I'm

686
00:31:26,720 --> 00:31:30,330
looking at here smaller
than that?

687
00:31:30,330 --> 00:31:31,550
Yes.

688
00:31:31,550 --> 00:31:32,530
Ah-ha.

689
00:31:32,530 --> 00:31:33,710
What does that say to do?

690
00:31:33,710 --> 00:31:41,200
It says, gee, make min
index point to there,

691
00:31:41,200 --> 00:31:44,420
min value be 3.

692
00:31:44,420 --> 00:31:46,810
Change j.

693
00:31:46,810 --> 00:31:47,970
Is 6 bigger than 3?

694
00:31:47,970 --> 00:31:48,520
Yes.

695
00:31:48,520 --> 00:31:49,430
Is 4 bigger than 3?

696
00:31:49,430 --> 00:31:50,070
Yes.

697
00:31:50,070 --> 00:31:51,420
Get to the end.

698
00:31:51,420 --> 00:31:55,210
And when I get to the
end, what do I do?

699
00:31:55,210 --> 00:32:01,620
Well, you see, I say, take temp,
and store away what's

700
00:32:01,620 --> 00:32:04,300
here, all right?

701
00:32:04,300 --> 00:32:07,290
Which is that value, and then
take what min index is

702
00:32:07,290 --> 00:32:16,810
pointing to, and stick it in
there, and finally, replace

703
00:32:16,810 --> 00:32:21,550
that value.

704
00:32:21,550 --> 00:32:23,240
OK.

705
00:32:23,240 --> 00:32:24,900
Aren't you glad I'm
not a computer?

706
00:32:24,900 --> 00:32:26,890
Slow as hell.

707
00:32:26,890 --> 00:32:29,440
What's this thing doing?

708
00:32:29,440 --> 00:32:34,930
It's walking along the list,
looking for the smallest thing

709
00:32:34,930 --> 00:32:37,980
in the back end of the list,
keeping track of where it came

710
00:32:37,980 --> 00:32:41,220
from, and swapping it
with that spot in

711
00:32:41,220 --> 00:32:42,850
the list. All right?

712
00:32:42,850 --> 00:32:45,340
So in the first case, I didn't
have to do any swaps because 1

713
00:32:45,340 --> 00:32:46,150
was the smallest thing.

714
00:32:46,150 --> 00:32:49,700
In the second case, I found
the next smallest element and

715
00:32:49,700 --> 00:32:52,550
moved it here, taking what was
there and moving it on, in

716
00:32:52,550 --> 00:32:56,230
this case I would swap the 4 and
the 8, and in the next case I

717
00:32:56,230 --> 00:32:58,300
wouldn't have to do anything.
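The walkthrough above corresponds to code along these lines. This is a reconstruction from the spoken description, not the handout itself; the names sel_sort, min_index and min_val are assumptions.

```python
def sel_sort(L):
    """Sort the list L in place, smallest first, by repeated selection."""
    for i in range(len(L) - 1):
        # Start by assuming the element at i is the smallest remaining one.
        min_index = i
        min_val = L[i]
        j = i + 1
        while j < len(L):
            if min_val > L[j]:
                # Found something smaller: remember where it is and its value.
                min_index = j
                min_val = L[j]
            j += 1
        # Swap through a temporary, even when min_index == i (the wasted motion).
        temp = L[i]
        L[i] = L[min_index]
        L[min_index] = temp
    return L


print(sel_sort([1, 8, 3, 6, 4]))
```

On the list 1, 8, 3, 6, 4 the rounds swap nothing, then 3 and 8, then 4 and 8, then nothing, matching the simulation.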

718
00:32:58,300 --> 00:32:59,520
Let's check it out.

719
00:32:59,520 --> 00:33:02,650
I've written a little bit of a
test script here, so if we

720
00:33:02,650 --> 00:33:07,080
test sel sort, and I've written
this so that it's

721
00:33:07,080 --> 00:33:08,830
going to print out what
the list is at the end

722
00:33:08,830 --> 00:33:13,480
of each round, OK.

723
00:33:13,480 --> 00:33:16,110
Ah-ha.

724
00:33:16,110 --> 00:33:17,930
Notice what-- where am
I, here-- notice what

725
00:33:17,930 --> 00:33:19,000
happened in this case.

726
00:33:19,000 --> 00:33:22,220
At the end of the first round,
I've got the smallest element

727
00:33:22,220 --> 00:33:23,200
at the front.

728
00:33:23,200 --> 00:33:25,340
At the end of the second round,
I've got the smallest

729
00:33:25,340 --> 00:33:27,280
two elements at the front,
in fact I got all

730
00:33:27,280 --> 00:33:29,340
of them sorted out.

731
00:33:29,340 --> 00:33:31,810
And it actually runs through
the loop multiple times,

732
00:33:31,810 --> 00:33:33,180
making sure that it's
in the right form.

733
00:33:33,180 --> 00:33:36,710
Let's take another example.

734
00:33:36,710 --> 00:33:39,370
OK.

735
00:33:39,370 --> 00:33:40,950
Smallest element at the front.

736
00:33:40,950 --> 00:33:42,830
Smallest two elements
at the front.

737
00:33:42,830 --> 00:33:44,330
Smallest three elements
at the front.

738
00:33:44,330 --> 00:33:46,590
Smallest four elements at the
front, you get the idea.

739
00:33:46,590 --> 00:33:49,500
Smallest five elements
at the front.

740
00:33:49,500 --> 00:33:52,660
So this is a nice little
search-- sorry, a nice little

741
00:33:52,660 --> 00:33:53,210
sort algorithm.

742
00:33:53,210 --> 00:33:56,880
And in fact, it's relying on
something that we're going to

743
00:33:56,880 --> 00:33:59,200
come back to, called
the loop invariant.

744
00:33:59,200 --> 00:34:16,350
Actually, let me put it on this
board so you can see it.

745
00:34:16,350 --> 00:34:18,360
The loop invariant: what does
the loop invariant mean?

746
00:34:18,360 --> 00:34:21,850
It says, here is a property that
is true of this structure

747
00:34:21,850 --> 00:34:23,510
every time through the loop.

748
00:34:23,510 --> 00:34:26,870
And the loop invariant here is
the following: the list is

749
00:34:26,870 --> 00:34:37,150
split, into a prefix or a first
part, and a suffix, the

750
00:34:37,150 --> 00:34:48,100
prefix is sorted, the suffix
is not, and basically, the

751
00:34:48,100 --> 00:34:50,410
loop starts off with the prefix
being nothing and it

752
00:34:50,410 --> 00:34:53,340
keeps increasing the size of the
prefix by 1 until it gets

753
00:34:53,340 --> 00:34:55,890
through the entire list, at
which point there's nothing in

754
00:34:55,890 --> 00:35:00,200
the suffix and the entire
prefix is sorted.
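One way to make the invariant concrete is to assert it at the end of every round. The sketch below uses a compact selection sort written for this purpose; the name sel_sort_checked and the snapshot list are assumptions, and the second assertion (nothing in the suffix is smaller than the prefix) is an extra property of selection sort rather than something stated in the lecture.

```python
def sel_sort_checked(L):
    """Selection sort that checks the prefix/suffix invariant after each round.

    Returns a copy of the list as it stood at the end of every round.
    """
    snapshots = []
    for i in range(len(L) - 1):
        # Index of the smallest element in the unsorted suffix L[i:].
        min_index = min(range(i, len(L)), key=L.__getitem__)
        L[i], L[min_index] = L[min_index], L[i]

        prefix, suffix = L[:i + 1], L[i + 1:]
        assert prefix == sorted(prefix)              # the prefix is sorted
        assert all(prefix[-1] <= x for x in suffix)  # suffix holds only larger values
        snapshots.append(list(L))
    return snapshots


for state in sel_sort_checked([1, 8, 3, 6, 4]):
    print(state)
```

The prefix grows by one element per round, so after the last round the suffix is a single element that is already in place.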

755
00:35:00,200 --> 00:35:01,990
OK?

756
00:35:01,990 --> 00:35:04,114
So you can see that, it's just
walking through it, and in

757
00:35:04,114 --> 00:35:06,250
fact if I look at a couple of
another-- another couple of

758
00:35:06,250 --> 00:35:09,380
examples, it's been a
long day, again, you

759
00:35:09,380 --> 00:35:12,680
can see that property.

760
00:35:12,680 --> 00:35:16,470
You'll also notice that this
thing goes through the entire

761
00:35:16,470 --> 00:35:19,345
list, even if the list
is sorted before it

762
00:35:19,345 --> 00:35:20,030
gets partway through.

763
00:35:20,030 --> 00:35:22,720
And that you might look at,
for example, that first

764
00:35:22,720 --> 00:35:25,720
example, and say, man by this
stage it was already sorted,

765
00:35:25,720 --> 00:35:28,230
yet it had to go through and
check that the third element

766
00:35:28,230 --> 00:35:30,000
was in the right place, and then
the fourth and then the

767
00:35:30,000 --> 00:35:32,430
fifth and then the sixth.

768
00:35:32,430 --> 00:35:34,460
OK.

769
00:35:34,460 --> 00:35:35,740
What order of growth?

770
00:35:35,740 --> 00:35:40,450
What's the complexity of this?

771
00:35:40,450 --> 00:35:43,200
I've got to get rid
of this candy.

772
00:35:43,200 --> 00:35:44,050
Anybody help me out?

773
00:35:44,050 --> 00:35:46,690
What's the complexity of this?

774
00:35:46,690 --> 00:35:49,010
Sorry, somebody at the back.

775
00:35:49,010 --> 00:35:49,680
n squared.

776
00:35:49,680 --> 00:35:52,810
Yeah, where n is what?

777
00:35:52,810 --> 00:35:54,940
Yeah, and I can't even see
who's saying that.

778
00:35:54,940 --> 00:35:56,220
Thank you.

779
00:35:56,220 --> 00:35:57,900
Sorry, I've got the wrong
glasses on, but you're

780
00:35:57,900 --> 00:36:00,030
absolutely right, and in case
the rest of you didn't hear

781
00:36:00,030 --> 00:36:03,890
it, n squared.

782
00:36:03,890 --> 00:36:05,980
How do I figure that out?

783
00:36:05,980 --> 00:36:09,630
Well I'm looping down
the list, right?

784
00:36:09,630 --> 00:36:12,660
I'm walking down the list. So
it's certainly at least linear

785
00:36:12,660 --> 00:36:15,230
in the length of the list.
For each starting

786
00:36:15,230 --> 00:36:15,940
point, what do I do?

787
00:36:15,940 --> 00:36:19,900
I look at the rest of the list
to decide what's the element

788
00:36:19,900 --> 00:36:21,300
to swap into the next place.

789
00:36:21,300 --> 00:36:23,200
Now, you might say, well,
wait a minute.

790
00:36:23,200 --> 00:36:26,310
As I keep moving down, that part
gets smaller, it's not

791
00:36:26,310 --> 00:36:29,110
always the initial length of
the list, and you're right.

792
00:36:29,110 --> 00:36:31,350
But if you do the sums, or if
you want to think of it this

793
00:36:31,350 --> 00:36:34,180
way, if you think about this
more generally, it's always on

794
00:36:34,180 --> 00:36:37,330
average proportional to the length of
the list. So I've got to do n

795
00:36:37,330 --> 00:36:39,160
things n times.

796
00:36:39,160 --> 00:36:42,620
So it's quadratic, in
terms of that sort.
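To check the "do the sums" step, here is a small sketch that counts the comparisons a selection sort of this shape makes. The function name is hypothetical. The inner scan shrinks from n - 1 elements down to 1, so the total is n(n - 1)/2, which still grows as n squared.

```python
def count_selection_comparisons(n):
    """Count element comparisons made by selection sort on a list of length n."""
    comparisons = 0
    for i in range(n - 1):
        # Round i compares the current minimum against every later element.
        for j in range(i + 1, n):
            comparisons += 1
    return comparisons


for n in (5, 10, 100):
    print(n, count_selection_comparisons(n), n * (n - 1) // 2)
```

Doubling n roughly quadruples the count, which is the signature of a quadratic algorithm.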

797
00:36:42,620 --> 00:36:43,930
OK.

798
00:36:43,930 --> 00:36:45,680
That's one way to
do this sort.

799
00:36:45,680 --> 00:36:50,510
Let's do another one.

800
00:36:50,510 --> 00:36:52,180
The second one we're going to
do is called bubble sort.

801
00:36:52,180 --> 00:36:55,020
All right?

802
00:36:55,020 --> 00:36:59,980
And bubble sort is also
on your handout.

803
00:36:59,980 --> 00:37:07,970
And you want to take the first
of these, let me-- sorry, for

804
00:37:07,970 --> 00:37:10,480
a second let me uncomment
that, and let me

805
00:37:10,480 --> 00:37:11,510
comment this out--

806
00:37:11,510 --> 00:37:19,950
All right, you can see the code
for bubble sort there.

807
00:37:19,950 --> 00:37:21,770
Let's just look at it for a
second, then we'll try some

808
00:37:21,770 --> 00:37:23,100
examples, and then we'll
figure out what

809
00:37:23,100 --> 00:37:25,030
it's actually doing.

810
00:37:25,030 --> 00:37:27,890
So bubble sort, which
is right up here.

811
00:37:27,890 --> 00:37:28,630
What's it going to do?

812
00:37:28,630 --> 00:37:32,530
It's going to let j run over
the length of the list, all

813
00:37:32,530 --> 00:37:34,530
right, so it's going to start
at some point to move down,

814
00:37:34,530 --> 00:37:38,870
and then it's going to let i
run over a range that's just

815
00:37:38,870 --> 00:37:43,580
one smaller, and what's
it doing there?

816
00:37:43,580 --> 00:37:45,520
It's looking at successive
pairs, right?

817
00:37:45,520 --> 00:37:48,680
It's looking at the i'th and the
i plus first element, and

818
00:37:48,680 --> 00:37:51,240
it's saying, gee, if the i'th
element is bigger than the

819
00:37:51,240 --> 00:37:53,490
i plus first element, what's
the next set of three

820
00:37:53,490 --> 00:37:55,730
things doing?

821
00:37:55,730 --> 00:37:57,640
Just swapping them, right?

822
00:37:57,640 --> 00:37:59,770
I temporarily hold on to what's
in the i'th element so

823
00:37:59,770 --> 00:38:02,940
I can move the i plus first one
in, and then replace that

824
00:38:02,940 --> 00:38:05,210
with the i'th element.

825
00:38:05,210 --> 00:38:06,450
OK.

826
00:38:06,450 --> 00:38:08,980
What's this thing doing then,
in terms of sorting?

827
00:38:08,980 --> 00:38:13,230
At the end of the first pass,
what could I say about the

828
00:38:13,230 --> 00:38:16,910
result of this thing?

829
00:38:16,910 --> 00:38:25,360
What does the last element
in the list look like?

830
00:38:25,360 --> 00:38:28,050
I hate professors who do this.

831
00:38:28,050 --> 00:38:30,180
Well, let's try it.

832
00:38:30,180 --> 00:38:35,700
Let's try a little test. OK?

833
00:38:35,700 --> 00:38:40,850
Test bubble sort-- especially if
I could type-- let's run it

834
00:38:40,850 --> 00:38:49,740
on the first list. OK, let's
try it on another one.

835
00:38:49,740 --> 00:38:50,910
Oops sorry.

836
00:38:50,910 --> 00:38:53,520
Ah, I didn't want to do it this
time, I forgot to do the

837
00:38:53,520 --> 00:38:56,580
following, bear with me.

838
00:38:56,580 --> 00:38:57,890
I gave away my punchline.

839
00:38:57,890 --> 00:38:58,820
Let's try it again.

840
00:38:58,820 --> 00:39:04,440
Test bubble sort.

841
00:39:04,440 --> 00:39:07,180
OK, there's the first run, I'm
going to take a different

842
00:39:07,180 --> 00:39:18,720
list. Can you see
a pattern there?

843
00:39:18,720 --> 00:39:18,970
Yeah.

844
00:39:18,970 --> 00:39:22,180
STUDENT: The last cell in the
list is always going to

845
00:39:22,180 --> 00:39:22,510
[INAUDIBLE]

846
00:39:22,510 --> 00:39:23,400
PROFESSOR ERIC GRIMSON: Yeah.

847
00:39:23,400 --> 00:39:23,560
Why?

848
00:39:23,560 --> 00:39:24,380
You're right, but why?

849
00:39:24,380 --> 00:39:28,940
STUDENT: [UNINTELLIGIBLE PHRASE]

850
00:39:28,940 --> 00:39:29,910
PROFESSOR ERIC GRIMSON:
Exactly right.

851
00:39:29,910 --> 00:39:30,670
Thank you.

852
00:39:30,670 --> 00:39:37,090
The observation is, thank you,
on the first pass through, the

853
00:39:37,090 --> 00:39:40,110
last element is the biggest
thing in the list. On the next

854
00:39:40,110 --> 00:39:43,200
pass through, the next largest
element is at the second-to-last point

855
00:39:43,200 --> 00:39:43,280
in

856
00:39:43,280 --> 00:39:43,970
the list. OK?

857
00:39:43,970 --> 00:39:45,180
Because what am I doing?

858
00:39:45,180 --> 00:39:46,600
It's called bubble sort
because it's literally

859
00:39:46,600 --> 00:39:47,900
bubbling along, right?

860
00:39:47,900 --> 00:39:51,610
I'm walking along the list once,
taking two things, and

861
00:39:51,610 --> 00:39:53,510
saying, make sure the
biggest one is next.

862
00:39:53,510 --> 00:39:55,810
So wherever the largest element
started out in the

863
00:39:55,810 --> 00:39:59,800
list, by the time I get through
it, it's at the end.

864
00:39:59,800 --> 00:40:01,760
And then I go back and
I start again, and

865
00:40:01,760 --> 00:40:03,030
I do the same thing.

866
00:40:03,030 --> 00:40:03,250
OK.

867
00:40:03,250 --> 00:40:05,340
The next largest element
has to end up in

868
00:40:05,340 --> 00:40:06,740
the second last spot.

869
00:40:06,740 --> 00:40:07,290
Et cetera.

870
00:40:07,290 --> 00:40:09,990
All right, so it's called bubble
sort because it does

871
00:40:09,990 --> 00:40:12,340
this bubbling up until
it gets there.
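A sketch of bubble sort as described, with j running over the length of the list and i over a range one smaller. It is reconstructed from the spoken description rather than copied from the handout; the name bubble_sort and the returned snapshots are assumptions added for illustration.

```python
def bubble_sort(L):
    """Sort L in place; return a copy of the list after each full pass."""
    snapshots = []
    for j in range(len(L)):
        for i in range(len(L) - 1):
            if L[i] > L[i + 1]:
                # Swap the neighbouring pair so the bigger one moves right.
                temp = L[i]
                L[i] = L[i + 1]
                L[i + 1] = temp
        snapshots.append(list(L))
    return snapshots


for state in bubble_sort([1, 8, 3, 6, 4]):
    print(state)
```

After the first pass the largest value, 8, sits in the last position, and after the second pass the next largest sits just before it. The remaining passes make no further changes on this list.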

872
00:40:12,340 --> 00:40:14,070
Now.

873
00:40:14,070 --> 00:40:15,110
What's the order
of growth here?

874
00:40:15,110 --> 00:40:19,810
What's the complexity?

875
00:40:19,810 --> 00:40:21,720
I haven't talked to the side
of the room in a while,

876
00:40:21,720 --> 00:40:23,160
actually I have. This gentleman
has helped me out.

877
00:40:23,160 --> 00:40:23,870
Somebody else help me out.

878
00:40:23,870 --> 00:40:27,970
What's the complexity here?

879
00:40:27,970 --> 00:40:31,700
I must have the wrong glasses
on to see a hand.

880
00:40:31,700 --> 00:40:34,160
No help.

881
00:40:34,160 --> 00:40:36,260
Log?

882
00:40:36,260 --> 00:40:38,050
Linear?

883
00:40:38,050 --> 00:40:40,450
Exponential?

884
00:40:40,450 --> 00:40:41,470
Quadratic?

885
00:40:41,470 --> 00:40:43,020
Yeah.

886
00:40:43,020 --> 00:40:44,970
Log.

887
00:40:44,970 --> 00:40:50,160
It's a good thought, but why
do you think it's log?

888
00:40:50,160 --> 00:40:50,980
Ah-ha.

889
00:40:50,980 --> 00:40:53,490
It's not a bad instinct, the
length is getting shorter each

890
00:40:53,490 --> 00:40:54,400
time, but what's one of the

891
00:40:54,400 --> 00:40:56,260
characteristics of a log algorithm?

892
00:40:56,260 --> 00:40:58,920
It drops in half each time.

893
00:40:58,920 --> 00:41:00,900
So this isn't--

894
00:41:00,900 --> 00:41:01,230
OK.

895
00:41:01,230 --> 00:41:02,120
And you're also close.

896
00:41:02,120 --> 00:41:04,200
It's going to be linear,
but how many times do

897
00:41:04,200 --> 00:41:05,070
I go through this?

898
00:41:05,070 --> 00:41:08,980
All right, I've got to do one
pass to bubble the last

899
00:41:08,980 --> 00:41:10,080
element to the end.

900
00:41:10,080 --> 00:41:12,470
I've got to do another pass to
bubble the second last element

901
00:41:12,470 --> 00:41:12,720
to the end.

902
00:41:12,720 --> 00:41:14,730
I've got to do another pass.

903
00:41:14,730 --> 00:41:15,800
Huh.

904
00:41:15,800 --> 00:41:19,400
Sounds like a linear number of
times I've got to do-- oh

905
00:41:19,400 --> 00:41:20,150
fudge knuckle.

906
00:41:20,150 --> 00:41:23,230
A linear number of things,
quadratic.

907
00:41:23,230 --> 00:41:25,220
Right?

908
00:41:25,220 --> 00:41:25,600
OK.

909
00:41:25,600 --> 00:41:32,620
So this is again an example,
this was quadratic, and this

910
00:41:32,620 --> 00:41:35,130
one was quadratic.

911
00:41:35,130 --> 00:41:40,690
And I have this, to write it
out, this is order the length

912
00:41:40,690 --> 00:41:43,720
of the list squared, OK?

913
00:41:43,720 --> 00:41:44,820
Just to make it clear
what we're

914
00:41:44,820 --> 00:41:48,250
actually measuring there.
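
One way to see the quadratic growth is to count comparisons directly. The sketch below assumes the plain version of bubble sort (one full pass per element, no early exit); count_bubble_comparisons is a hypothetical helper, not something from the handout.

```python
def count_bubble_comparisons(n):
    """Bubble sort a reversed list of length n and count the comparisons made."""
    items = list(range(n, 0, -1))  # worst case: completely reversed
    comparisons = 0
    for _ in range(len(items)):            # a linear number of passes...
        for i in range(len(items) - 1):    # ...each doing a linear amount of work
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return comparisons


for n in (10, 20, 40):
    print(n, count_bubble_comparisons(n))
# 10 90
# 20 380
# 40 1560
```

Each doubling of the list length roughly quadruples the count, since this version makes n * (n - 1) comparisons for a list of length n.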

915
00:41:48,250 --> 00:41:48,360
All

916
00:41:48,360 --> 00:41:48,720
right.

917
00:41:48,720 --> 00:41:50,870
Could we do better?

918
00:41:50,870 --> 00:41:52,110
Sure.

919
00:41:52,110 --> 00:41:54,530
And in fact, next time we're
going to show you that n log n

920
00:41:54,530 --> 00:41:57,050
algorithm, but even with bubble
sort, we can do better.

921
00:41:57,050 --> 00:42:00,290
In particular, if I look at
those traces, I can certainly

922
00:42:00,290 --> 00:42:03,950
see cases where, man, I already
had the list sorted

923
00:42:03,950 --> 00:42:06,350
much earlier on, and yet I
kept going back to see if

924
00:42:06,350 --> 00:42:08,620
there was anything else
to bubble up.

925
00:42:08,620 --> 00:42:09,870
How would I keep
track of that?

926
00:42:09,870 --> 00:42:12,600
Could I take advantage of that?

927
00:42:12,600 --> 00:42:13,860
Sure.

928
00:42:13,860 --> 00:42:16,550
Why don't I just keep track
on each pass through the

929
00:42:16,550 --> 00:42:18,720
algorithm whether I have
done any swaps?

930
00:42:18,720 --> 00:42:20,210
All right?

931
00:42:20,210 --> 00:42:22,480
Because if I don't do any swaps
on a pass through the

932
00:42:22,480 --> 00:42:23,850
algorithm, then it
says everything's

933
00:42:23,850 --> 00:42:24,820
in the right order.

934
00:42:24,820 --> 00:42:28,180
And so, in fact, the version
that I commented out-- which

935
00:42:28,180 --> 00:42:29,880
is also in your handout and I'm
now going to uncomment,

936
00:42:29,880 --> 00:42:38,820
let's get that one out, get rid
of this one-- notice the

937
00:42:38,820 --> 00:42:39,810
only change.

938
00:42:39,810 --> 00:42:42,113
I'm going to keep track of a
little variable called swap,

939
00:42:42,113 --> 00:42:46,225
it's initially true, and as long
as it's true, I'm going

940
00:42:46,225 --> 00:42:49,010
to keep going, but inside of the
loop I'm going to set it

941
00:42:49,010 --> 00:42:53,620
to false, and only if I do a
swap will I set it to true.

942
00:42:53,620 --> 00:42:56,070
This says, if I go through an
entire pass through the list

943
00:42:56,070 --> 00:42:58,410
and nothing gets changed,
I'm done.
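
The change just described might look something like this in Python. The variable name swap follows the lecture; the function name and the passes counter are assumptions added for illustration, and the handout version may differ in detail.

```python
def bubble_sort_early_exit(values):
    """Bubble sort that stops after the first pass with no swaps.

    Returns the sorted copy and the number of passes made (the pass
    count is an illustrative extra, not part of the lecture's code).
    """
    items = list(values)
    passes = 0
    swap = True                  # initially true, so the loop runs at least once
    while swap:
        swap = False             # assume this pass changes nothing...
        passes += 1
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swap = True      # ...until a swap proves otherwise
    return items, passes


print(bubble_sort_early_exit([1, 2, 3, 4]))  # ([1, 2, 3, 4], 1)
print(bubble_sort_early_exit([4, 3, 2, 1]))  # ([1, 2, 3, 4], 4)
```

An already sorted list is recognized after a single pass, while a reversed list still needs a pass per element, so the worst case is unchanged.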

944
00:42:58,410 --> 00:43:09,730
And in fact if I do that, and
try test bubble sort, well, in

945
00:43:09,730 --> 00:43:13,080
the first case, looks the same.

946
00:43:13,080 --> 00:43:13,620
Ah.

947
00:43:13,620 --> 00:43:17,660
On the second case, I
spot it right away.

948
00:43:17,660 --> 00:43:20,340
On the third case, it takes me
the same amount of time.

949
00:43:20,340 --> 00:43:24,210
And the fourth case, when
I set it up, I'm done.

950
00:43:24,210 --> 00:43:24,340
OK.

951
00:43:24,340 --> 00:43:25,670
So what's the lesson here?

952
00:43:25,670 --> 00:43:28,420
I can be a little more careful
about keeping track of what

953
00:43:28,420 --> 00:43:30,000
goes on inside of that loop.

954
00:43:30,000 --> 00:43:31,930
If I don't have any more work
to do, let me just stop.

955
00:43:31,930 --> 00:43:33,230
All right.

956
00:43:33,230 --> 00:43:36,940
Nonetheless, even with this
change, what's the order of

957
00:43:36,940 --> 00:43:39,080
growth for bubble sort?

958
00:43:39,080 --> 00:43:40,380
Still quadratic, right?

959
00:43:40,380 --> 00:43:42,360
I'm looking for the worst case
behavior, it's still

960
00:43:42,360 --> 00:43:44,630
quadratic, it's quadratic in the
length of the list, so I'm

961
00:43:44,630 --> 00:43:47,180
sort of stuck with that.

962
00:43:47,180 --> 00:43:47,560
Now.

963
00:43:47,560 --> 00:43:49,120
Let me ask you one last
question, and then

964
00:43:49,120 --> 00:43:51,070
we'll wrap this up.

965
00:43:51,070 --> 00:43:55,420
Which of these algorithms
is better?

966
00:43:55,420 --> 00:43:57,020
Selection sort or bubble sort?

967
00:43:57,020 --> 00:43:59,140
STUDENT: Bubble.

968
00:43:59,140 --> 00:43:59,520
PROFESSOR ERIC GRIMSON:
Bubble.

969
00:43:59,520 --> 00:44:00,270
Bubble bubble toil
and trouble.

970
00:44:00,270 --> 00:44:01,780
Who said bubble?

971
00:44:01,780 --> 00:44:02,140
Why?

972
00:44:02,140 --> 00:44:04,836
STUDENT: Well, the first
one was too inefficient

973
00:44:04,836 --> 00:44:07,195
[UNINTELLIGIBLE] store and
compare each one, so

974
00:44:07,195 --> 00:44:15,320
[UNINTELLIGIBLE]

975
00:44:15,320 --> 00:44:16,380
PROFESSOR ERIC GRIMSON: It's
not a bad instinct.

976
00:44:16,380 --> 00:44:16,600
Right.

977
00:44:16,600 --> 00:44:19,680
So it-- so, your argument is,
bubble is better because it's

978
00:44:19,680 --> 00:44:23,300
essentially not doing all
these extra comparisons.

979
00:44:23,300 --> 00:44:25,150
Another way of saying it is,
I can stop when

980
00:44:25,150 --> 00:44:25,900
I don't need to.

981
00:44:25,900 --> 00:44:26,450
All right?

982
00:44:26,450 --> 00:44:28,120
OK.

983
00:44:28,120 --> 00:44:30,660
Anybody have an opposing
opinion?

984
00:44:30,660 --> 00:44:34,260
Wow, this sounds like a
presidential debate.

985
00:44:34,260 --> 00:44:35,320
Sorry, I should reward you.

986
00:44:35,320 --> 00:44:37,380
Thank you for that statement.

987
00:44:37,380 --> 00:44:40,160
Anybody have an opposing
opinion?

988
00:44:40,160 --> 00:44:41,730
Everybody's answering these
things and sitting

989
00:44:41,730 --> 00:44:42,390
way up at the back.

990
00:44:42,390 --> 00:44:44,340
Nice catch.

991
00:44:44,340 --> 00:44:44,720
Yeah.

992
00:44:44,720 --> 00:44:55,160
STUDENT: [INAUDIBLE]

993
00:44:55,160 --> 00:44:55,990
PROFESSOR ERIC GRIMSON: I
don't think so, right?

994
00:44:55,990 --> 00:44:57,690
I think selection sort, I
still have to go through

995
00:44:57,690 --> 00:45:01,750
multiple times, it was still
quadratic, OK, but I think

996
00:45:01,750 --> 00:45:03,540
you're heading towards a
direction I want to get at, so

997
00:45:03,540 --> 00:45:05,150
let me prime this
a little bit.

998
00:45:05,150 --> 00:45:10,120
How many swaps do I do in
general in bubble sort,

999
00:45:10,120 --> 00:45:13,650
compared to selection sort?

1000
00:45:13,650 --> 00:45:14,340
God bless.

1001
00:45:14,340 --> 00:45:18,840
Oh, sorry, that wasn't a
sneeze, it was a two?

1002
00:45:18,840 --> 00:45:23,460
How many swaps do I
do in bubble sort?

1003
00:45:23,460 --> 00:45:24,430
A lot.

1004
00:45:24,430 --> 00:45:24,840
Right.

1005
00:45:24,840 --> 00:45:27,160
Potentially a lot because I'm
constantly doing that, that

1006
00:45:27,160 --> 00:45:29,620
says I'm running that inner loop
a whole bunch of times.

1007
00:45:29,620 --> 00:45:34,350
How many swaps do I do
in selection sort?

1008
00:45:34,350 --> 00:45:36,190
Once each time.

1009
00:45:36,190 --> 00:45:36,320
Right?

1010
00:45:36,320 --> 00:45:39,130
I only do one swap potentially,
it-- though not

1011
00:45:39,130 --> 00:45:40,750
one potentially, each
time at the end of

1012
00:45:40,750 --> 00:45:42,450
the loop I do a swap.

1013
00:45:42,450 --> 00:45:45,480
So this actually suggests again,
the orders of growth

1014
00:45:45,480 --> 00:45:49,060
are the same, but probably
selection sort is a more

1015
00:45:49,060 --> 00:45:51,710
efficient algorithm, because
I'm not doing that constant

1016
00:45:51,710 --> 00:45:53,010
amount of work every
time around.
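
The difference in swaps can be checked with a small experiment. This is a sketch under stated assumptions: both functions are illustrative reconstructions rather than the handout code, and the selection sort here swaps unconditionally at the end of each outer loop, as described above.

```python
def selection_sort_swaps(values):
    """Selection sort; returns the sorted copy and the number of swaps."""
    items = list(values)
    swaps = 0
    for i in range(len(items) - 1):
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        # Exactly one swap at the end of each outer loop.
        items[i], items[smallest] = items[smallest], items[i]
        swaps += 1
    return items, swaps


def bubble_sort_swaps(values):
    """Bubble sort with early exit; returns the sorted copy and swap count."""
    items = list(values)
    swaps = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swaps += 1
                swapped = True
    return items, swaps


reversed_list = [6, 5, 4, 3, 2, 1]
print(selection_sort_swaps(reversed_list))  # ([1, 2, 3, 4, 5, 6], 5)
print(bubble_sort_swaps(reversed_list))     # ([1, 2, 3, 4, 5, 6], 15)
```

Both do a quadratic number of comparisons, but on this reversed list bubble sort also does a quadratic number of swaps, while selection sort does only a linear number.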

1017
00:45:53,010 --> 00:45:55,890
And in fact, if you go look up,
you won't see bubble sort

1018
00:45:55,890 --> 00:45:56,780
used very much.

1019
00:45:56,780 --> 00:45:57,240
Most--

1020
00:45:57,240 --> 00:45:59,140
I shouldn't say most, many
computer scientists don't

1021
00:45:59,140 --> 00:46:00,700
think it should be taught,
because it's just so

1022
00:46:00,700 --> 00:46:01,770
inefficient.

1023
00:46:01,770 --> 00:46:03,950
I disagree, because it's a
clever idea, but it's still

1024
00:46:03,950 --> 00:46:06,310
something that we have
to keep track of.

1025
00:46:06,310 --> 00:46:07,770
All right.

1026
00:46:07,770 --> 00:46:10,140
We haven't gotten to our n log n
algorithm, we're going to do

1027
00:46:10,140 --> 00:46:14,150
that next time, but I want to
set the stage here by pulling

1028
00:46:14,150 --> 00:46:16,940
out one last piece.

1029
00:46:16,940 --> 00:46:17,330
OK.

1030
00:46:17,330 --> 00:46:19,130
Could we do better in
terms of sorting?

1031
00:46:19,130 --> 00:46:20,270
Again, remember what
our goal was.

1032
00:46:20,270 --> 00:46:23,300
If we could do sort, then we
saw, if we amortized the cost,

1033
00:46:23,300 --> 00:46:25,955
that searching is a lot more
efficient if we're searching a

1034
00:46:25,955 --> 00:46:27,060
sorted list.

1035
00:46:27,060 --> 00:46:29,030
How could we do better?

1036
00:46:29,030 --> 00:46:30,500
Let me set the stage.

1037
00:46:30,500 --> 00:46:34,690
I already said, back here, when
I used this board, that

1038
00:46:34,690 --> 00:46:36,630
this idea was really
important.

1039
00:46:36,630 --> 00:46:43,260
And that's because that is
a version of a divide and

1040
00:46:43,260 --> 00:46:48,590
conquer algorithm.

1041
00:46:48,590 --> 00:46:48,800
OK.

1042
00:46:48,800 --> 00:46:51,460
Binary search is perhaps the
simplest of the divide and

1043
00:46:51,460 --> 00:46:53,070
conquer algorithms, and
what does that mean?

1044
00:46:53,070 --> 00:46:56,230
It says, in order to solve a
problem, cut it down to a

1045
00:46:56,230 --> 00:46:58,770
smaller problem and try
and solve that one.
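
As a reminder of what cutting the problem down looks like, here is a minimal binary search sketch. The function name and the convention of returning -1 for a missing value are assumptions made for illustration, not taken from the handout.

```python
def binary_search(sorted_items, target):
    """Return an index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # discard the lower half
        else:
            high = mid - 1   # discard the upper half
    return -1


print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(binary_search([1, 3, 5, 7, 9], 4))  # -1
```

Each step throws away half of what remains, so the remaining problem is always a smaller instance of the same problem.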

1046
00:46:58,770 --> 00:47:01,740
So to just preface what we're
going to do next time, what

1047
00:47:01,740 --> 00:47:04,810
would happen if I wanted to do
sort, and rather than

1048
00:47:04,810 --> 00:47:09,210
sorting the entire list at once,
I broke it into pieces,

1049
00:47:09,210 --> 00:47:12,150
and sorted the pieces, and then
just figured out a very

1050
00:47:12,150 --> 00:47:15,240
efficient way to bring those two
pieces and merge them back

1051
00:47:15,240 --> 00:47:16,420
together again?

1052
00:47:16,420 --> 00:47:19,310
Where those pieces, I would do
the same thing with, I would

1053
00:47:19,310 --> 00:47:23,360
divide them up into smaller
chunks, and sort those.
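
The idea being posed here could be sketched as follows. This is only an illustration of splitting, sorting the pieces, and merging them; the names merge and split_and_sort are hypothetical, and it is not necessarily the version shown in the next lecture.

```python
def merge(left, right):
    """Combine two already sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One side is exhausted; whatever remains on the other is already in order.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def split_and_sort(values):
    """Break the list into halves, sort each half the same way, then merge."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    return merge(split_and_sort(values[:mid]), split_and_sort(values[mid:]))


print(split_and_sort([8, 5, 1, 4, 2]))  # [1, 2, 4, 5, 8]
```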

1054
00:47:23,360 --> 00:47:25,760
Is that going to give me a
more efficient algorithm?

1055
00:47:25,760 --> 00:47:27,580
And if you come back
on Thursday,

1056
00:47:27,580 --> 00:47:29,420
we'll answer that question.