1
00:00:00,050 --> 00:00:02,490
The following content is
provided under a Creative

2
00:00:02,490 --> 00:00:03,900
Commons license.

3
00:00:03,900 --> 00:00:06,940
Your support will help MIT
OpenCourseWare continue to

4
00:00:06,940 --> 00:00:10,600
offer high-quality educational
resources for free.

5
00:00:10,600 --> 00:00:13,490
To make a donation or view
additional materials from

6
00:00:13,490 --> 00:00:19,320
hundreds of MIT courses, visit
MIT OpenCourseWare at

7
00:00:19,320 --> 00:00:22,170
ocw.mit.edu.

8
00:00:22,170 --> 00:00:24,750
PROFESSOR JOHN GUTTAG: In the
example we looked at, we had a

9
00:00:24,750 --> 00:00:27,350
list of ints.

10
00:00:27,350 --> 00:00:31,340
That's actually quite easy
to do in constant time.

11
00:00:31,340 --> 00:00:38,210
If you think about it, an int is
always going to occupy the

12
00:00:38,210 --> 00:00:44,070
same amount of space, roughly
speaking, either 32 or 64

13
00:00:44,070 --> 00:00:48,340
bits, depending upon
how big an int the

14
00:00:48,340 --> 00:00:50,980
language wants to support.

15
00:00:50,980 --> 00:00:56,100
So let's just, for the sake
of argument, assume an int

16
00:00:56,100 --> 00:00:59,970
occupies four units of memory.

17
00:00:59,970 --> 00:01:01,280
And I don't care
what a unit is.

18
00:01:01,280 --> 00:01:04,104
Is a unit 8 bits, 16 bits?

19
00:01:04,104 --> 00:01:06,110
It doesn't matter.

20
00:01:06,110 --> 00:01:09,240
4 units.

21
00:01:09,240 --> 00:01:14,450
How would we get to the i-th
element of the list?

22
00:01:14,450 --> 00:01:20,590
What is the location in
memory of L[i]?

23
00:01:27,700 --> 00:01:30,015
Well, if we know the location
of the start of the list--

24
00:01:35,010 --> 00:01:40,240
and certainly we can know that
because our identifier, say L

25
00:01:40,240 --> 00:01:44,310
in this case, will point to
the start of the list--

26
00:01:44,310 --> 00:01:50,590
then it's simply going to be
the start plus 4 times i.

27
00:01:55,890 --> 00:01:58,010
My list looks like this.

28
00:01:58,010 --> 00:02:01,320
I point to the start.

29
00:02:01,320 --> 00:02:03,680
The first element is here.

30
00:02:03,680 --> 00:02:07,840
So, that's start
plus 4 times 0.

31
00:02:07,840 --> 00:02:10,770
Makes perfect sense.

32
00:02:10,770 --> 00:02:12,790
The second element is here.

33
00:02:12,790 --> 00:02:15,970
So, that's going to be
start plus 4 times 1.

34
00:02:15,970 --> 00:02:20,140
Sure enough, this would be
location 4, relative to the

35
00:02:20,140 --> 00:02:23,680
start of the list, et cetera.

36
00:02:23,680 --> 00:02:29,520
This is a very conventional
way to implement lists.
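
A minimal sketch of the address arithmetic just described, assuming a made-up start address and the assumed element size of 4 units (an illustration of the idea, not how Python actually lays out memory):

ELEMENT_SIZE = 4                      # assumed size of one int, in "units"

def address_of(start, i):
    # With fixed-size elements, the location of element i is a constant-time
    # calculation: the start of the list plus 4 times i.
    return start + ELEMENT_SIZE * i

# address_of(1000, 0) == 1000, address_of(1000, 1) == 1004, and so on.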

37
00:02:29,520 --> 00:02:33,060
But what does its correctness
depend upon?

38
00:02:35,780 --> 00:02:40,070
It depends upon the fact that
each element of the list is of

39
00:02:40,070 --> 00:02:41,320
the same size.

40
00:02:45,340 --> 00:02:46,760
In this case, it's 4.

41
00:02:46,760 --> 00:02:50,050
But I don't care if it's 4.

42
00:02:50,050 --> 00:02:52,240
If it's 2, it's 2 times i.

43
00:02:52,240 --> 00:02:54,560
If it's 58, it's 58 times i.

44
00:02:54,560 --> 00:02:55,840
It doesn't matter.

45
00:02:55,840 --> 00:03:01,850
But what matters is that each
element is the same size.

46
00:03:01,850 --> 00:03:05,790
So this trick would work for
accessing elements of lists of

47
00:03:05,790 --> 00:03:12,010
floats, lists of ints, anything
that's of fixed size.

48
00:03:12,010 --> 00:03:15,580
But that's not the way
lists are in Python.

49
00:03:15,580 --> 00:03:19,670
In Python, I can have a list
that contains ints, and

50
00:03:19,670 --> 00:03:24,070
floats, and strings, and
other lists, and

51
00:03:24,070 --> 00:03:27,080
dicts, almost anything.

52
00:03:27,080 --> 00:03:32,440
So, in Python, it's not this
nice picture where the lists

53
00:03:32,440 --> 00:03:34,890
are all homogeneous.

54
00:03:34,890 --> 00:03:39,110
In many languages they
are, by the way.

55
00:03:39,110 --> 00:03:42,480
And those languages would
implement it exactly as I've

56
00:03:42,480 --> 00:03:45,080
outlined it on the board here.

57
00:03:45,080 --> 00:03:48,495
But what about languages where
they're not, like Python?

58
00:03:51,700 --> 00:03:53,870
One possibility--

59
00:03:53,870 --> 00:03:57,720
and this is probably the oldest
way that people used to

60
00:03:57,720 --> 00:03:59,520
implement lists--

61
00:03:59,520 --> 00:04:00,970
is the notion of
a linked list.

62
00:04:03,840 --> 00:04:06,800
These were used way back
in the 1960s, when

63
00:04:06,800 --> 00:04:09,570
Lisp was first invented.

64
00:04:09,570 --> 00:04:13,165
And, effectively, there,
what you have is a list.

65
00:04:15,910 --> 00:04:24,460
Every element of the list is a
pointer to the next element.

66
00:04:28,300 --> 00:04:29,550
And then the value.

67
00:04:36,810 --> 00:04:42,335
So what it looks like in memory
is we have the list.

68
00:04:45,020 --> 00:04:48,760
And this points to the next
element, which maybe has a

69
00:04:48,760 --> 00:04:51,880
much bigger value field.

70
00:04:51,880 --> 00:04:53,750
But that's OK.

71
00:04:53,750 --> 00:04:57,000
This points to the
next element.

72
00:04:57,000 --> 00:04:59,570
Let's say this one, maybe,
is a tiny value field.

73
00:05:06,160 --> 00:05:12,830
And then at the end of the
list, I might write none,

74
00:05:12,830 --> 00:05:15,810
saying there is no
next element.

75
00:05:15,810 --> 00:05:17,450
Or nil, in Lisp speak.

76
00:05:20,080 --> 00:05:23,640
But what's the cost here of
accessing the nth element of

77
00:05:23,640 --> 00:05:27,160
the list, of the i-th
element of the list?

78
00:05:30,880 --> 00:05:33,660
Somebody?

79
00:05:33,660 --> 00:05:38,450
How many steps does it take
to find element i?

80
00:05:38,450 --> 00:05:38,940
AUDIENCE: i.

81
00:05:38,940 --> 00:05:39,430
AUDIENCE: i?

82
00:05:39,430 --> 00:05:41,310
PROFESSOR JOHN GUTTAG:
i steps, exactly.

83
00:05:41,310 --> 00:05:50,900
So for a linked list, finding
the i-th element is order i.
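
A minimal linked-list sketch, with hypothetical class and function names, showing why reaching element i takes i steps:

class Node:
    # One cell of a singly linked list: a value plus a pointer to the next cell.
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def get_ith(head, i):
    # Follow one pointer per step, so reaching element i is order i.
    node = head
    for _ in range(i):
        node = node.next
    return node.value

# Example: for the list 3 -> 1 -> 4, get_ith(head, 2) returns 4.
head = Node(3, Node(1, Node(4)))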

84
00:05:50,900 --> 00:05:53,370
That's not very good.

85
00:05:53,370 --> 00:05:56,650
That won't help me with
binary search.

86
00:05:56,650 --> 00:06:00,890
Because if this were the case
for finding an element of a

87
00:06:00,890 --> 00:06:06,390
list in Python, binary search
would not be log length of the

88
00:06:06,390 --> 00:06:10,830
list, but it would be order
length of the list.

89
00:06:10,830 --> 00:06:13,640
Because the worst case is I'd
have to visit every element of

90
00:06:13,640 --> 00:06:16,830
the list, say, to discover
something isn't in it.

91
00:06:22,690 --> 00:06:28,330
So, this is not what
you want to do.

92
00:06:28,330 --> 00:06:32,020
Instead, Python uses
something like the

93
00:06:32,020 --> 00:06:33,270
picture in your handout.

94
00:06:39,920 --> 00:06:42,840
And the key idea here is
one of indirection.

95
00:06:46,610 --> 00:06:57,420
So in Python, what a list looks
like is a

96
00:06:57,420 --> 00:07:06,080
section of memory, a list of
objects each of the same size.

97
00:07:06,080 --> 00:07:10,590
Because now each
object is a pointer.

98
00:07:10,590 --> 00:07:15,570
So we've now separated in
space the values of the

99
00:07:15,570 --> 00:07:17,630
members of the list and the
pointers to, if you

100
00:07:17,630 --> 00:07:19,730
will, the next one.

101
00:07:19,730 --> 00:07:23,210
So now, it can be very simple.

102
00:07:23,210 --> 00:07:26,510
This first element
could be big.

103
00:07:26,510 --> 00:07:28,840
Second element could be small.

104
00:07:28,840 --> 00:07:30,090
We don't care.

105
00:07:34,290 --> 00:07:39,370
Now I'm back to exactly the
model we looked at here.

106
00:07:43,020 --> 00:07:47,190
If, say, a pointer to someplace
in memory is 4 units

107
00:07:47,190 --> 00:07:55,430
long, then to find l-th of i, I
use that trick to find, say,

108
00:07:55,430 --> 00:07:57,710
the i-th pointer.

109
00:07:57,710 --> 00:08:00,540
And then it takes me only one
step to follow it to get to

110
00:08:00,540 --> 00:08:01,790
the object.

111
00:08:05,770 --> 00:08:14,910
So, I can now, in constant time,
access any object in a

112
00:08:14,910 --> 00:08:17,615
list, even though the
objects in the list

113
00:08:17,615 --> 00:08:21,090
are of varying size.

114
00:08:21,090 --> 00:08:25,510
This is the way it's done
in all object-oriented

115
00:08:25,510 --> 00:08:28,035
programming languages.
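
A small illustration of the indirection idea: the list stores same-size references, and the objects they refer to can be any size (a sketch of the concept, not of CPython's internals):

# A heterogeneous Python list: each slot holds a same-size reference,
# while the objects being referred to vary widely in size.
L = [7, 3.14159, 'a fairly long string', [1, 2, 3], {'key': 'value'}]

# Indexing computes the location of the i-th reference by offset and then
# follows it once -- constant time, no matter how big the object is.
x = L[3]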

116
00:08:28,035 --> 00:08:29,545
Does that make sense
to everybody?

117
00:08:35,340 --> 00:08:41,230
This concept of indirection is
one of the most powerful

118
00:08:41,230 --> 00:08:44,570
programming techniques
we have.

119
00:08:44,570 --> 00:08:45,820
It gets used a lot.

120
00:08:48,960 --> 00:08:51,690
My dictionary defines
indirection as a lack of

121
00:08:51,690 --> 00:08:56,090
straightforwardness and openness
and as a synonym uses

122
00:08:56,090 --> 00:08:57,340
deceitfulness.

123
00:08:59,300 --> 00:09:02,930
And it had this pejorative meaning
until about 1950 when

124
00:09:02,930 --> 00:09:06,130
computer scientists discovered
it and decided it was a

125
00:09:06,130 --> 00:09:08,180
wonderful thing.

126
00:09:08,180 --> 00:09:10,720
There's something that's often
quoted at people who do

127
00:09:10,720 --> 00:09:12,210
algorithms.

128
00:09:12,210 --> 00:09:15,200
They say quote, "all problems
in computer science can be

129
00:09:15,200 --> 00:09:19,140
solved by another level of
indirection." So, it's sort

130
00:09:19,140 --> 00:09:21,420
of, whenever you're stuck,
you add another level of

131
00:09:21,420 --> 00:09:23,470
indirection.

132
00:09:23,470 --> 00:09:26,850
The caveat to this is the one
problem that can't be solved

133
00:09:26,850 --> 00:09:30,690
by adding another level of
indirection is too many levels

134
00:09:30,690 --> 00:09:34,580
of indirection, which
can be a problem.

135
00:09:34,580 --> 00:09:38,700
As you look at certain kinds of
memory structures, the fact

136
00:09:38,700 --> 00:09:42,580
that you've separated the
pointers from the value fields

137
00:09:42,580 --> 00:09:46,910
can lead to them being very
far apart in memory, which can

138
00:09:46,910 --> 00:09:51,000
disturb behaviors of caches
and things like that.

139
00:09:51,000 --> 00:09:55,120
So in some models of memory this
can lead to surprising

140
00:09:55,120 --> 00:09:56,660
inefficiency.

141
00:09:56,660 --> 00:09:58,730
But most of the time
it's really a great

142
00:09:58,730 --> 00:10:01,120
implementation technique.

143
00:10:01,120 --> 00:10:05,420
And I highly recommend it.

144
00:10:05,420 --> 00:10:07,640
So that's how we do the trick.

145
00:10:07,640 --> 00:10:12,280
Now we can convince ourselves
that binary search is indeed

146
00:10:12,280 --> 00:10:14,390
order log n.

147
00:10:14,390 --> 00:10:17,090
And as we saw Tuesday,
logarithmic

148
00:10:17,090 --> 00:10:19,130
growth is very slow.

149
00:10:19,130 --> 00:10:22,360
So it means we can use binary
search to search enormous

150
00:10:22,360 --> 00:10:25,280
lists and get the answer
very quickly.
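
For reference, a minimal binary search sketch over a sorted list; this is my reconstruction, not necessarily the version from Tuesday's lecture or the handout:

def binary_search(L, e):
    # Assumes L is sorted in ascending order; each step halves the range,
    # so the number of steps is order log(len(L)).
    low, high = 0, len(L) - 1
    while low <= high:
        mid = (low + high) // 2
        if L[mid] == e:
            return True
        elif L[mid] < e:
            low = mid + 1
        else:
            high = mid - 1
    return False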

151
00:10:28,820 --> 00:10:28,980
All right.

152
00:10:28,980 --> 00:10:32,230
There's still one catch.

153
00:10:32,230 --> 00:10:34,400
And what's the catch?

154
00:10:34,400 --> 00:10:37,970
There's an assumption
to binary search.

155
00:10:37,970 --> 00:10:42,310
Binary search works only when
what assumption is true?

156
00:10:42,310 --> 00:10:43,240
AUDIENCE: It's sorted.

157
00:10:43,240 --> 00:10:51,850
PROFESSOR JOHN GUTTAG: The
list is sorted because it

158
00:10:51,850 --> 00:10:53,455
depends on that piece
of knowledge.

159
00:10:55,960 --> 00:11:00,835
So, that raises the question,
how did it get sorted?

160
00:11:03,370 --> 00:11:07,600
Or the other question it raises,
if I ask you to search

161
00:11:07,600 --> 00:11:11,520
for something, does it make
sense to follow the algorithm

162
00:11:11,520 --> 00:11:19,075
of (1) sort L, (2) use
binary search?
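
Written as code, the two-step algorithm being proposed might look like this sketch (search_by_sorting is a made-up name, and binary_search is the sketch above):

def search_by_sorting(L, e):
    # (1) sort L (here via Python's built-in sorted), (2) use binary search.
    return binary_search(sorted(L), e)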

163
00:11:25,990 --> 00:11:29,500
Does that make sense?

164
00:11:29,500 --> 00:11:34,490
Well, what does it depend upon,
whether this makes sense

165
00:11:34,490 --> 00:11:37,080
from an efficiency
point of view?

166
00:11:37,080 --> 00:11:54,210
We know that that's order log
length of L. We also know if

167
00:11:54,210 --> 00:12:01,440
the list isn't sorted, we can
do it in order len of L. We can

168
00:12:01,440 --> 00:12:04,590
always use linear search.

169
00:12:04,590 --> 00:12:09,780
So, whether or not this is
a good idea depends upon

170
00:12:09,780 --> 00:12:15,750
whether we can do this
fast enough.

171
00:12:15,750 --> 00:12:28,250
So the question is: is order
question mark plus order log

172
00:12:28,250 --> 00:12:35,360
len of L less than order len of L?

173
00:12:38,580 --> 00:12:42,030
If it's not, it doesn't make
sense to sort it first in

174
00:12:42,030 --> 00:12:43,280
some sense, right?

175
00:12:45,660 --> 00:12:49,510
So what's the answer
to this question?

176
00:12:49,510 --> 00:12:53,940
Do we think we can sort
a list fast enough?

177
00:12:53,940 --> 00:12:56,060
And what would fast
enough mean?

178
00:12:56,060 --> 00:12:57,310
What would it have to be?

179
00:12:59,610 --> 00:13:03,250
For this to be better than this,
we know that we have to

180
00:13:03,250 --> 00:13:06,375
be able to sort a list
in sublinear time.

181
00:13:10,240 --> 00:13:12,510
Can we do that?

182
00:13:12,510 --> 00:13:16,410
Alas, the answer
is provably no.

183
00:13:16,410 --> 00:13:19,190
No matter how clever we are,
there is no algorithm that

184
00:13:19,190 --> 00:13:23,150
will sort a list in
sublinear time.

185
00:13:23,150 --> 00:13:25,990
And if you think of it, that
makes a lot of sense.

186
00:13:25,990 --> 00:13:28,740
Because, how can you get
a list in ascending or

187
00:13:28,740 --> 00:13:31,460
descending order without looking
at every element in

188
00:13:31,460 --> 00:13:35,000
the list at least once?

189
00:13:35,000 --> 00:13:37,700
Logic says you just
can't do it.

190
00:13:37,700 --> 00:13:40,130
If you're going to put something
in order, you're

191
00:13:40,130 --> 00:13:42,970
going to have to look at it.

192
00:13:42,970 --> 00:13:48,930
So we know that we have a lower
bound on sorting, which

193
00:13:48,930 --> 00:13:53,830
is order len of L. And we know that
order len of L plus order log length

194
00:13:53,830 --> 00:14:04,520
of L is the same as order len of L, which
is not better than that.

195
00:14:04,520 --> 00:14:05,790
So why do we care?

196
00:14:05,790 --> 00:14:08,840
If this is true, why are we
interested in things like

197
00:14:08,840 --> 00:14:11,380
binary search at all?

198
00:14:11,380 --> 00:14:16,500
And the reason is we're often
interested in something called

199
00:14:16,500 --> 00:14:18,765
amortized complexity.

200
00:14:25,260 --> 00:14:27,650
I know that there are some
course 15 students in the

201
00:14:27,650 --> 00:14:31,360
class who will know what
amortization means.

202
00:14:31,360 --> 00:14:34,250
But maybe not everybody does.

203
00:14:34,250 --> 00:14:41,900
The idea here is that if we can
sort the list once and end

204
00:14:41,900 --> 00:14:53,590
up searching it many times, the
cost of the sort can be

205
00:14:53,590 --> 00:14:58,840
allocated, a little bit of it,
to each of the searches.

206
00:14:58,840 --> 00:15:03,480
And if we do enough searches,
then in fact it doesn't really

207
00:15:03,480 --> 00:15:07,680
matter how long the
sort takes.

208
00:15:07,680 --> 00:15:11,860
So if we were going to search
this list a million times,

209
00:15:11,860 --> 00:15:14,660
maybe we don't care
about the one-time

210
00:15:14,660 --> 00:15:18,450
overhead of sorting it.

211
00:15:18,450 --> 00:15:22,510
And this kind of amortized
analysis is quite common and

212
00:15:22,510 --> 00:15:27,510
is what we really end up doing
most of the time in practice.

213
00:15:27,510 --> 00:15:39,930
So the real question we want
to ask is, if we plan on

214
00:15:39,930 --> 00:15:41,303
performing k searches--

215
00:15:50,550 --> 00:15:54,160
who knows how long it will
take to sort it--

216
00:15:54,160 --> 00:16:03,480
what it will take is order of
whatever sorting the list costs,

217
00:16:03,480 --> 00:16:29,530
plus k times log length of
L. Is that less than k

218
00:16:29,530 --> 00:16:33,230
times len of L?

219
00:16:37,290 --> 00:16:40,840
If I don't sort it, doing
k searches will

220
00:16:40,840 --> 00:16:42,890
take this much time.

221
00:16:42,890 --> 00:16:45,680
If I do sort it, it will
take this much time.

222
00:16:48,730 --> 00:16:51,510
The answer to this question, of
course, depends upon what's

223
00:16:51,510 --> 00:16:55,150
the complexity of that
and how big is k.
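
A rough back-of-the-envelope sketch of that comparison, treating the sort as n log n (which is where the lecture is headed); the function names and numbers are purely illustrative:

import math

def cost_with_sort(n, k):
    # Approximate cost: one n*log(n) sort plus k binary searches.
    return n * math.log2(n) + k * math.log2(n)

def cost_without_sort(n, k):
    # Approximate cost: k linear searches.
    return k * n

# For n = 1,000,000 and k = 1,000, sorting first is already far cheaper.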

224
00:17:01,273 --> 00:17:04,099
Does that make sense?

225
00:17:04,099 --> 00:17:09,329
In practice, k is
often very big.

226
00:17:09,329 --> 00:17:13,310
The number of times we access,
say, a student record is quite

227
00:17:13,310 --> 00:17:16,369
large compared to the
number of times

228
00:17:16,369 --> 00:17:19,970
students enroll in MIT.

229
00:17:19,970 --> 00:17:23,050
So if at the start of each
semester we produce a sorted

230
00:17:23,050 --> 00:17:29,610
list, it pays off to
do the searches.

231
00:17:29,610 --> 00:17:31,210
In fact, we don't do
a sorted list.

232
00:17:31,210 --> 00:17:33,220
We do something more complex.

233
00:17:33,220 --> 00:17:35,260
But you understand the
concept I hope.

234
00:17:40,080 --> 00:17:46,280
Now we have to say, how
well can we do that?

235
00:17:46,280 --> 00:17:49,190
That's what I want to spend most
of the rest of today on

236
00:17:49,190 --> 00:17:52,930
now: talking about how we
do sorting, because it is a

237
00:17:52,930 --> 00:17:56,750
very common operation.

238
00:17:56,750 --> 00:18:00,355
First of all, let's look at
a way we don't do sorting.

239
00:18:04,720 --> 00:18:08,240
There was a famous computer
scientist who

240
00:18:08,240 --> 00:18:09,490
opined on this topic.

241
00:18:12,520 --> 00:18:16,485
We can look for him this way.

242
00:18:16,485 --> 00:18:18,775
A well-known technique
is bubble sort.

243
00:18:26,820 --> 00:18:28,460
Actually, stop.

244
00:18:28,460 --> 00:18:30,180
We're going to need
sound for this.

245
00:18:30,180 --> 00:18:31,720
Do we have sound in the booth?

246
00:18:34,720 --> 00:18:36,340
Do we have somebody
in the booth?

247
00:18:39,600 --> 00:18:42,980
Well, we either have
sound or we don't.

248
00:18:42,980 --> 00:18:44,230
We'll find out shortly.

249
00:19:06,180 --> 00:19:07,600
Other way.

250
00:19:07,600 --> 00:19:08,120
Come on.

251
00:19:08,120 --> 00:19:08,920
You should know.

252
00:19:08,920 --> 00:19:10,170
Oh there.

253
00:19:12,050 --> 00:19:12,920
Thank you.

254
00:19:12,920 --> 00:19:13,350
[VIDEO PLAYBACK]

255
00:19:13,350 --> 00:19:16,380
-Now, it's hard to get
a job as President.

256
00:19:16,380 --> 00:19:18,000
And you're going through
the rigors now.

257
00:19:18,000 --> 00:19:21,510
It's also hard to get
a job at Google.

258
00:19:21,510 --> 00:19:25,180
We have questions, and we ask
our candidates questions.

259
00:19:25,180 --> 00:19:28,110
And this one is from
Larry Schwimmer.

260
00:19:28,110 --> 00:19:30,774
[LAUGHTER]

261
00:19:30,774 --> 00:19:31,980
-You guys think I'm kidding?

262
00:19:31,980 --> 00:19:34,350
It's right here.

263
00:19:34,350 --> 00:19:36,300
What is the most efficient
way to sort a

264
00:19:36,300 --> 00:19:37,350
million 32-bit integers?

265
00:19:37,350 --> 00:19:41,170
[LAUGHTER]

266
00:19:41,170 --> 00:19:44,130
-Well, uh.

267
00:19:44,130 --> 00:19:47,090
-I'm sorry, maybe
that's not a--

268
00:19:47,090 --> 00:19:50,550
-I think the bubble sort would
be the wrong way to go.

269
00:19:50,550 --> 00:19:53,388
[LAUGHTER]

270
00:19:53,388 --> 00:19:57,880
-Come on, who told him this?

271
00:19:57,880 --> 00:20:00,030
I didn't see computer science
in your background.

272
00:20:00,030 --> 00:20:01,705
-We've got our spies in there.

273
00:20:04,610 --> 00:20:07,260
-OK, let's ask a different
interval--

274
00:20:07,260 --> 00:20:08,080
[END VIDEO PLAYBACK]

275
00:20:08,080 --> 00:20:13,410
PROFESSOR JOHN GUTTAG: All
right, so as he sometimes is,

276
00:20:13,410 --> 00:20:15,280
the President was correct.

277
00:20:15,280 --> 00:20:19,310
Bubble sort, though often
discussed, is almost always

278
00:20:19,310 --> 00:20:20,660
the wrong answer.

279
00:20:20,660 --> 00:20:24,001
So we're not going to talk
about bubble sort.

280
00:20:24,001 --> 00:20:26,430
I, by the way, know Larry
Schwimmer and can believe he

281
00:20:26,430 --> 00:20:29,700
did ask that question.

282
00:20:29,700 --> 00:20:31,350
But yes, I'm surprised.

283
00:20:31,350 --> 00:20:35,010
Someone had obviously warned
the President, actually the

284
00:20:35,010 --> 00:20:38,280
then future president I think.

285
00:20:38,280 --> 00:20:42,770
Let's look at a different one
that's often used, and that's

286
00:20:42,770 --> 00:20:45,470
called selection sort.

287
00:20:45,470 --> 00:20:48,620
This is about as simple
as it gets.

288
00:20:48,620 --> 00:20:52,230
The basic idea of
selection sort--

289
00:20:52,230 --> 00:20:56,700
and it's not a very good way
to sort, but it is a useful

290
00:20:56,700 --> 00:20:58,060
kind of thing to look
at because it

291
00:20:58,060 --> 00:21:00,770
introduces some ideas.

292
00:21:00,770 --> 00:21:09,460
Like many algorithms, it depends
upon establishing and

293
00:21:09,460 --> 00:21:10,710
maintaining an invariant.

294
00:21:27,860 --> 00:21:33,370
An invariant is something
that's invariantly true.

295
00:21:33,370 --> 00:21:37,140
The invariant we're going to
maintain here is we're going

296
00:21:37,140 --> 00:21:40,730
to have a pointer
into the list.

297
00:21:40,730 --> 00:21:49,130
And that pointer is going to
divide the list into a prefix

298
00:21:49,130 --> 00:21:50,380
and a suffix.

299
00:21:54,810 --> 00:22:02,520
And the invariant that we're
going to maintain is that the

300
00:22:02,520 --> 00:22:08,510
prefix is always sorted.

301
00:22:13,580 --> 00:22:16,810
We'll start where the
prefix is empty.

302
00:22:16,810 --> 00:22:19,910
It contains none of the list.

303
00:22:19,910 --> 00:22:24,430
And then each step through the
algorithm, we'll decrease the

304
00:22:24,430 --> 00:22:28,430
size of the suffix by one
element and increase the size

305
00:22:28,430 --> 00:22:31,240
of the prefix by one
element while

306
00:22:31,240 --> 00:22:34,480
maintaining the invariant.

307
00:22:34,480 --> 00:22:39,290
And we'll be done when the size
of the suffix is 0, and

308
00:22:39,290 --> 00:22:42,750
therefore the prefix contains
all the elements.

309
00:22:42,750 --> 00:22:45,960
And because we've been
maintaining this invariant, we

310
00:22:45,960 --> 00:22:48,210
know that we have now
sorted the list.

311
00:22:54,530 --> 00:22:55,800
So, you can think about it.

312
00:22:55,800 --> 00:23:07,380
For example, if I have a list
that looks like 4, 2, 3, I'll

313
00:23:07,380 --> 00:23:09,800
start pointing here.

314
00:23:09,800 --> 00:23:12,610
And the prefix, which contains
nothing, obeys the invariant.

315
00:23:15,760 --> 00:23:19,840
I'll then go through the list
and find the smallest element

316
00:23:19,840 --> 00:23:24,630
in the list and swap it with
the first element.

317
00:23:27,220 --> 00:23:32,745
My next step, the list will
look like 2, 4, 3.

318
00:23:36,270 --> 00:23:39,880
I'll now point here.

319
00:23:39,880 --> 00:23:41,200
My invariant is true.

320
00:23:41,200 --> 00:23:44,210
The prefix contains only
one element, so it is

321
00:23:44,210 --> 00:23:46,560
in ascending order.

322
00:23:46,560 --> 00:23:48,780
And I've increased
its size by 1.

323
00:23:52,320 --> 00:23:55,340
I don't have to look at this
element again because I know

324
00:23:55,340 --> 00:23:58,150
by construction that's
the smallest.

325
00:23:58,150 --> 00:24:02,160
Now I move here, and I look for
the smallest element in

326
00:24:02,160 --> 00:24:07,350
the suffix, which will be 3.

327
00:24:07,350 --> 00:24:09,990
I swapped 3 and 4.

328
00:24:09,990 --> 00:24:13,490
And then I'm going to be done.

329
00:24:13,490 --> 00:24:15,650
Does that make sense?

330
00:24:15,650 --> 00:24:17,390
It's very straightforward.

331
00:24:17,390 --> 00:24:23,280
It's, in some sense, the most
obvious way to sort a list.

332
00:24:23,280 --> 00:24:29,390
And if you look at the code,
that's exactly what it does.

333
00:24:29,390 --> 00:24:31,570
I've stated the invariant
here.

334
00:24:31,570 --> 00:24:36,200
And I just go through
and I sort it.
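
A sketch of a selection sort along the lines just described, with the invariant noted in a comment and the partially sorted list printed each iteration; this is a reconstruction, not the exact handout code:

def sel_sort(L):
    # Sort L in place, in ascending order.
    for i in range(len(L)):
        # Invariant: L[:i] is sorted and holds the i smallest elements.
        min_index = i
        for j in range(i + 1, len(L)):
            if L[j] < L[min_index]:
                min_index = j
        # Swap the smallest element of the suffix to the front of the suffix.
        L[i], L[min_index] = L[min_index], L[i]
        print(L)   # show the partially sorted list after each pass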

335
00:24:36,200 --> 00:24:38,730
So we can run it.

336
00:24:38,730 --> 00:24:39,980
Let's do that.

337
00:24:46,620 --> 00:24:52,960
I'm going to sort the list 3,
4, 5, et cetera, 35, 45.

338
00:24:52,960 --> 00:24:55,050
I'm going to call
selection sort.

339
00:24:55,050 --> 00:24:59,210
And I don't think this is in
your handout, but just to make

340
00:24:59,210 --> 00:25:02,550
it obvious what's going on, each
iteration of the loop I'm

341
00:25:02,550 --> 00:25:08,220
going to print the partially
sorted list so we can see

342
00:25:08,220 --> 00:25:09,470
what's happening.

343
00:25:18,250 --> 00:25:20,930
The first step, it finds 4 and
puts that in the beginning.

344
00:25:24,277 --> 00:25:27,420
It actually finds 0, puts it in
the beginning, et cetera.

345
00:25:27,420 --> 00:25:28,290
All right?

346
00:25:28,290 --> 00:25:31,080
So, people see what's
going on here?

347
00:25:31,080 --> 00:25:33,300
It's essentially doing
exactly what I did on

348
00:25:33,300 --> 00:25:35,220
the board over there.

349
00:25:35,220 --> 00:25:38,030
And when we're done, we have
the list completely sorted.

350
00:25:42,000 --> 00:25:43,495
What's the complexity of this?

351
00:25:51,580 --> 00:25:53,630
What's the complexity
of selection sort?

352
00:25:59,260 --> 00:26:01,610
There are two things going on.

353
00:26:01,610 --> 00:26:04,330
I'm doing a bunch
of comparisons.

354
00:26:04,330 --> 00:26:05,680
And I'm doing a bunch
of swaps.

355
00:26:09,130 --> 00:26:14,280
Since I do, at most, the same
number of swaps as I do

356
00:26:14,280 --> 00:26:17,170
comparisons--

357
00:26:17,170 --> 00:26:20,520
I never swap without doing
a comparison--

358
00:26:20,520 --> 00:26:23,260
we can calculate complexity by
looking at the number of

359
00:26:23,260 --> 00:26:26,630
comparisons I'm doing.

360
00:26:26,630 --> 00:26:28,922
You can see that in
the code as well.

361
00:26:28,922 --> 00:26:32,870
So how many comparisons might
I have to do here?

362
00:26:40,850 --> 00:26:46,680
The key thing to notice is each
time I look at it, each

363
00:26:46,680 --> 00:26:50,870
iteration, I'm looking at
every element in what?

364
00:26:53,650 --> 00:26:55,660
In the list?

365
00:26:55,660 --> 00:26:58,520
No, every element
in the suffix.

366
00:26:58,520 --> 00:27:03,210
The first time through,
I'm going to look at--

367
00:27:03,210 --> 00:27:08,520
let's just say n equals the
length of the list.

368
00:27:08,520 --> 00:27:13,385
So the first time through, I'm
going to look at n elements.

369
00:27:17,060 --> 00:27:21,190
Then I'm going to look
at n minus 1.

370
00:27:21,190 --> 00:27:24,830
Then I'm going to look
at n minus 2.

371
00:27:24,830 --> 00:27:26,220
Until I'm done, right?

372
00:27:29,410 --> 00:27:34,470
So that's how many operations
I'm doing.

373
00:27:34,470 --> 00:27:39,650
And what is the order of n plus
n minus 1 plus n minus 2?

374
00:27:44,000 --> 00:27:46,610
Exactly.

375
00:27:46,610 --> 00:27:47,860
Order n.

376
00:27:50,610 --> 00:27:52,900
So, selection sort is order n.

377
00:27:56,250 --> 00:27:57,820
Is that right?

378
00:27:57,820 --> 00:27:58,910
Somebody said order n.

379
00:27:58,910 --> 00:28:01,290
Do you believe it's n?

380
00:28:01,290 --> 00:28:04,410
Is this really n?

381
00:28:04,410 --> 00:28:05,510
It's not n.

382
00:28:05,510 --> 00:28:08,010
What is it?

383
00:28:08,010 --> 00:28:11,160
Somebody raise your hand, so
I can throw the candy out.

384
00:28:11,160 --> 00:28:13,800
Yeah.

385
00:28:13,800 --> 00:28:16,284
AUDIENCE: [INAUDIBLE]

386
00:28:16,284 --> 00:28:17,534
PROFESSOR JOHN GUTTAG:
It's not n factorial.

387
00:28:22,260 --> 00:28:24,252
AUDIENCE: n-squared?

388
00:28:24,252 --> 00:28:25,746
PROFESSOR JOHN GUTTAG: You said
that with a question mark

389
00:28:25,746 --> 00:28:27,240
at the end of your voice.

390
00:28:27,240 --> 00:28:29,398
AUDIENCE: No, it's like the sum
of the numbers is, like, n

391
00:28:29,398 --> 00:28:31,722
times n minus 1 over 2 or
something like that.

392
00:28:31,722 --> 00:28:34,212
PROFESSOR JOHN GUTTAG: It's
really exactly right.

393
00:28:40,190 --> 00:28:41,910
It's a little smaller
than n-squared,

394
00:28:41,910 --> 00:28:43,160
but it's order n-squared.
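
Writing out the sum the professor set up a few lines earlier, in standard notation:

\[
n + (n-1) + (n-2) + \cdots + 1 \;=\; \frac{n(n+1)}{2}\,, \quad \text{which is } O(n^2).
\]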

395
00:28:47,820 --> 00:28:50,810
I'm doing a lot of
these additions.

396
00:28:50,810 --> 00:28:54,380
So I can't ignore all of
these extra terms and

397
00:28:54,380 --> 00:28:55,630
say they don't matter.

398
00:28:58,590 --> 00:29:02,090
It's almost as bad as comparing
every element to

399
00:29:02,090 --> 00:29:04,750
every other element.

400
00:29:04,750 --> 00:29:09,140
So, selection sort is
order n-squared.

401
00:29:09,140 --> 00:29:13,260
And you can do it by
understanding that sum or you

402
00:29:13,260 --> 00:29:16,410
can look at the code here.

403
00:29:16,410 --> 00:29:18,700
And that sort of will
also tip you off.

404
00:29:25,330 --> 00:29:26,580
OK. So now, can we do better?

405
00:29:28,710 --> 00:29:32,550
There was a while where people
were pretty unsure whether you

406
00:29:32,550 --> 00:29:34,820
could do better.

407
00:29:34,820 --> 00:29:36,070
But we can.

408
00:29:40,790 --> 00:29:47,130
If we think about it now, it was
a method invented by John

409
00:29:47,130 --> 00:29:50,490
von Neumann, a very
famous guy.

410
00:29:50,490 --> 00:29:56,790
And he, back in the '40s
amazingly enough, viewed this

411
00:29:56,790 --> 00:29:59,640
as a kind of divide and
conquer algorithm.

412
00:29:59,640 --> 00:30:01,405
And we've looked at divide
and conquer before.

413
00:30:04,330 --> 00:30:06,860
What is the general form
of divide and conquer?

414
00:30:16,190 --> 00:30:21,330
A phrase you've heard me use
many times, popularized, by

415
00:30:21,330 --> 00:30:24,380
the way, I think, by Machiavelli
in The Prince, in

416
00:30:24,380 --> 00:30:25,630
a not very nice context.

417
00:30:28,470 --> 00:30:32,220
So, what we do-- and they're
all of a kind, the same--

418
00:30:32,220 --> 00:30:33,490
we start with 1.

419
00:30:33,490 --> 00:30:35,750
Let me get over here and get
a full board for this.

420
00:30:43,300 --> 00:30:46,020
First, we have to choose
a threshold size.

421
00:31:01,070 --> 00:31:02,320
Let's call it n0.

422
00:31:05,400 --> 00:31:07,770
And that will be, essentially,
the smallest problem.

423
00:31:15,590 --> 00:31:20,010
So, we can keep dividing, making
our problem smaller--

424
00:31:20,010 --> 00:31:23,710
this is what we saw with binary
search, for example--

425
00:31:23,710 --> 00:31:26,890
until it's small enough that we
say, oh the heck with it.

426
00:31:26,890 --> 00:31:27,970
We'll stop dividing it.

427
00:31:27,970 --> 00:31:29,370
Now we'll just solve
it directly.

428
00:31:35,350 --> 00:31:39,400
So, that's how small we need to
do it, the smallest thing

429
00:31:39,400 --> 00:31:42,760
we'll divide things into.

430
00:31:42,760 --> 00:31:53,090
The next thing we have to ask
ourselves is, how many

431
00:31:53,090 --> 00:31:57,685
instances at each division?

432
00:32:02,590 --> 00:32:03,640
We have a big problem.

433
00:32:03,640 --> 00:32:05,970
We divide it into smaller
problems.

434
00:32:05,970 --> 00:32:07,790
How many are we going
to divide it into?

435
00:32:13,790 --> 00:32:17,390
We divide it into smaller
problems until we reach the

436
00:32:17,390 --> 00:32:20,930
threshold where we can
solve it directly.

437
00:32:20,930 --> 00:32:26,120
And then the third and most
important part is we need some

438
00:32:26,120 --> 00:32:28,960
algorithm to combine
the sub-solutions.

439
00:32:34,290 --> 00:32:37,850
It's no good solving the small
problem if we don't have some

440
00:32:37,850 --> 00:32:39,996
way to combine them to solve
the larger problem.
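
A generic sketch of the three-part recipe just listed -- threshold, division, and combination. Every name here is a placeholder for illustration, not something from the lecture code:

def divide_and_conquer(problem, n0, split, solve_directly, combine):
    # 1. Below the threshold size n0, solve the instance directly.
    if len(problem) <= n0:
        return solve_directly(problem)
    # 2. Otherwise divide it into some number of smaller instances.
    pieces = split(problem)
    solved = [divide_and_conquer(p, n0, split, solve_directly, combine)
              for p in pieces]
    # 3. Combine the sub-solutions into a solution for the whole problem.
    return combine(solved)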

441
00:32:45,950 --> 00:32:50,530
We saw that before, and now
we're going to see it again.

442
00:32:50,530 --> 00:32:53,800
And we're going to see it, in
particular, in the context of

443
00:32:53,800 --> 00:32:55,950
merge sort.

444
00:32:55,950 --> 00:32:58,910
If I use this board, can people
see it or is the screen

445
00:32:58,910 --> 00:33:01,860
going to occlude it?

446
00:33:01,860 --> 00:33:05,990
Is there anyone who cannot see
this board if I write on it?

447
00:33:05,990 --> 00:33:07,300
All right then, I will
write on it.

448
00:33:11,130 --> 00:33:16,030
Let's first look at
this problem.

449
00:33:18,720 --> 00:33:23,470
What von Neumann observed
in 1945 is

450
00:33:23,470 --> 00:33:25,760
given two sorted lists--

451
00:33:25,760 --> 00:33:28,790
and amazingly enough, this is
still the most popular sorting

452
00:33:28,790 --> 00:33:32,780
algorithm or one of the two most
popular I should say--

453
00:33:35,610 --> 00:33:40,240
you can merge them quickly.

454
00:33:40,240 --> 00:33:42,640
Let's look at an example.

455
00:33:42,640 --> 00:33:54,340
I'll take the lists 1, 5,
12, 18, 19, and 20.

456
00:33:54,340 --> 00:33:56,730
That's list one.

457
00:33:56,730 --> 00:34:03,620
And I'll try and merge it with
the list 2, 3, 4, and 17.

458
00:34:07,920 --> 00:34:11,550
The way you do the merge is
you start by comparing the

459
00:34:11,550 --> 00:34:14,040
first element to the
first element.

460
00:34:17,710 --> 00:34:20,380
And then you choose and
say all right, 1 is

461
00:34:20,380 --> 00:34:23,090
smaller than 2.

462
00:34:23,090 --> 00:34:27,330
So that will be the first
element of the merge list.

463
00:34:27,330 --> 00:34:29,340
I'm now done with 1,
and I never have

464
00:34:29,340 --> 00:34:30,590
to look at it again.

465
00:34:34,440 --> 00:34:40,929
The next thing I do is I compare
5 and 2, the head of

466
00:34:40,929 --> 00:34:43,212
the two remaining lists.

467
00:34:43,212 --> 00:34:45,440
And I say, well, 2 is
smaller than 5.

468
00:34:48,500 --> 00:34:52,170
I never have to look
at 2 again.

469
00:34:52,170 --> 00:34:55,449
And then compare 5 and 3.

470
00:34:55,449 --> 00:34:56,699
I say 3 is smaller.

471
00:34:59,259 --> 00:35:02,240
I never have to look
at 3 again.

472
00:35:02,240 --> 00:35:05,340
I then compare 4 and 5.

473
00:35:05,340 --> 00:35:06,590
4 is smaller.

474
00:35:09,230 --> 00:35:13,280
I then compare 5 and 17.

475
00:35:13,280 --> 00:35:16,360
5 is smaller.

476
00:35:16,360 --> 00:35:17,610
Et cetera.

477
00:35:22,600 --> 00:35:26,655
Now, how many comparisons am
I going to do this time?

478
00:35:34,730 --> 00:35:38,110
Well, let's first ask the
question, how many elements am

479
00:35:38,110 --> 00:35:40,480
I going to copy from one of
these lists to this list?

480
00:35:44,080 --> 00:35:46,720
Copy each element once, right?

481
00:35:46,720 --> 00:35:55,340
So, the number of copies is
order len of the list.

482
00:36:01,260 --> 00:36:02,530
That's pretty good.

483
00:36:02,530 --> 00:36:03,930
That's linear.

484
00:36:03,930 --> 00:36:07,326
That's sort of at
the lower bound.

485
00:36:07,326 --> 00:36:10,730
But how many comparisons?

486
00:36:10,730 --> 00:36:12,410
That's a little trickier
to think about.

487
00:36:16,874 --> 00:36:18,870
AUDIENCE: [INAUDIBLE]

488
00:36:18,870 --> 00:36:19,150
PROFESSOR JOHN GUTTAG: Pardon?

489
00:36:19,150 --> 00:36:22,030
AUDIENCE: At most, the length
of the longer list.

490
00:36:22,030 --> 00:36:24,070
PROFESSOR JOHN GUTTAG: At most,
the length of the longer

491
00:36:24,070 --> 00:36:28,920
list, which would also
be, we could claim to

492
00:36:28,920 --> 00:36:33,740
be, order len of--

493
00:36:33,740 --> 00:36:37,850
I sort of cheated using L
when we have two lists.

494
00:36:37,850 --> 00:36:40,100
But just think of it
as the longer list.

495
00:36:40,100 --> 00:36:42,276
So, you'd think that
many comparisons.

496
00:36:48,500 --> 00:36:50,725
You think we can do this whole
thing in linear time?

497
00:36:53,920 --> 00:36:55,310
And the answer is yes.

498
00:36:59,910 --> 00:37:01,160
That's our merge.
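
A linear-time merge sketch matching the walk-through above; my reconstruction, not the handout's code:

def merge(left, right, lt=lambda x, y: x < y):
    # Merge two sorted lists into one sorted list, copying each element once.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if lt(left[i], right[j]):          # compare the two front elements
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])                # copy whatever remains
    result.extend(right[j:])
    return result

# merge([1, 5, 12, 18, 19, 20], [2, 3, 4, 17])
#   -> [1, 2, 3, 4, 5, 12, 17, 18, 19, 20]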

499
00:37:04,900 --> 00:37:07,130
That's a good thing.

500
00:37:07,130 --> 00:37:10,740
Now, that takes care
of this step.

501
00:37:15,740 --> 00:37:18,900
But now we have to ask,
how many times are we

502
00:37:18,900 --> 00:37:20,150
going to do a merge?

503
00:37:24,980 --> 00:37:27,970
Because remember, this
worked because

504
00:37:27,970 --> 00:37:29,840
these lists were sorted.

505
00:37:29,840 --> 00:37:32,090
And so I only had to compare
the front of each list.

506
00:37:37,710 --> 00:37:42,550
When I think about how I'm going
to do the binary or the

507
00:37:42,550 --> 00:37:47,260
merge sort, what I'm going to do
is take the original list,

508
00:37:47,260 --> 00:37:50,400
break it up, break it up, break
it up, break it up,

509
00:37:50,400 --> 00:37:53,865
until I have a list
of length 1.

510
00:37:53,865 --> 00:37:58,070
Well, those are all sorted,
trivially sorted.

511
00:37:58,070 --> 00:38:00,780
And then I'll have, at
the end, a bunch of

512
00:38:00,780 --> 00:38:02,630
lists of length 1.

513
00:38:02,630 --> 00:38:04,300
I'll merge pairs of those.

514
00:38:06,870 --> 00:38:09,750
Now I'll have sorted
lists of length 2.

515
00:38:09,750 --> 00:38:14,620
Then I'll merge those, getting
sorted lists of length 4.

516
00:38:14,620 --> 00:38:18,610
Until at the end, I'll be
merging two lists, each half

517
00:38:18,610 --> 00:38:23,030
the length of the
original list.

518
00:38:23,030 --> 00:38:23,255
Right.

519
00:38:23,255 --> 00:38:24,505
Does that make sense
to everybody?

520
00:38:29,470 --> 00:38:34,080
Now I have to ask the question,
how many times am I

521
00:38:34,080 --> 00:38:35,330
going to call merge?

522
00:38:39,910 --> 00:38:40,740
Yeah.

523
00:38:40,740 --> 00:38:43,230
AUDIENCE: Base 2 log of
one of the lists.

524
00:38:43,230 --> 00:38:48,078
PROFESSOR JOHN GUTTAG: Yes, I'm
going to call merge log

525
00:38:48,078 --> 00:38:50,054
length of the list times.

526
00:38:53,020 --> 00:39:06,630
So, if each merge is order n
where n is length of the list,

527
00:39:06,630 --> 00:39:12,150
and I call merge log n times,
what's the total complexity of

528
00:39:12,150 --> 00:39:13,714
the merge sort?

529
00:39:13,714 --> 00:39:14,590
AUDIENCE: nlog(n).

530
00:39:14,590 --> 00:39:16,490
PROFESSOR JOHN GUTTAG:
nlog(n).

531
00:39:16,490 --> 00:39:17,740
Thank you.

532
00:39:20,440 --> 00:39:22,850
Let's see, I have to choose
a heavy candy

533
00:39:22,850 --> 00:39:26,160
because they carry better.

534
00:39:26,160 --> 00:39:27,830
Not well enough though.

535
00:39:27,830 --> 00:39:29,280
All right, you can
relay it back.

536
00:39:31,890 --> 00:39:33,800
Now let's look at an
implementation.

537
00:39:42,170 --> 00:39:45,470
Here's the implementation
of sort.

538
00:39:45,470 --> 00:39:49,340
And I don't think you need
to look at it in detail.

539
00:39:49,340 --> 00:39:51,410
It's doing exactly what
I did on the board.

540
00:39:51,410 --> 00:39:53,480
Actually, you do need to
look at it in detail,

541
00:39:53,480 --> 00:39:55,890
but not in real time.

542
00:39:55,890 --> 00:39:58,500
And then sort.

543
00:39:58,500 --> 00:40:00,430
Now, there's a little
complication here because I

544
00:40:00,430 --> 00:40:04,640
wanted to show another
feature to you.

545
00:40:04,640 --> 00:40:07,740
For the moment, we'll ignore the
complication, which is--

546
00:40:12,080 --> 00:40:15,945
it's, in principle, working,
but it's not very bright.

547
00:40:19,350 --> 00:40:22,550
I'll use the mouse.

548
00:40:22,550 --> 00:40:29,490
What we see here is, whenever
you do a sort, you're sorting

549
00:40:29,490 --> 00:40:32,126
by some ordering metric.

550
00:40:32,126 --> 00:40:33,700
It could be less than.

551
00:40:33,700 --> 00:40:35,030
It could be greater than.

552
00:40:35,030 --> 00:40:38,550
It could be anything you want.

553
00:40:38,550 --> 00:40:40,990
If you're sorting people, you
could sort them by weight or

554
00:40:40,990 --> 00:40:42,770
you could sort them by height.

555
00:40:42,770 --> 00:40:46,010
You could sort them by,
God forbid, GPA,

556
00:40:46,010 --> 00:40:47,260
whatever you want.

557
00:40:50,590 --> 00:40:57,620
So, I've written sort to take
as an argument the ordering.

558
00:40:57,620 --> 00:41:03,270
I've used this funny thing
called lambda, which you don't

559
00:41:03,270 --> 00:41:05,330
actually have to be
responsible for.

560
00:41:05,330 --> 00:41:07,550
You're never going to, probably,
need to use it in

561
00:41:07,550 --> 00:41:08,760
this course.

562
00:41:08,760 --> 00:41:13,800
But it's a way to dynamically
build a function on the fly.

563
00:41:13,800 --> 00:41:19,740
The function I've built is I've
said the default value of

564
00:41:19,740 --> 00:41:24,930
LT is x less than y.

565
00:41:24,930 --> 00:41:28,910
Lambda x, y says
x and y are the

566
00:41:28,910 --> 00:41:31,150
parameters to a function.

567
00:41:31,150 --> 00:41:34,970
And the body of the function is
simply return the value x

568
00:41:34,970 --> 00:41:37,310
less than y.

569
00:41:37,310 --> 00:41:38,800
All right?

570
00:41:38,800 --> 00:41:41,490
Nothing very exciting there.

571
00:41:41,490 --> 00:41:44,680
What is exciting is having a
function as an argument.

572
00:41:44,680 --> 00:41:47,170
And that is something that
you'll be doing in future

573
00:41:47,170 --> 00:41:48,670
problem sets.

574
00:41:48,670 --> 00:41:51,150
Because it's one of the very
powerful and most useful

575
00:41:51,150 --> 00:41:54,160
features in Python, is using
functional arguments.

576
00:41:56,910 --> 00:41:57,040
Right.

577
00:41:57,040 --> 00:42:01,670
Having got past that, what we
see is we first say if the

578
00:42:01,670 --> 00:42:04,880
length of L is less than 2--

579
00:42:04,880 --> 00:42:07,890
that's my threshold--

580
00:42:07,890 --> 00:42:15,600
then I'm just going to return
L, actually a copy of L.

581
00:42:15,600 --> 00:42:22,810
Otherwise, I'm going to find
roughly the middle of L.

582
00:42:22,810 --> 00:42:27,970
Then I'm going to call sort
recursively with the part to

583
00:42:27,970 --> 00:42:30,040
the left of the middle and the
part to the right of the

584
00:42:30,040 --> 00:42:37,640
middle, and then merge them.

585
00:42:37,640 --> 00:42:40,720
So I'm going to go all the way
down until I get to list of

586
00:42:40,720 --> 00:42:43,240
length 1, and then bubble
all the way back up,

587
00:42:43,240 --> 00:42:44,490
merging as I go.
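
A sketch of the sort just described, assuming the merge function from the earlier sketch and an ordering argument whose default is less than (again a reconstruction, with merge_sort as a stand-in name for the handout's sort):

def merge_sort(L, lt=lambda x, y: x < y):
    # Return a new sorted copy of L.
    if len(L) < 2:                 # threshold: 0 or 1 elements is already sorted
        return L[:]                # return a copy
    middle = len(L) // 2
    left = merge_sort(L[:middle], lt)    # sort the left half
    right = merge_sort(L[middle:], lt)   # sort the right half
    return merge(left, right, lt)        # merge, as sketched earlier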

588
00:42:49,250 --> 00:42:53,550
So, we can see that the depth
of the recursion will be

589
00:42:53,550 --> 00:42:57,210
log(n), as observed before.

590
00:42:57,210 --> 00:42:58,920
This is exactly what we
looked at when we

591
00:42:58,920 --> 00:43:01,120
looked at binary search.

592
00:43:01,120 --> 00:43:03,870
How many times can you divide
something in half --

593
00:43:03,870 --> 00:43:06,700
log(n) times?

594
00:43:06,700 --> 00:43:13,080
And each recursion we're
going to call merge.

595
00:43:13,080 --> 00:43:15,645
So, this is consistent with the
notion that the complexity

596
00:43:15,645 --> 00:43:18,760
of the overall algorithm
is nlog(n).

597
00:43:22,580 --> 00:43:25,620
Let's run it.

598
00:43:25,620 --> 00:43:28,150
And I'm going to print as we
go what's getting merged.

599
00:43:50,640 --> 00:43:51,510
Get rid of this one.

600
00:43:51,510 --> 00:43:52,780
This was our selection sort.

601
00:43:52,780 --> 00:43:55,770
We already looked at that.

602
00:43:55,770 --> 00:43:57,020
Yeah.

603
00:44:05,630 --> 00:44:12,890
So what we'll see here is the
first example, I was just

604
00:44:12,890 --> 00:44:16,401
sorting a list of integers.

605
00:44:16,401 --> 00:44:18,240
Maybe we'll look at that
all by itself.

606
00:44:26,520 --> 00:44:29,600
I didn't pass it in the second
argument, so it used the

607
00:44:29,600 --> 00:44:30,880
default less than.

608
00:44:33,920 --> 00:44:37,110
It was first merge 4 and 5.

609
00:44:37,110 --> 00:44:40,960
Then it had to merge 35
with 4 and 5, then 29

610
00:44:40,960 --> 00:44:44,460
with 17, 58 and 0.

611
00:44:44,460 --> 00:44:54,130
And then the longer lists: 17, 29
with 0, 58, and 0, 4, 5, 35 with 0, 17, 29, 58.

612
00:44:54,130 --> 00:44:55,380
And then we were done.

613
00:44:57,940 --> 00:45:00,950
So, indeed it did a logarithmic
number of merges.

614
00:45:03,450 --> 00:45:13,360
The next piece of code, I'm
taking advantage of the fact

615
00:45:13,360 --> 00:45:19,080
that this function can sort
lists of different kinds.

616
00:45:19,080 --> 00:45:22,740
And I'm calling it now with
the list of floats.

617
00:45:22,740 --> 00:45:25,320
And I am passing in the
second argument,

618
00:45:25,320 --> 00:45:27,020
which is going to be--

619
00:45:27,020 --> 00:45:30,800
well, let's for fun, I wonder
what happens if I make this

620
00:45:30,800 --> 00:45:32,050
greater than.

621
00:45:34,180 --> 00:45:35,430
Let's see what we get.

622
00:45:41,490 --> 00:45:44,060
Now you'll note, it sorted
it in the other order.

623
00:45:46,950 --> 00:45:49,850
Because I passed in the ordering
that said I want to

624
00:45:49,850 --> 00:45:52,730
use a different comparison than
less than, I want to use

625
00:45:52,730 --> 00:45:55,090
greater than.

626
00:45:55,090 --> 00:45:57,740
So the same code did the
sort the other way.

627
00:46:01,040 --> 00:46:02,640
I can do more interesting
things.

628
00:46:09,470 --> 00:46:14,320
So, here I'm assuming I
have a list of names.

629
00:46:14,320 --> 00:46:20,300
And I've written two ordering
functions myself, one that

630
00:46:20,300 --> 00:46:23,800
first compares the last names
and then the first names.

631
00:46:23,800 --> 00:46:25,750
And a different one that
compares the first names and

632
00:46:25,750 --> 00:46:27,000
then the last names.

633
00:46:30,070 --> 00:46:33,290
And we can look at those.
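
A sketch of what such ordering functions might look like, assuming each name is stored as a (first, last) pair; the function names and the representation are mine, not necessarily the handout's:

def last_name_first_name(name1, name2):
    # Compare by last name, breaking ties with the first name.
    if name1[1] != name2[1]:
        return name1[1] < name2[1]
    return name1[0] < name2[0]

def first_name_last_name(name1, name2):
    # Compare by first name, breaking ties with the last name.
    if name1[0] != name2[0]:
        return name1[0] < name2[0]
    return name1[1] < name2[1]

# e.g. merge_sort(names, last_name_first_name) versus
#      merge_sort(names, first_name_last_name), using the sketch above.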

634
00:46:42,550 --> 00:46:44,610
Just to avoid cluttering
up the screen, let

635
00:46:44,610 --> 00:46:45,860
me get rid of this.

636
00:46:56,370 --> 00:46:58,640
What we can see is we got--

637
00:46:58,640 --> 00:47:01,770
we did the same way of dividing
things initially, but

638
00:47:01,770 --> 00:47:05,600
now we got different
orderings.

639
00:47:05,600 --> 00:47:08,750
So, if we look at the first
ordering I used, we start with

640
00:47:08,750 --> 00:47:12,710
Giselle Brady and then Tom
Brady and then Chancellor

641
00:47:12,710 --> 00:47:14,800
Grimson, et cetera.

642
00:47:14,800 --> 00:47:17,020
And if we do the second
ordering, we see, among other

643
00:47:17,020 --> 00:47:21,590
things, you have me between
Giselle and Tom.

644
00:47:21,590 --> 00:47:23,205
Not a bad outcome from
my perspective.

645
00:47:27,920 --> 00:47:31,480
But again, a lot
of flexibility.

646
00:47:31,480 --> 00:47:36,760
By using this functional
argument, I can define

647
00:47:36,760 --> 00:47:40,880
whatever functions I want, and
using the same sort, get lots

648
00:47:40,880 --> 00:47:43,570
of different code.

649
00:47:43,570 --> 00:47:47,760
And you will discover that in
fact the built-in sort of

650
00:47:47,760 --> 00:47:52,000
Python has this kind
of flexibility.

651
00:47:52,000 --> 00:47:55,040
You will also find, as you
write your own programs,

652
00:47:55,040 --> 00:47:58,680
increasingly you'll want to use
functions as arguments.

653
00:47:58,680 --> 00:48:02,020
Because it allows you to write
a lot less code to accomplish

654
00:48:02,020 --> 00:48:03,270
the same tasks.