1
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative

2
00:00:02,460 --> 00:00:03,870
Commons license.

3
00:00:03,870 --> 00:00:06,320
Your support will help
MIT OpenCourseWare

4
00:00:06,320 --> 00:00:10,560
continue to offer high-quality
educational resources for free.

5
00:00:10,560 --> 00:00:13,300
To make a donation or
view additional materials

6
00:00:13,300 --> 00:00:17,210
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,210 --> 00:00:18,793
at ocw.mit.edu.

8
00:00:35,860 --> 00:00:36,850
HERBERT GROSS: Hi.

9
00:00:36,850 --> 00:00:41,060
Today we do a somewhat
computational bit.

10
00:00:41,060 --> 00:00:45,240
And actually, the
lecture for today

11
00:00:45,240 --> 00:00:47,680
is not nearly as
difficult, once you

12
00:00:47,680 --> 00:00:50,320
get through the
maze of symbolism,

13
00:00:50,320 --> 00:00:53,940
as it is to apply the material.

14
00:00:53,940 --> 00:00:57,320
In other words, we're going
to devote the next two

15
00:00:57,320 --> 00:01:01,190
units of our course to this
particular topic, which

16
00:01:01,190 --> 00:01:03,060
is known as the chain rule.

17
00:01:03,060 --> 00:01:07,870
But we'll give one lecture
to cover both units.

18
00:01:07,870 --> 00:01:11,100
And again, the idea is
that it's not so much

19
00:01:11,100 --> 00:01:14,780
that the concept becomes
more difficult as much as it

20
00:01:14,780 --> 00:01:17,820
is that you must
develop a certain amount

21
00:01:17,820 --> 00:01:22,650
of dexterity keeping track of
the various partial derivatives

22
00:01:22,650 --> 00:01:25,090
and the like.

23
00:01:25,090 --> 00:01:27,330
At any rate, maybe
I think the best way

24
00:01:27,330 --> 00:01:31,120
is to just barge into a
hypothetical situation

25
00:01:31,120 --> 00:01:34,060
and see what the
situation really is.

26
00:01:34,060 --> 00:01:37,610
The idea is essentially
the following.

27
00:01:37,610 --> 00:01:40,150
We're given some
function, say, w.

28
00:01:40,150 --> 00:01:43,100
w is a function of, say, the
three independent variables

29
00:01:43,100 --> 00:01:45,010
x, y, and z.

30
00:01:45,010 --> 00:01:47,020
Now, for some reason
or other, which

31
00:01:47,020 --> 00:01:49,670
we won't worry
about right now, it

32
00:01:49,670 --> 00:01:53,880
turns out that x, y,
and z are, in turn,

33
00:01:53,880 --> 00:01:57,720
conveniently expressible in
terms of the two variables

34
00:01:57,720 --> 00:01:58,915
r and s.

35
00:01:58,915 --> 00:02:01,980
In fact, if you want a physical
interpretation of this,

36
00:02:01,980 --> 00:02:06,260
you can think of it this way: if x,
y, and z are functions

37
00:02:06,260 --> 00:02:09,050
of the two independent
variables r and s,

38
00:02:09,050 --> 00:02:11,780
that means that we have
two degrees of freedom.

39
00:02:11,780 --> 00:02:15,450
So we may think of this as
parametrically representing

40
00:02:15,450 --> 00:02:17,930
the equation of a surface.

41
00:02:17,930 --> 00:02:19,840
And what we're
talking about here

42
00:02:19,840 --> 00:02:24,010
is w being a function of
something in space and asking,

43
00:02:24,010 --> 00:02:27,900
what does w look like when
you restrict your space

44
00:02:27,900 --> 00:02:29,300
to a particular surface?

45
00:02:29,300 --> 00:02:31,700
I mean, that's just a
geometrical interpretation

46
00:02:31,700 --> 00:02:33,050
that one could talk about.

47
00:02:33,050 --> 00:02:34,350
But the idea is the following.

48
00:02:34,350 --> 00:02:38,730
After all, if w depends
on x, y, and z, and x, y,

49
00:02:38,730 --> 00:02:43,100
and z each depend on r
and s, in particular then,

50
00:02:43,100 --> 00:02:46,660
it's clear that w itself is
some function of r and s,

51
00:02:46,660 --> 00:02:49,500
where, again, I use
the usual notation

52
00:02:49,500 --> 00:02:52,090
of using a g here
rather than the f

53
00:02:52,090 --> 00:02:57,090
up here to indicate that the
relationship between r and s

54
00:02:57,090 --> 00:02:59,900
which specifies w,
may very well be

55
00:02:59,900 --> 00:03:02,930
a different algebraic
relationship than that which

56
00:03:02,930 --> 00:03:05,820
relates x, y, and z to give w.

57
00:03:05,820 --> 00:03:08,750
But the point that we have
in mind is the following.

58
00:03:08,750 --> 00:03:10,750
Given that w is a
function of x, y,

59
00:03:10,750 --> 00:03:15,170
and z, given that x, y, and
z are functions of r and s,

60
00:03:15,170 --> 00:03:17,560
hence w is a
function of r and s.

61
00:03:17,560 --> 00:03:20,930
The question that we ask in
calculus of several variables

62
00:03:20,930 --> 00:03:24,910
is, first of all, if we can
be sure that these were all

63
00:03:24,910 --> 00:03:28,060
continuously
differentiable functions,

64
00:03:28,060 --> 00:03:32,190
can we be sure that w will be
a continuously differentiable

65
00:03:32,190 --> 00:03:33,310
function of r and s?

66
00:03:33,310 --> 00:03:35,290
That's the first question.

67
00:03:35,290 --> 00:03:38,470
And the second question
is, OK, assuming

68
00:03:38,470 --> 00:03:41,490
that the answer to the first
question is in the affirmative,

69
00:03:41,490 --> 00:03:44,890
that w is a continuously
differentiable function of r

70
00:03:44,890 --> 00:03:49,110
and s, how could we compute,
for example, the partial of w

71
00:03:49,110 --> 00:03:53,940
with respect to r, knowing
all of the so-called obvious

72
00:03:53,940 --> 00:03:55,330
other partial derivatives?

73
00:03:55,330 --> 00:03:56,590
What do I mean by that?

74
00:03:56,590 --> 00:04:00,240
Well, what I mean is if you were
to look just at this equation,

75
00:04:00,240 --> 00:04:01,960
just looking at
this equation, what

76
00:04:01,960 --> 00:04:04,060
are the obvious partial
derivatives to take?

77
00:04:04,060 --> 00:04:05,810
You say, well, we'll
take the partial of w

78
00:04:05,810 --> 00:04:09,190
with respect to x, the partial
of w with respect to y,

79
00:04:09,190 --> 00:04:11,790
and the partial of
w with respect to z.

80
00:04:11,790 --> 00:04:14,180
And if you were to look,
say, at this equation,

81
00:04:14,180 --> 00:04:17,050
the natural thing to ask
is, what is the partial

82
00:04:17,050 --> 00:04:18,350
of x with respect to r?

83
00:04:18,350 --> 00:04:21,029
What is the partial of
x with respect to s?

84
00:04:21,029 --> 00:04:22,010
Et cetera.

85
00:04:22,010 --> 00:04:23,830
In other words, what
we're saying is,

86
00:04:23,830 --> 00:04:27,580
in this particular problem, we
would like to figure out how,

87
00:04:27,580 --> 00:04:30,410
for example, to compute the
partial of w with respect

88
00:04:30,410 --> 00:04:34,010
to r, knowing that we
have at our disposal

89
00:04:34,010 --> 00:04:38,390
the partials of w with
respect to x, y, and z;

90
00:04:38,390 --> 00:04:43,810
the partials of x, y, and z
with respect to r; et cetera;

91
00:04:43,810 --> 00:04:45,950
meaning we also have
the partials of x, y,

92
00:04:45,950 --> 00:04:48,310
and z with respect to s.

93
00:04:48,310 --> 00:04:51,370
Before I go any further,
notice, by the way,

94
00:04:51,370 --> 00:04:53,930
that if I left out that
phrase that we were talking

95
00:04:53,930 --> 00:04:57,120
about in our last lecture,
"continuously differentiable,"

96
00:04:57,120 --> 00:04:59,370
notice that all of
this would make sense,

97
00:04:59,370 --> 00:05:02,330
provided that the
derivatives existed.

98
00:05:02,330 --> 00:05:05,010
At no place here
do I make any statement

99
00:05:05,010 --> 00:05:09,090
that the partials have to not
only exist but be continuous.

100
00:05:09,090 --> 00:05:10,270
I never say that at all.

101
00:05:10,270 --> 00:05:11,720
This is what the problem is.

102
00:05:11,720 --> 00:05:13,490
I would like to
use the chain rule.

103
00:05:13,490 --> 00:05:15,880
Do you see why it's
called the chain rule?

104
00:05:15,880 --> 00:05:18,410
w is a function of x, y, and z.

105
00:05:18,410 --> 00:05:21,400
x, y, and z are each
functions of r and s.

106
00:05:21,400 --> 00:05:24,510
Now, what I claim is that
not only is it possible

107
00:05:24,510 --> 00:05:27,830
to do this but the
recipe for doing

108
00:05:27,830 --> 00:05:32,130
this is a very, very suggestive
thing, one which is very, very

109
00:05:32,130 --> 00:05:35,930
easy to remember, once you
see how it's put together.

110
00:05:35,930 --> 00:05:37,590
If you don't see how
it's put together,

111
00:05:37,590 --> 00:05:39,580
the thing is just
a mess-- namely,

112
00:05:39,580 --> 00:05:42,351
the claim is that the partial
of w with respect to r

113
00:05:42,351 --> 00:05:44,850
is the partial-- I'll just read
it to you-- the partial of w

114
00:05:44,850 --> 00:05:48,210
with respect to x times the
partial of x with respect to r,

115
00:05:48,210 --> 00:05:49,960
plus the partial
of w with respect

116
00:05:49,960 --> 00:05:53,120
to y times the partial
of y with respect to r,

117
00:05:53,120 --> 00:05:54,820
plus the partial
of w with respect

118
00:05:54,820 --> 00:05:57,520
to z times the partial
of z with respect to r.
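
Written out in symbols, the rule just read aloud takes the following form. This is only a transcription of the spoken statement; the subscripts naming the variables held constant (y and z for the partials of w, s for the partials with respect to r) are left out for brevity.

```latex
\[
\frac{\partial w}{\partial r}
  = \frac{\partial w}{\partial x}\,\frac{\partial x}{\partial r}
  + \frac{\partial w}{\partial y}\,\frac{\partial y}{\partial r}
  + \frac{\partial w}{\partial z}\,\frac{\partial z}{\partial r}
\]
```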

119
00:05:57,520 --> 00:05:59,640
And as I say, if you
try to memorize that,

120
00:05:59,640 --> 00:06:01,670
it's a very, very
nasty business.

121
00:06:01,670 --> 00:06:03,905
But let's look at this
in three separate pieces.

122
00:06:09,910 --> 00:06:13,390
In a way, can you
sense that this

123
00:06:13,390 --> 00:06:17,260
is nothing more than the
change in w with respect

124
00:06:17,260 --> 00:06:20,410
to r due to the
change in x alone?

125
00:06:20,410 --> 00:06:22,110
In other words, you're
taking here what?

126
00:06:22,110 --> 00:06:25,970
The change in w due to
x and multiplying that

127
00:06:25,970 --> 00:06:28,490
by the change in x
with respect to r.

128
00:06:28,490 --> 00:06:32,270
So this is the contribution of
the change in w with respect

129
00:06:32,270 --> 00:06:35,490
to r due to x alone.

130
00:06:35,490 --> 00:06:38,590
On the other hand, this is
the partial of w with respect

131
00:06:38,590 --> 00:06:41,840
to r due to the
change in y alone.

132
00:06:41,840 --> 00:06:43,760
And this is the partial
of w with respect

133
00:06:43,760 --> 00:06:46,670
to r due to the
change in z alone.

134
00:06:46,670 --> 00:06:49,630
And since x, y, and
z are independent,

135
00:06:49,630 --> 00:06:52,990
the change in x, the change
in y, and the change in z

136
00:06:52,990 --> 00:06:54,900
are also independent variables.

137
00:06:54,900 --> 00:06:58,150
Consequently, it seems
reasonable to assume

138
00:06:58,150 --> 00:07:01,840
that to find the total change
of w with respect to r,

139
00:07:01,840 --> 00:07:04,990
we just add up all of the
partial contributions.

140
00:07:04,990 --> 00:07:07,700
Namely, we take the
partial of w with respect

141
00:07:07,700 --> 00:07:11,570
to r due to x alone, add
on to that the partial of w

142
00:07:11,570 --> 00:07:14,070
with respect to
r due to y alone,

143
00:07:14,070 --> 00:07:16,140
add on to that the
partial of w with respect

144
00:07:16,140 --> 00:07:19,940
to r due to z alone,
and that that sum should

145
00:07:19,940 --> 00:07:23,810
be the total change
in w with respect to r,

146
00:07:23,810 --> 00:07:25,120
treating s as a constant.

147
00:07:25,120 --> 00:07:29,000
And by the way, let me point out
a pitfall with this notation.

148
00:07:29,000 --> 00:07:31,980
We're so used to using
fractional notation here.

149
00:07:31,980 --> 00:07:34,350
Have you noticed that if
you're not careful here,

150
00:07:34,350 --> 00:07:37,400
you're almost
tempted to cancel--

151
00:07:37,400 --> 00:07:40,077
I don't want to write this,
because you'll think that it's

152
00:07:40,077 --> 00:07:41,160
the right way of doing it.

153
00:07:41,160 --> 00:07:43,620
But see if we say, let's cancel
the partials with respect

154
00:07:43,620 --> 00:07:46,300
to x here, let's cancel
the partials with respect

155
00:07:46,300 --> 00:07:49,180
to y here, and let's cancel
the partials with respect

156
00:07:49,180 --> 00:07:50,200
to z here.

157
00:07:50,200 --> 00:07:53,450
By the way, if you did that,
notice what you would get

158
00:07:53,450 --> 00:07:57,590
is the contradiction that the
partial of w with respect to r

159
00:07:57,590 --> 00:07:59,750
is equal to the partial
of w with respect

160
00:07:59,750 --> 00:08:02,320
to r, plus the partial
of w with respect

161
00:08:02,320 --> 00:08:04,935
to r, plus the partial
of w with respect to r.

162
00:08:04,935 --> 00:08:06,310
In other words,
it seems that you

163
00:08:06,310 --> 00:08:09,340
would get that the partial
of w with respect to r

164
00:08:09,340 --> 00:08:12,150
is always three
times itself, which

165
00:08:12,150 --> 00:08:15,080
is, I hope, a glaring enough
contradiction so I don't

166
00:08:15,080 --> 00:08:18,500
have to go into any more detail
about the contradiction part.

167
00:08:18,500 --> 00:08:26,460
Notice again, though, why
I have made such a fetish

168
00:08:26,460 --> 00:08:29,090
over labeling the variables.

169
00:08:29,090 --> 00:08:31,440
Notice that when you're
taking the partial of w

170
00:08:31,440 --> 00:08:35,919
with respect to x, you're
assuming that y and z

171
00:08:35,919 --> 00:08:38,490
are the variables that
are being held constant.

172
00:08:38,490 --> 00:08:41,620
And when you're taking the
partial of x with respect to r,

173
00:08:41,620 --> 00:08:44,900
it's s that you're assuming
is being held constant.

174
00:08:44,900 --> 00:08:47,080
And as soon as you look
at these subscripts here,

175
00:08:47,080 --> 00:08:50,120
somehow or other that
should put you on your guard

176
00:08:50,120 --> 00:08:53,070
to be careful about crossing
out because, after all,

177
00:08:53,070 --> 00:08:55,620
the changes are being
made with respect

178
00:08:55,620 --> 00:08:57,660
to different sets of variables.

179
00:08:57,660 --> 00:09:00,710
At any rate, this
is the statement.

180
00:09:00,710 --> 00:09:04,990
And my other claim is that
the proof follows immediately

181
00:09:04,990 --> 00:09:09,940
from the main, key theorem that
we stressed last time, even

182
00:09:09,940 --> 00:09:11,380
though we didn't prove it.

183
00:09:11,380 --> 00:09:14,280
But we've had ample
exercises using this.

184
00:09:14,280 --> 00:09:16,870
Namely, notice that
we have already

185
00:09:16,870 --> 00:09:21,630
seen that if w does happen to
be a continuously differentiable

186
00:09:21,630 --> 00:09:26,840
function of x, y, and z, then
delta w is the partial of w

187
00:09:26,840 --> 00:09:30,290
with respect to x times delta
x, plus the partial of w

188
00:09:30,290 --> 00:09:33,660
with respect to y times delta
y, plus the partial of w

189
00:09:33,660 --> 00:09:37,790
with respect to z times
delta z, plus an error term.

190
00:09:37,790 --> 00:09:39,200
And what is that error term?

191
00:09:39,200 --> 00:09:43,740
It's k_1 delta x, plus k_2
delta y, plus k_3 delta

192
00:09:43,740 --> 00:09:48,200
z, where k_1, k_2,
and k_3 all approach

193
00:09:48,200 --> 00:09:52,220
0 as delta x, delta y,
and delta z approach 0.
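
To see the error term shrink in a concrete case, here is a small numerical sketch. It assumes the function w = x^2 + y^2 + z^2 (the one used later in this lecture); the base point, the step sizes, and the names `w` and `error_ratio` are illustrative choices, not anything from the board. For this particular w the error works out to (delta x)^2 + (delta y)^2 + (delta z)^2, so k_1, k_2, and k_3 are simply delta x, delta y, and delta z.

```python
# Numerical sketch of delta w = delta w_lin + error for w = x^2 + y^2 + z^2.
# Base point (1, 2, 3) and the step sizes are arbitrary illustrative choices.

def w(x, y, z):
    return x**2 + y**2 + z**2

def error_ratio(h, x=1.0, y=2.0, z=3.0):
    """Return (delta_w - delta_w_lin) / h for equal steps dx = dy = dz = h."""
    dx = dy = dz = h
    delta_w = w(x + dx, y + dy, z + dz) - w(x, y, z)
    # Linear part: the partials 2x, 2y, 2z evaluated at the base point.
    delta_w_lin = 2 * x * dx + 2 * y * dy + 2 * z * dz
    return (delta_w - delta_w_lin) / h

ratios = [error_ratio(10.0 ** -k) for k in range(1, 5)]
for k, ratio in zip(range(1, 5), ratios):
    print(f"h = 1e-{k}: error / h = {ratio:.6f}")
```

The printed ratio keeps shrinking as h shrinks, which is the sense in which the error is small compared with the increments themselves.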

194
00:09:52,220 --> 00:09:56,350
Now again, the key
step in all of this

195
00:09:56,350 --> 00:10:01,940
is that this amount here I
could always call delta w tan,

196
00:10:01,940 --> 00:10:04,810
or as Professor Thomas calls
it for more than two variables,

197
00:10:04,810 --> 00:10:10,530
delta w sub lin, l-i-n, meaning
that this is a linear equation.

198
00:10:10,530 --> 00:10:12,760
Remember-- I've made
an abbreviation here--

199
00:10:12,760 --> 00:10:17,340
these partials are assumed to be
evaluated at a particular point

200
00:10:17,340 --> 00:10:19,040
that we're interested in.

201
00:10:19,040 --> 00:10:24,060
But the idea is, granted that
I can always call this delta w

202
00:10:24,060 --> 00:10:30,070
tan, to say that the error has
this small a magnitude depends

203
00:10:30,070 --> 00:10:33,110
on the fact that w is a
continuously differentiable

204
00:10:33,110 --> 00:10:35,290
function of x, y, and z.

205
00:10:35,290 --> 00:10:37,310
That's why the theory
is so important.

206
00:10:37,310 --> 00:10:40,380
What happens in real life
is that most examples

207
00:10:40,380 --> 00:10:44,850
you encounter in real-life
engineering, the functions

208
00:10:44,850 --> 00:10:48,170
that you're dealing with are
continuously differentiable.

209
00:10:48,170 --> 00:10:51,260
So it seems like we're making
a big issue over nothing.

210
00:10:51,260 --> 00:10:54,100
I should point out that on
the frontiers of knowledge,

211
00:10:54,100 --> 00:10:57,000
enough situations occur where
the functions that we're

212
00:10:57,000 --> 00:10:59,800
dealing with are not
continuously differentiable,

213
00:10:59,800 --> 00:11:03,610
that some horrible mistakes can
be made by assuming that you

214
00:11:03,610 --> 00:11:09,090
can replace delta w by this,
without any significant error.

215
00:11:09,090 --> 00:11:11,660
But as long as this is
the case, we can do this.

216
00:11:11,660 --> 00:11:14,990
And now notice, what does the
partial of w with respect to r

217
00:11:14,990 --> 00:11:15,680
mean?

218
00:11:15,680 --> 00:11:20,050
It means you take delta
w divided by delta r.

219
00:11:20,050 --> 00:11:21,820
And let me just do that here.

220
00:11:21,820 --> 00:11:24,890
I'll just divide
every term by delta r.

221
00:11:32,360 --> 00:11:34,610
And now, what do I have to
do next to get the partial?

222
00:11:34,610 --> 00:11:37,970
I have to take the limit
as delta r approaches 0.

223
00:11:37,970 --> 00:11:41,720
Now the interesting point
is as delta r approaches 0,

224
00:11:41,720 --> 00:11:45,300
holding s fixed, this
term obviously becomes

225
00:11:45,300 --> 00:11:48,820
the partial of x with
respect to r, by definition.

226
00:11:48,820 --> 00:11:51,940
This term becomes the partial
of y with respect to r,

227
00:11:51,940 --> 00:11:53,180
by definition.

228
00:11:53,180 --> 00:11:56,650
And this term becomes the
partial of z with respect to r,

229
00:11:56,650 --> 00:11:57,860
by definition.

230
00:11:57,860 --> 00:12:01,070
By the way, notice that even
though delta x, delta y,

231
00:12:01,070 --> 00:12:05,020
and delta z are all going
to 0 as delta r goes to 0,

232
00:12:05,020 --> 00:12:08,620
you cannot immediately conclude
that these terms drop out.

233
00:12:08,620 --> 00:12:11,800
Because after all, delta
r is also approaching 0.

234
00:12:11,800 --> 00:12:15,380
So delta x over delta r
is that 0 over 0 form.

235
00:12:15,380 --> 00:12:18,890
In fact, that's
precisely the partial

236
00:12:18,890 --> 00:12:21,560
of x with respect to r term
that we're talking about.

237
00:12:21,560 --> 00:12:23,270
The beauty is what?

238
00:12:23,270 --> 00:12:25,780
That as delta x,
delta y, and delta z

239
00:12:25,780 --> 00:12:28,840
approach 0, each of
the k's approaches 0.

240
00:12:31,490 --> 00:12:35,530
You see, the reason that the
error term becomes negligible,

241
00:12:35,530 --> 00:12:39,180
becomes 0 in the limit, isn't
because delta x, delta y,

242
00:12:39,180 --> 00:12:41,350
and delta z are becoming small.

243
00:12:41,350 --> 00:12:43,550
Because these small
numbers are being divided

244
00:12:43,550 --> 00:12:44,850
by another small number.

245
00:12:44,850 --> 00:12:49,010
It's because the k_1, k_2,
and k_3 are getting small.
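
The step of dividing by delta r and passing to the limit can be written out as follows. This is a sketch of the argument as spoken, with s held fixed throughout and the partials of w evaluated at the point of interest.

```latex
\[
\frac{\Delta w}{\Delta r}
  = \frac{\partial w}{\partial x}\frac{\Delta x}{\Delta r}
  + \frac{\partial w}{\partial y}\frac{\Delta y}{\Delta r}
  + \frac{\partial w}{\partial z}\frac{\Delta z}{\Delta r}
  + k_1\frac{\Delta x}{\Delta r}
  + k_2\frac{\Delta y}{\Delta r}
  + k_3\frac{\Delta z}{\Delta r}
\]
\[
\text{As } \Delta r \to 0:\qquad
\frac{\Delta x}{\Delta r} \to \frac{\partial x}{\partial r},\qquad
k_1 \to 0,\qquad\text{so}\qquad
k_1\frac{\Delta x}{\Delta r} \to 0\cdot\frac{\partial x}{\partial r} = 0,
\]
```

and likewise for the k_2 and k_3 terms, which leaves only the first three terms in the limit.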

246
00:12:49,010 --> 00:12:52,010
At any rate, putting
this all together,

247
00:12:52,010 --> 00:12:54,260
notice that now, in
a manner completely

248
00:12:54,260 --> 00:12:57,380
analogous to our part-one
treatment of the chain rule,

249
00:12:57,380 --> 00:12:59,680
except that we're now dealing
with several variables,

250
00:12:59,680 --> 00:13:01,530
these three terms drop out.

251
00:13:01,530 --> 00:13:06,410
And these three terms become
the claim that we made before.

252
00:13:06,410 --> 00:13:10,040
In other words, this is how the
partial of w with respect to r

253
00:13:10,040 --> 00:13:10,640
is computed.

254
00:13:10,640 --> 00:13:14,600
And again, the theory is
the easiest part of this.

255
00:13:14,600 --> 00:13:15,860
That's the easy part.

256
00:13:15,860 --> 00:13:17,785
The hard part is
getting familiarity

257
00:13:17,785 --> 00:13:18,910
with how to work with this.

258
00:13:18,910 --> 00:13:21,270
And I think the best way
to get some familiarity

259
00:13:21,270 --> 00:13:24,470
for working with this is to pick
particularly simple problems

260
00:13:24,470 --> 00:13:26,300
for the lecture,
problems where it's

261
00:13:26,300 --> 00:13:28,620
so easy to do the
problem both ways

262
00:13:28,620 --> 00:13:31,100
that no hangup can
possibly occur.

263
00:13:31,100 --> 00:13:32,900
Let's take a very
simple example.

264
00:13:32,900 --> 00:13:34,830
Let's suppose that
w equals x squared

265
00:13:34,830 --> 00:13:37,260
plus y squared plus z squared.

266
00:13:37,260 --> 00:13:42,690
Suppose we also know that x
is r plus s, y is r minus s,

267
00:13:42,690 --> 00:13:44,990
and z happens to be 2r.

268
00:13:44,990 --> 00:13:47,410
In this particular
case, notice that we

269
00:13:47,410 --> 00:13:52,330
would find the partial of w with
respect to r very conveniently

270
00:13:52,330 --> 00:13:53,980
by direct substitution.

271
00:13:53,980 --> 00:13:57,270
Namely, we simply
replace x by r plus s,

272
00:13:57,270 --> 00:14:02,620
we replace y by r minus
s, we replace z by 2r.

273
00:14:02,620 --> 00:14:06,700
And then w simply becomes
this expression here,

274
00:14:06,700 --> 00:14:09,260
which when we collect
terms, becomes

275
00:14:09,260 --> 00:14:12,460
6 r squared plus 2 s squared.

276
00:14:12,460 --> 00:14:14,460
And again, the arithmetic
there is simple enough

277
00:14:14,460 --> 00:14:16,543
so I'm not even going to
bother worrying about how

278
00:14:16,543 --> 00:14:18,600
we justify these steps.

279
00:14:18,600 --> 00:14:22,690
At which stage, to take the
partial of w with respect to r,

280
00:14:22,690 --> 00:14:25,956
holding s constant,
this is simply what?

281
00:14:25,956 --> 00:14:27,070
12r.

282
00:14:27,070 --> 00:14:29,110
Because s is being
treated as a constant,

283
00:14:29,110 --> 00:14:31,990
its derivative with
respect to r is 0.
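
The direct-substitution answer can be checked numerically. The sketch below assumes only the lecture's example (x = r + s, y = r - s, z = 2r); the function names, the sample point, and the step size h are illustrative choices.

```python
# Direct substitution for the lecture's example, checked by a finite difference.

def w_of_rs(r, s):
    x = r + s
    y = r - s
    z = 2 * r
    return x**2 + y**2 + z**2  # collects to 6 r^2 + 2 s^2

def partial_r(f, r, s, h=1e-6):
    """Central-difference estimate of the partial of f with respect to r, s held fixed."""
    return (f(r + h, s) - f(r - h, s)) / (2 * h)

r0, s0 = 1.5, -0.5  # arbitrary sample point
estimate = partial_r(w_of_rs, r0, s0)
print(estimate, 12 * r0)  # the two numbers should agree closely
```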

284
00:14:31,990 --> 00:14:34,360
You see, in an
example like this,

285
00:14:34,360 --> 00:14:37,530
one would not really be
tempted to use the chain rule.

286
00:14:37,530 --> 00:14:40,680
The chain rule is
used in many cases

287
00:14:40,680 --> 00:14:43,980
not just for convenience,
but in cases of great theory

288
00:14:43,980 --> 00:14:47,950
where you're only given that
w is some function of x, y,

289
00:14:47,950 --> 00:14:50,570
and z, and you're
not told explicitly

290
00:14:50,570 --> 00:14:51,740
what the function is.

291
00:14:51,740 --> 00:14:53,690
You're just given f of x, y, z.

292
00:14:53,690 --> 00:14:57,110
In the case where the
function is given explicitly,

293
00:14:57,110 --> 00:15:00,380
it's sometimes very easy
to substitute directly.

294
00:15:00,380 --> 00:15:03,050
At any rate, what the chain
rule says is roughly this.

295
00:15:03,050 --> 00:15:03,920
They say, lookit.

296
00:15:03,920 --> 00:15:06,440
From this equation,
you could immediately

297
00:15:06,440 --> 00:15:09,590
say the partial of w
with respect to x is 2x,

298
00:15:09,590 --> 00:15:12,470
the partial of w with
respect to y is 2y,

299
00:15:12,470 --> 00:15:15,710
the partial of w with
respect to z is 2z.

300
00:15:15,710 --> 00:15:17,460
From this equation,
you could immediately

301
00:15:17,460 --> 00:15:20,340
say that the partial of
x with respect to r is 1,

302
00:15:20,340 --> 00:15:23,040
the partial of x with
respect to s is 1,

303
00:15:23,040 --> 00:15:25,920
the partial of y with
respect to r is 1,

304
00:15:25,920 --> 00:15:29,850
the partial of y with
respect to s is minus 1,

305
00:15:29,850 --> 00:15:32,910
the partial of z with
respect to r is 2,

306
00:15:32,910 --> 00:15:36,450
and the partial of z
with respect to s is 0.

307
00:15:36,450 --> 00:15:43,380
In particular, summarizing our
results, we have these here.

308
00:15:43,380 --> 00:15:45,300
Now, what the chain
rule says is what?

309
00:15:45,300 --> 00:15:48,190
To find the partial of
w with respect to r,

310
00:15:48,190 --> 00:15:50,410
you just take the
partial of w with respect

311
00:15:50,410 --> 00:15:53,540
to x times the partial
of x with respect to r,

312
00:15:53,540 --> 00:15:55,630
plus the partial
of w with respect

313
00:15:55,630 --> 00:15:58,600
to y times the partial
of y with respect to r,

314
00:15:58,600 --> 00:16:00,320
plus the partial
of w with respect

315
00:16:00,320 --> 00:16:03,100
to z times the partial
of z with respect to r.

316
00:16:03,100 --> 00:16:06,150
And if we do that in this
case, we simply get what?

317
00:16:06,150 --> 00:16:11,380
2x plus 2y plus 4z.

318
00:16:11,380 --> 00:16:14,170
Now again, I picked,
deliberately,

319
00:16:14,170 --> 00:16:16,260
a very simple problem here.

320
00:16:16,260 --> 00:16:24,840
Remember, by definition, x
is r plus s, y is r minus s,

321
00:16:24,840 --> 00:16:28,340
and z happens to be 2r.

322
00:16:28,340 --> 00:16:31,770
And now you can see very quickly
here that when I substitute in,

323
00:16:31,770 --> 00:16:32,380
I get what?

324
00:16:32,380 --> 00:16:40,360
2r plus 2r is 4r, plus 8r is
12r, and 2s minus 2s is 0.

325
00:16:40,360 --> 00:16:42,750
The partial of w
with respect to r

326
00:16:42,750 --> 00:16:45,710
is also 12r, also meaning what?

327
00:16:45,710 --> 00:16:48,390
We found that same
answer before.
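
The same chain-rule computation can be sketched in a few lines. The partials are the ones listed above for this example; the function name `dw_dr_chain` and the sample points are made up for illustration.

```python
# Chain rule for w = x^2 + y^2 + z^2 with x = r + s, y = r - s, z = 2r.

def dw_dr_chain(r, s):
    x, y, z = r + s, r - s, 2 * r
    # Partials of w with respect to x, y, z.
    dw_dx, dw_dy, dw_dz = 2 * x, 2 * y, 2 * z
    # Partials of x, y, z with respect to r (s held constant).
    dx_dr, dy_dr, dz_dr = 1, 1, 2
    return dw_dx * dx_dr + dw_dy * dy_dr + dw_dz * dz_dr

for r, s in [(1.0, 2.0), (-3.0, 0.5), (0.25, -4.0)]:
    print(r, s, dw_dr_chain(r, s), 12 * r)
```

At each sample point the chain-rule value matches 12r, whatever s is, in agreement with the direct substitution.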

328
00:16:48,390 --> 00:16:51,630
At least that's how
the chain rule works.

329
00:16:51,630 --> 00:16:55,220
And again, we have to remember
that the chain rule does not

330
00:16:55,220 --> 00:16:57,390
depend on the
number of variables,

331
00:16:57,390 --> 00:17:00,730
even though this may start
to look a little bit sticky.

332
00:17:00,730 --> 00:17:02,340
Let's word it as follows.

333
00:17:02,340 --> 00:17:07,109
Suppose w happens to be a
continuously differentiable

334
00:17:07,109 --> 00:17:13,099
function of the n independent
variables x_1 up to x_n.

335
00:17:13,099 --> 00:17:16,190
See, that's what this
parenthetical remark means.

336
00:17:16,190 --> 00:17:20,210
I'm saying that not only do
the partials of f with respect

337
00:17:20,210 --> 00:17:22,880
to x_1 up to x_n exist
at a given point,

338
00:17:22,880 --> 00:17:24,670
but they are continuous there.

339
00:17:24,670 --> 00:17:26,310
And why do I want that in there?

340
00:17:26,310 --> 00:17:28,650
So I can say that my
error term is never

341
00:17:28,650 --> 00:17:33,550
any greater than k_1
delta x_1, plus k_2 delta x_2,

342
00:17:33,550 --> 00:17:36,680
plus, et cetera, k_n delta
x_n, where the k's go

343
00:17:36,680 --> 00:17:39,310
to 0 as the delta x's go to 0.

344
00:17:39,310 --> 00:17:41,880
I'm going to spare you
the details of proofs.

345
00:17:41,880 --> 00:17:43,530
But I just want
you to keep seeing

346
00:17:43,530 --> 00:17:45,400
why these things are necessary.

347
00:17:45,400 --> 00:17:50,170
At any rate, let's suppose now
that each of the n variables

348
00:17:50,170 --> 00:17:56,020
x_1 up to x_n turn out to be
functions of the m variables.

349
00:17:56,020 --> 00:17:58,800
n and m could
conceivably be equal.

350
00:17:58,800 --> 00:18:01,500
But m could even be more than n.

351
00:18:01,500 --> 00:18:02,637
It can be less than n.

352
00:18:02,637 --> 00:18:04,470
There's no reason why
they have to be equal.

353
00:18:04,470 --> 00:18:07,630
All we're saying is, speaking
in the most general terms,

354
00:18:07,630 --> 00:18:12,040
suppose each of the n x's is
a continuously differentiable

355
00:18:12,040 --> 00:18:16,870
function of the m independent
variables y sub 1 up

356
00:18:16,870 --> 00:18:17,880
to y sub m.

357
00:18:17,880 --> 00:18:20,400
In fact, that's what this
"et cetera" means here.

358
00:18:20,400 --> 00:18:23,370
The "et cetera" refers to the
parenthetical remark here.

359
00:18:23,370 --> 00:18:27,590
I mean that not only are the
x's functions of y_1 up to y_m,

360
00:18:27,590 --> 00:18:30,890
but they're continuously
differentiable functions.

361
00:18:30,890 --> 00:18:35,180
Now obviously, what we're saying
is that if w can be expressed

362
00:18:35,180 --> 00:18:38,260
in terms of the x's, the x's
can be expressed in terms

363
00:18:38,260 --> 00:18:42,480
of the y's, obviously then,
w can be expressed in terms

364
00:18:42,480 --> 00:18:43,090
of the y's.

365
00:18:43,090 --> 00:18:47,590
In other words, w is some
function of y_1 up to y_m.

366
00:18:47,590 --> 00:18:51,040
Now the question that comes up
is that just looking at this,

367
00:18:51,040 --> 00:18:54,810
I can talk about the partial
of w with respect to y_1,

368
00:18:54,810 --> 00:18:57,640
the partial of w
with respect to y_2,

369
00:18:57,640 --> 00:19:02,660
the partial of w with respect to
y_3, et cetera, all the way up

370
00:19:02,660 --> 00:19:05,820
to the partial of w
with respect to y sub m.

371
00:19:05,820 --> 00:19:07,610
And the question is, lookit.

372
00:19:07,610 --> 00:19:10,900
From the original form of
w, it was easy to talk about

373
00:19:10,900 --> 00:19:14,060
the partials of w with
respect to the x's.

374
00:19:14,060 --> 00:19:17,620
From how the x's are given in
terms of the y's, it's easy

375
00:19:17,620 --> 00:19:20,420
to talk about the derivatives
of the x's with respect

376
00:19:20,420 --> 00:19:21,550
to the y's.

377
00:19:21,550 --> 00:19:23,170
And so the question
is, how do you

378
00:19:23,170 --> 00:19:26,660
find the partial of w with
respect to, say, y sub

379
00:19:26,660 --> 00:19:29,690
1, given all of these
other partial derivatives?

380
00:19:29,690 --> 00:19:32,460
And the answer, again, is
something that you just

381
00:19:32,460 --> 00:19:33,640
have to get used to.

382
00:19:33,640 --> 00:19:36,600
The proof goes through
for n and m the same way

383
00:19:36,600 --> 00:19:38,780
as it did for the
lower-dimensional case.

384
00:19:38,780 --> 00:19:41,410
And the intuitive
interpretation is the same.

385
00:19:41,410 --> 00:19:45,380
Namely, to find the partial
of w with respect to y_1,

386
00:19:45,380 --> 00:19:48,590
we simply see how much
w changed with respect

387
00:19:48,590 --> 00:19:52,300
to y_1 due to the
change in x_1 alone,

388
00:19:52,300 --> 00:19:54,940
add on to that the
change in w with respect

389
00:19:54,940 --> 00:19:59,720
to y_1 due to the change
in x_2 alone, et cetera,

390
00:19:59,720 --> 00:20:07,450
add on to that, finally, the
change in w with respect to y

391
00:20:07,450 --> 00:20:10,970
sub 1 due to the change
in x sub n alone.
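
For any n and m, the recipe is the same sum of contributions. The sketch below is one way to organize it, assuming the partials are already known as plain lists of numbers at the point of interest; the name `chain_rule` and the list layout are illustrative choices, and the test case reuses the earlier example with y_1 = r and y_2 = s.

```python
# General chain rule as a sum over the intermediate variables x_1 ... x_n.

def chain_rule(grad_w, jacobian):
    """grad_w[i] is the partial of w with respect to x_i.
    jacobian[i][j] is the partial of x_i with respect to y_j.
    Returns the list of partials of w with respect to y_1 ... y_m."""
    n = len(grad_w)
    m = len(jacobian[0])
    return [sum(grad_w[i] * jacobian[i][j] for i in range(n)) for j in range(m)]

# Earlier example: w = x^2 + y^2 + z^2, x = r + s, y = r - s, z = 2r.
r, s = 1.0, 2.0
x, y, z = r + s, r - s, 2 * r
grad_w = [2 * x, 2 * y, 2 * z]
jacobian = [[1, 1],    # partials of x with respect to r, s
            [1, -1],   # partials of y with respect to r, s
            [2, 0]]    # partials of z with respect to r, s
dw_dr, dw_ds = chain_rule(grad_w, jacobian)
print(dw_dr, dw_ds)  # compare with 12r and 4s from w = 6r^2 + 2s^2
```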

392
00:20:10,970 --> 00:20:12,580
In other words,
again, if you think

393
00:20:12,580 --> 00:20:15,740
of this in terms
of cancellation,

394
00:20:15,740 --> 00:20:19,210
if you cross these things out,
don't think of adding them,

395
00:20:19,210 --> 00:20:20,710
but think of them as what?

396
00:20:20,710 --> 00:20:24,130
Giving you the
individual components

397
00:20:24,130 --> 00:20:27,450
that tell you how the partial
of w with respect to y_1

398
00:20:27,450 --> 00:20:28,350
is made up.

399
00:20:28,350 --> 00:20:30,422
By the way, there is
one parenthetical remark

400
00:20:30,422 --> 00:20:31,880
that I haven't
written on the board

401
00:20:31,880 --> 00:20:33,920
that I would like to
make at this time.

402
00:20:33,920 --> 00:20:36,440
In Professor
Thomas's text, he has

403
00:20:36,440 --> 00:20:39,360
elected to introduce
matrix algebra prior

404
00:20:39,360 --> 00:20:41,420
to this particular chapter.

405
00:20:41,420 --> 00:20:43,800
It again turns out
that one does not

406
00:20:43,800 --> 00:20:46,340
need matrices to talk
about the chain rule

407
00:20:46,340 --> 00:20:49,180
but that if one had
matrix notation,

408
00:20:49,180 --> 00:20:51,580
the matrix notation
is particularly

409
00:20:51,580 --> 00:20:54,680
convenient for summarizing
the chain rule.

410
00:20:54,680 --> 00:20:59,010
I have elected to hold
off on matrix algebra

411
00:20:59,010 --> 00:21:01,210
till the near future
because it comes up

412
00:21:01,210 --> 00:21:03,940
in a much better
motivated way, I

413
00:21:03,940 --> 00:21:07,570
think, in terms of these
linear approximations.

414
00:21:07,570 --> 00:21:10,410
But the point is if, as
you're reading the text,

415
00:21:10,410 --> 00:21:12,980
you see the matrix
notation, and you are not

416
00:21:12,980 --> 00:21:15,510
familiar with the
matrices, forget it.

417
00:21:15,510 --> 00:21:20,300
All the matrix is, is a shortcut
notation for saying this.

418
00:21:20,300 --> 00:21:22,470
And if I want a
shortcut notation here,

419
00:21:22,470 --> 00:21:24,530
I don't need matrices
for saying this.

420
00:21:24,530 --> 00:21:27,510
I can say this in terms
of our sigma notation.

421
00:21:27,510 --> 00:21:29,780
Notice that one
other way of writing

422
00:21:29,780 --> 00:21:33,560
this thing very compactly
that may be more suggestive

423
00:21:33,560 --> 00:21:34,810
is the following.

424
00:21:34,810 --> 00:21:39,440
Notice that I'm adding
up n terms here.

425
00:21:39,440 --> 00:21:43,680
Each term consists of two
factors, each of which

426
00:21:43,680 --> 00:21:45,430
looks like a fraction.

427
00:21:45,430 --> 00:21:47,260
The numerator of
the first fraction

428
00:21:47,260 --> 00:21:50,960
is always a partial of w.

429
00:21:50,960 --> 00:21:52,790
The denominator of
the second fraction

430
00:21:52,790 --> 00:21:55,980
is always the partial y_1.

431
00:21:55,980 --> 00:21:59,330
And it appears that the
denominator of the first

432
00:21:59,330 --> 00:22:03,600
and the numerator of the second
always have the same subscript,

433
00:22:03,600 --> 00:22:07,910
but they seem to vary
consecutively from 1 to n.

434
00:22:07,910 --> 00:22:10,850
And that's precisely where the
sigma notation comes in handy.

435
00:22:10,850 --> 00:22:13,130
Why don't we just
write, therefore,

436
00:22:13,130 --> 00:22:16,610
that this is the sum,
partial of w with respect

437
00:22:16,610 --> 00:22:20,200
to x sub k, times the
partial of x sub k

438
00:22:20,200 --> 00:22:23,020
with respect to y_1,
as the subscript

439
00:22:23,020 --> 00:22:28,170
k ranges through all
integral values from 1 to n?
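
The sum just described can be written out as follows. This is a sketch of what the spoken description says, not a copy of the board; the symbols w, x_k, y_1, and n are the lecture's own.

```latex
\[
\frac{\partial w}{\partial y_1}
  = \sum_{k=1}^{n} \frac{\partial w}{\partial x_k}\,\frac{\partial x_k}{\partial y_1}
  = \frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial y_1}
  + \frac{\partial w}{\partial x_2}\frac{\partial x_2}{\partial y_1}
  + \cdots
  + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial y_1}
\]
```

Each term is a product of two factors that share the subscript k, and the n products are added.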

440
00:22:28,170 --> 00:22:31,270
In other words, notice that
in this particular form,

441
00:22:31,270 --> 00:22:34,310
we have simply rewritten
this thing compactly.

442
00:22:34,310 --> 00:22:36,830
But if you look at
this and look at this,

443
00:22:36,830 --> 00:22:40,220
I think it's very suggestive to
see how the chain rule works.

444
00:22:40,220 --> 00:22:43,940
You see, here's your partial
of w with respect to y_1.

445
00:22:43,940 --> 00:22:45,350
And these are what?

446
00:22:45,350 --> 00:22:51,040
The contributions due to each of
the changes of the n variables.

447
00:22:51,040 --> 00:22:54,350
This is the change due
to the x sub k variable.

448
00:22:54,350 --> 00:22:57,400
And you add these all up because
they're independent variables,

449
00:22:57,400 --> 00:22:59,620
as k goes from 1 to n.

450
00:22:59,620 --> 00:23:02,280
And I write "et cetera"
here simply to point out

451
00:23:02,280 --> 00:23:04,610
that I could have
computed the partial of w

452
00:23:04,610 --> 00:23:07,670
with respect to y_2
instead of y sub 1.

453
00:23:07,670 --> 00:23:10,560
By the way, the recipe would've
looked exactly the same,

454
00:23:10,560 --> 00:23:12,560
except that if there
was a 2 here, there

455
00:23:12,560 --> 00:23:13,940
would have been a 2 here.

456
00:23:13,940 --> 00:23:17,160
If there were a 3 here,
there would've been a 3 here.

457
00:23:17,160 --> 00:23:21,020
If there were an m here, there
would have been an m here.
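
As a small illustration of the recipe for any y sub j, here is a minimal Python sketch. The function name chain_rule_partial and the sample functions (w = x1*x2 + x3 with x1 = y1 + y2, x2 = y1*y2, x3 = y2 squared) are assumptions made up for this example; they do not come from the lecture.

```python
def chain_rule_partial(dw_dx, dx_dy, j):
    """Sum over k of (dw/dx_k) * (dx_k/dy_j), with j counted from 0."""
    return sum(dw_dx[k] * dx_dy[k][j] for k in range(len(dw_dx)))


# Hypothetical example: w = x1*x2 + x3,
# with x1 = y1 + y2, x2 = y1*y2, x3 = y2**2, evaluated at (y1, y2) = (1, 2).
y1, y2 = 1.0, 2.0
x1, x2, x3 = y1 + y2, y1 * y2, y2 ** 2

dw_dx = [x2, x1, 1.0]        # partials of w with respect to x1, x2, x3
dx_dy = [
    [1.0, 1.0],              # partials of x1 with respect to y1, y2
    [y2, y1],                # partials of x2 with respect to y1, y2
    [0.0, 2.0 * y2],         # partials of x3 with respect to y1, y2
]

dw_dy1 = chain_rule_partial(dw_dx, dx_dy, 0)
dw_dy2 = chain_rule_partial(dw_dx, dx_dy, 1)
print(dw_dy1, dw_dy2)
```

Changing the index j from 0 to 1 is the "2 here, 2 here" substitution: only the second factor of each product changes.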

458
00:23:21,020 --> 00:23:22,800
OK, now lookit.

459
00:23:22,800 --> 00:23:25,630
At this particular
stage of our lecture

460
00:23:25,630 --> 00:23:30,080
today, this could
end with the idea

461
00:23:30,080 --> 00:23:32,440
that for the unit
that's now assigned,

462
00:23:32,440 --> 00:23:34,450
this is as far as
you have to go.

463
00:23:34,450 --> 00:23:36,870
In other words,
for the exercises

464
00:23:36,870 --> 00:23:39,890
that I've given you in
this particular unit,

465
00:23:39,890 --> 00:23:43,460
we do nothing higher
than using the chain rule

466
00:23:43,460 --> 00:23:46,250
for first-order derivatives.

467
00:23:46,250 --> 00:23:49,670
The point is that in many
applications in real life,

468
00:23:49,670 --> 00:23:52,840
we must take
higher-order derivatives.

469
00:23:52,840 --> 00:23:55,480
In other words, there are
many differential equations,

470
00:23:55,480 --> 00:23:58,360
partial differential
equations, where we must work

471
00:23:58,360 --> 00:24:00,100
with higher-order derivatives.

472
00:24:00,100 --> 00:24:03,670
And for that reason, it becomes
very important, sometimes,

473
00:24:03,670 --> 00:24:06,360
to be able to take
a second derivative

474
00:24:06,360 --> 00:24:08,700
or a third derivative
or a fourth derivative

475
00:24:08,700 --> 00:24:10,480
by means of the chain rule.

476
00:24:10,480 --> 00:24:14,140
Now the interesting point is
that the theory that we've used

477
00:24:14,140 --> 00:24:16,420
so far doesn't change at all.

478
00:24:16,420 --> 00:24:19,070
What does happen is that
the average student,

479
00:24:19,070 --> 00:24:21,960
in learning this material
for the first time,

480
00:24:21,960 --> 00:24:24,080
gets swamped by the notation.

481
00:24:24,080 --> 00:24:26,340
Consequently, what
I want to do is

482
00:24:26,340 --> 00:24:30,090
to give you the lecture on
this material at the same time

483
00:24:30,090 --> 00:24:33,850
that I'm lecturing on
first-order derivatives, simply

484
00:24:33,850 --> 00:24:37,640
because the continuity
follows smoother this way,

485
00:24:37,640 --> 00:24:40,380
so that you see what the
whole overall picture is, then

486
00:24:40,380 --> 00:24:42,740
to make sure that you
cement these things down.

487
00:24:42,740 --> 00:24:44,830
The next unit after
this will give

488
00:24:44,830 --> 00:24:47,490
you drill on taking
higher-order derivatives.

489
00:24:47,490 --> 00:24:49,440
What this may mean
is that many of you

490
00:24:49,440 --> 00:24:54,640
may prefer to watch
this half of the film

491
00:24:54,640 --> 00:24:57,920
a second time,
after you've already

492
00:24:57,920 --> 00:24:59,490
tried working some
of the problems

493
00:24:59,490 --> 00:25:01,530
with higher-order
derivatives, if you're still

494
00:25:01,530 --> 00:25:02,490
confused by this.

495
00:25:02,490 --> 00:25:03,960
But at any rate,
let's take a look

496
00:25:03,960 --> 00:25:05,660
at a hypothetical situation.

497
00:25:05,660 --> 00:25:08,160
Since we're so used
to polar coordinates,

498
00:25:08,160 --> 00:25:10,910
let's talk in terms
of polar coordinates.

499
00:25:10,910 --> 00:25:13,690
Suppose w happens to be a
continuously differentiable

500
00:25:13,690 --> 00:25:15,875
function of x and y.

501
00:25:15,875 --> 00:25:20,760
x and y, in turn, are
continuously differentiable

502
00:25:20,760 --> 00:25:23,280
functions of the polar
coordinates r and theta.

503
00:25:23,280 --> 00:25:25,860
In fact, they're x
equals r cosine theta,

504
00:25:25,860 --> 00:25:27,850
y equals r sine theta.

505
00:25:27,850 --> 00:25:28,540
Now lookit.

506
00:25:28,540 --> 00:25:32,710
If all I want to do is find the
partial of w with respect to r,

507
00:25:32,710 --> 00:25:35,650
I can do that by the
ordinary chain rule.

508
00:25:35,650 --> 00:25:38,010
Namely, it's the partial
of w with respect

509
00:25:38,010 --> 00:25:40,700
to x times the partial
of x with respect to r,

510
00:25:40,700 --> 00:25:42,360
plus the partial
of w with respect

511
00:25:42,360 --> 00:25:45,680
to y times the partial
of y with respect to r.

512
00:25:45,680 --> 00:25:48,240
Now, knowing what x
looks like explicitly

513
00:25:48,240 --> 00:25:51,480
in terms of r and theta and
what y looks like explicitly

514
00:25:51,480 --> 00:25:53,690
in terms of r and
theta, I can certainly

515
00:25:53,690 --> 00:25:56,300
compute the partials
of x and y with respect

516
00:25:56,300 --> 00:25:59,560
to r, holding theta constant.

517
00:25:59,560 --> 00:26:02,550
In particular, the partial
of x with respect to r

518
00:26:02,550 --> 00:26:04,200
is simply cosine theta.

519
00:26:04,200 --> 00:26:06,740
And the partial of
y with respect to r

520
00:26:06,740 --> 00:26:08,170
is simply sine theta.

521
00:26:08,170 --> 00:26:11,140
So the partial of
w with respect to r

522
00:26:11,140 --> 00:26:13,560
is partial of w with
respect to x times

523
00:26:13,560 --> 00:26:18,010
cosine theta, plus partial of
w with respect to y times sine

524
00:26:18,010 --> 00:26:18,510
theta.
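
One way to convince yourself of this formula is a numerical check. The sketch below assumes a sample w (x cubed times y, plus sine y) chosen only for illustration, since the lecture leaves w unspecified; all function names are hypothetical.

```python
import math


def w(x, y):
    # Hypothetical sample function; the lecture does not specify w.
    return x ** 3 * y + math.sin(y)


def w_x(x, y):
    # Partial of the sample w with respect to x.
    return 3 * x ** 2 * y


def w_y(x, y):
    # Partial of the sample w with respect to y.
    return x ** 3 + math.cos(y)


def dw_dr_chain(r, theta):
    # Chain rule: w_x * cos(theta) + w_y * sin(theta).
    x, y = r * math.cos(theta), r * math.sin(theta)
    return w_x(x, y) * math.cos(theta) + w_y(x, y) * math.sin(theta)


def dw_dr_numeric(r, theta, h=1e-6):
    # Central difference in r, holding theta constant.
    def w_polar(rr):
        return w(rr * math.cos(theta), rr * math.sin(theta))
    return (w_polar(r + h) - w_polar(r - h)) / (2 * h)


print(dw_dr_chain(1.5, 0.7), dw_dr_numeric(1.5, 0.7))
```

The two printed numbers should agree to several decimal places, up to the error of the finite-difference approximation.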

525
00:26:18,510 --> 00:26:23,410
And by the way, notice that I
cannot simplify these terms.

526
00:26:23,410 --> 00:26:25,790
I cannot simplify
these terms in general,

527
00:26:25,790 --> 00:26:29,570
because all I'm given is that
w is some function of x and y.

528
00:26:29,570 --> 00:26:33,120
I don't know what w looks like
explicitly in terms of x and y.

529
00:26:33,120 --> 00:26:36,660
So all I can do is talk about
the partials of w with respect

530
00:26:36,660 --> 00:26:40,320
to x, partial of w with respect
to y, without worrying

531
00:26:40,320 --> 00:26:43,300
any more about this, with the
understanding that if I knew

532
00:26:43,300 --> 00:26:46,440
what w looked like explicitly
in terms of x and y,

533
00:26:46,440 --> 00:26:48,985
I could work out
what this thing was.

534
00:26:48,985 --> 00:26:50,440
Now, here's the key point.

535
00:26:50,440 --> 00:26:52,340
That's why I've accentuated it.

536
00:26:52,340 --> 00:26:55,400
In the same way
that w is a function

537
00:26:55,400 --> 00:27:00,120
of both x and y, so also are
the partials of w with respect

538
00:27:00,120 --> 00:27:03,530
to x and the partials
of w with respect to y.

539
00:27:03,530 --> 00:27:06,750
In other words, even
though this looks

540
00:27:06,750 --> 00:27:09,180
like this emphasizes
the x, notice

541
00:27:09,180 --> 00:27:11,290
that when you take
the derivative

542
00:27:11,290 --> 00:27:16,040
of a function of both x and y
with respect to x, in general,

543
00:27:16,040 --> 00:27:17,640
the resulting
function will again

544
00:27:17,640 --> 00:27:20,300
be a function of both x and y.

545
00:27:20,300 --> 00:27:23,070
And so what we're saying is
that if the partials of w

546
00:27:23,070 --> 00:27:27,310
with respect to x and the
partials of w with respect to y

547
00:27:27,310 --> 00:27:30,580
also happen to be continuously
differentiable functions of x

548
00:27:30,580 --> 00:27:34,905
and y, we could, if we wished,
use the chain rule again.

549
00:27:34,905 --> 00:27:37,120
In other words, suppose
in the particular problem

550
00:27:37,120 --> 00:27:38,890
that I was dealing
with, it wasn't

551
00:27:38,890 --> 00:27:42,320
enough to know the partial
of w with respect to r.

552
00:27:42,320 --> 00:27:45,670
Suppose, for example, I
wanted the second partial of w

553
00:27:45,670 --> 00:27:46,980
with respect to r.

554
00:27:46,980 --> 00:27:50,220
Well obviously, that
simply means what?

555
00:27:50,220 --> 00:27:54,360
Take the partial of
this with respect to r.

556
00:27:54,360 --> 00:27:57,590
In other words, the second
partial of w with respect to r

557
00:27:57,590 --> 00:28:02,070
is just the partial of the
partial of w with respect to r,

558
00:28:02,070 --> 00:28:03,770
with respect to r.

559
00:28:03,770 --> 00:28:07,510
I'm just going to differentiate
this thing with respect to r.

560
00:28:07,510 --> 00:28:10,720
In other words, writing this
out more succinctly for you,

561
00:28:10,720 --> 00:28:13,860
the second partial derivative
of w with respect to r

562
00:28:13,860 --> 00:28:17,290
is the partial with respect
to r of cosine theta

563
00:28:17,290 --> 00:28:21,400
partial of w with respect to
x, plus sine theta partial

564
00:28:21,400 --> 00:28:23,640
of w with respect to y.

565
00:28:23,640 --> 00:28:24,750
Now, here's the key point.

566
00:28:24,750 --> 00:28:26,890
When we differentiate
here, we're

567
00:28:26,890 --> 00:28:29,570
assuming that theta is
being held constant.

568
00:28:29,570 --> 00:28:30,850
Isn't that right?

569
00:28:30,850 --> 00:28:33,720
So consequently, when I'm
differentiating with respect

570
00:28:33,720 --> 00:28:37,180
to r, cosine theta
is a constant.

571
00:28:37,180 --> 00:28:40,530
I can skip over that,
see, and differentiate

572
00:28:40,530 --> 00:28:42,764
what's left with respect to r.

573
00:28:42,764 --> 00:28:43,930
In other words, that's what?

574
00:28:43,930 --> 00:28:47,440
It's the partial of w with
respect to x differentiated

575
00:28:47,440 --> 00:28:49,430
with respect to r.

576
00:28:49,430 --> 00:28:54,090
See, I'm using the ordinary rule
for the derivative of a sum.

577
00:28:54,090 --> 00:28:55,920
Now, the derivative
of sine theta--

578
00:28:55,920 --> 00:28:58,860
see, sine theta is a
constant with respect to r.

579
00:28:58,860 --> 00:29:00,590
So the derivative
of this term is just

580
00:29:00,590 --> 00:29:03,800
sine theta times the
derivative of the partial

581
00:29:03,800 --> 00:29:09,730
of w with respect to y, with
respect to r, written this way.
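
Written out, the step so far looks like this. It is a sketch reconstructed from the spoken description, with theta held constant throughout.

```latex
\[
\frac{\partial^2 w}{\partial r^2}
  = \frac{\partial}{\partial r}\left(\cos\theta\,\frac{\partial w}{\partial x}
      + \sin\theta\,\frac{\partial w}{\partial y}\right)
  = \cos\theta\,\frac{\partial}{\partial r}\left(\frac{\partial w}{\partial x}\right)
  + \sin\theta\,\frac{\partial}{\partial r}\left(\frac{\partial w}{\partial y}\right)
\]
```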

582
00:29:09,730 --> 00:29:13,320
Now, the key point is that
both of these functions

583
00:29:13,320 --> 00:29:18,712
here, both of these are
functions of x and y.

584
00:29:18,712 --> 00:29:22,240
x and y, in turn, are
functions of r and theta.

585
00:29:22,240 --> 00:29:24,920
So in other words,
to differentiate

586
00:29:24,920 --> 00:29:30,120
this thing with respect to r, I
must use the chain rule again.

587
00:29:30,120 --> 00:29:33,000
Now, because this may
seem difficult for you,

588
00:29:33,000 --> 00:29:34,920
all I'm really
saying is, lookit.

589
00:29:34,920 --> 00:29:37,670
If this term here
looks messy, since we

590
00:29:37,670 --> 00:29:40,000
know that the partial
of w with respect to x

591
00:29:40,000 --> 00:29:44,480
is some function of x and y,
let's call that h of x, y.

592
00:29:44,480 --> 00:29:50,570
Then all we're saying is that
the partial of the partial of w

593
00:29:50,570 --> 00:29:53,820
with respect to x,
with respect to r,

594
00:29:53,820 --> 00:29:56,860
is just the partial of
h with respect to r.

595
00:29:56,860 --> 00:29:59,540
But to find the partial
of h with respect to r,

596
00:29:59,540 --> 00:30:01,640
we know how to use
the chain rule there.

597
00:30:01,640 --> 00:30:02,580
It's just what?

598
00:30:02,580 --> 00:30:04,350
It's the partial
of h with respect

599
00:30:04,350 --> 00:30:07,720
to x times the partial
of x with respect to r,

600
00:30:07,720 --> 00:30:10,040
plus the partial
of h with respect

601
00:30:10,040 --> 00:30:13,780
to y times the partial
of y with respect to r.

602
00:30:13,780 --> 00:30:17,655
Of course, if we now
remember what h is-- see,

603
00:30:17,655 --> 00:30:21,210
h is the partial of
w with respect to x.

604
00:30:21,210 --> 00:30:25,110
So if I differentiate
again with respect to x,

605
00:30:25,110 --> 00:30:28,210
I get the second partial
of w with respect to x.

606
00:30:28,210 --> 00:30:30,760
We've already seen that the
partial of x with respect to r

607
00:30:30,760 --> 00:30:34,220
is cosine theta, so
I have this term.

608
00:30:34,220 --> 00:30:37,950
The partial of h with respect
to y really says what?

609
00:30:37,950 --> 00:30:41,310
Differentiate the
partial of w with respect

610
00:30:41,310 --> 00:30:44,470
to x, with respect to y.

611
00:30:44,470 --> 00:30:46,820
And the usual way
of abbreviating that

612
00:30:46,820 --> 00:30:49,590
is like this, which, again,
is explained in the reading

613
00:30:49,590 --> 00:30:51,100
material.

614
00:30:51,100 --> 00:30:55,540
And we now multiply that by
the partial of y with respect

615
00:30:55,540 --> 00:30:58,580
to r, which happens
to be sine theta.

616
00:30:58,580 --> 00:31:03,870
Now, look at this a few
times in your spare time,

617
00:31:03,870 --> 00:31:05,060
if it's bothering you.

618
00:31:05,060 --> 00:31:07,390
It is not really
that difficult. It

619
00:31:07,390 --> 00:31:10,260
is messy notation in the
sense that you're not

620
00:31:10,260 --> 00:31:12,804
used to notation that's
quite that messy.

621
00:31:12,804 --> 00:31:13,720
That's why it's messy.

622
00:31:13,720 --> 00:31:15,150
Once you get used
to it, it is not

623
00:31:15,150 --> 00:31:18,550
any tougher than the chain rule
for one independent variable.

624
00:31:18,550 --> 00:31:21,910
In fact, to take the
partial of the partial of w

625
00:31:21,910 --> 00:31:26,270
with respect to y, with respect
to r, I'll do that in one step

626
00:31:26,270 --> 00:31:27,830
without using a substitution.

627
00:31:27,830 --> 00:31:30,220
All I'm saying is
that this function

628
00:31:30,220 --> 00:31:32,200
depends on both x and y.

629
00:31:32,200 --> 00:31:35,710
So to see what its derivative
is with respect to r,

630
00:31:35,710 --> 00:31:38,790
I'll see what the contribution
of its derivative with respect

631
00:31:38,790 --> 00:31:41,780
to r is due to just x alone.

632
00:31:41,780 --> 00:31:44,120
Then I'll see what the
contribution of its derivative

633
00:31:44,120 --> 00:31:46,940
with respect to r is
due to just y alone.

634
00:31:46,940 --> 00:31:48,780
And by the way, when
I say it that way,

635
00:31:48,780 --> 00:31:51,070
notice how quick it is
to write this thing down.

636
00:31:51,070 --> 00:31:52,490
I differentiate
this with respect

637
00:31:52,490 --> 00:31:56,120
to x multiplied by the partial
of x with respect to r.

638
00:31:56,120 --> 00:31:59,130
Add on to that the partial
of this with respect to y.

639
00:31:59,130 --> 00:32:02,390
Multiply that by the partial
of y with respect to r.

640
00:32:02,390 --> 00:32:05,400
If I do this, notice
now I have what?

641
00:32:05,400 --> 00:32:07,710
I have the partial
with respect to y.

642
00:32:07,710 --> 00:32:10,370
And I differentiate
that with respect to x.

643
00:32:10,370 --> 00:32:12,350
That's written this way.

644
00:32:12,350 --> 00:32:16,960
And by the way, notice that
this is the reverse order

645
00:32:16,960 --> 00:32:19,020
of what we did over here.

646
00:32:19,020 --> 00:32:22,060
Namely, in one case, we first
differentiated with respect

647
00:32:22,060 --> 00:32:24,410
to x and then with respect to y.

648
00:32:24,410 --> 00:32:26,810
In the other case, we
differentiated first

649
00:32:26,810 --> 00:32:29,770
with respect to y and
then with respect to x.

650
00:32:29,770 --> 00:32:32,510
So that actually, conceptually
there is a difference.

651
00:32:32,510 --> 00:32:35,980
That's why we write
these things differently.

652
00:32:35,980 --> 00:32:40,010
It does, again, turn
out that in most cases,

653
00:32:40,010 --> 00:32:42,680
the answer that you
get-- thank goodness--

654
00:32:42,680 --> 00:32:44,820
doesn't depend on the
order in which you

655
00:32:44,820 --> 00:32:46,120
perform the derivatives.

656
00:32:46,120 --> 00:32:48,070
But this is not at
all self-evident,

657
00:32:48,070 --> 00:32:50,170
even though you'd like
to believe that it is.

658
00:32:50,170 --> 00:32:54,050
But we'll talk about that more
in the exercises and the like.

659
00:32:54,050 --> 00:32:57,320
But all I'm saying now is that
if we put everything together

660
00:32:57,320 --> 00:33:00,270
of what we've had
before, we can obtain,

661
00:33:00,270 --> 00:33:03,390
in this particular case,
that the second partial of w

662
00:33:03,390 --> 00:33:08,700
with respect to r is this
somewhat messy but nonetheless

663
00:33:08,700 --> 00:33:10,770
straightforward expression.
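
Putting the two pieces together gives an expression along these lines. This is a sketch assembled from the steps described in the lecture; the order of the terms on the board may differ, and the two mixed partials are kept separate on purpose.

```latex
\[
\frac{\partial^2 w}{\partial r^2}
  = \cos^2\theta\,\frac{\partial^2 w}{\partial x^2}
  + \sin\theta\cos\theta\,\frac{\partial^2 w}{\partial y\,\partial x}
  + \sin\theta\cos\theta\,\frac{\partial^2 w}{\partial x\,\partial y}
  + \sin^2\theta\,\frac{\partial^2 w}{\partial y^2}
\]
```

If the order of differentiation can be interchanged, the two middle terms combine into a single term with coefficient 2 sine theta cosine theta.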

664
00:33:10,770 --> 00:33:12,280
And see, I've
circled these things

665
00:33:12,280 --> 00:33:16,120
to sort of tell you that if it
is permissible to interchange

666
00:33:16,120 --> 00:33:20,670
the order of differentiation, we
could combine these two terms.

667
00:33:20,670 --> 00:33:24,650
On the other hand, if you
couldn't interchange the order,

668
00:33:24,650 --> 00:33:27,040
this would be a
rather dangerous thing

669
00:33:27,040 --> 00:33:31,090
to do over here because these
might be different answers.

670
00:33:31,090 --> 00:33:34,960
As I say again, if you
have enough continuity,

671
00:33:34,960 --> 00:33:37,514
it turns out that these
two factors are the same.

672
00:33:37,514 --> 00:33:39,180
But that's not the
important issue here.

673
00:33:39,180 --> 00:33:41,160
The important issue
here is that I

674
00:33:41,160 --> 00:33:45,400
can keep using the chain rule to
take higher-order derivatives.

675
00:33:45,400 --> 00:33:47,780
And even though the
notation is messier,

676
00:33:47,780 --> 00:33:51,020
this happened when we dealt with
functions of a single variable.

677
00:33:51,020 --> 00:33:52,750
Remember when we
used the chain rule

678
00:33:52,750 --> 00:33:58,640
to find dy/dx when y and x were
given, say, as functions of t?

679
00:33:58,640 --> 00:34:01,910
We could also use the chain rule
to find the second derivative

680
00:34:01,910 --> 00:34:03,380
of y with respect to x.

681
00:34:03,380 --> 00:34:06,550
But we had to be a little bit
more careful of the computation

682
00:34:06,550 --> 00:34:10,409
because certain factors crept
in that we had to keep track of.
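
As a reminder of that single-variable case, the standard parametric formulas are sketched below, with y and x given as functions of t and dx/dt assumed nonzero.

```latex
\[
\frac{dy}{dx} = \frac{dy/dt}{dx/dt}, \qquad
\frac{d^2 y}{dx^2}
  = \frac{d}{dx}\left(\frac{dy}{dx}\right)
  = \frac{\dfrac{d}{dt}\left(\dfrac{dy}{dx}\right)}{\dfrac{dx}{dt}}
\]
```

The extra division by dx/dt in the second derivative is the kind of factor that creeps in.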

683
00:34:10,409 --> 00:34:13,870
At any rate, again, to
illustrate this idea

684
00:34:13,870 --> 00:34:15,980
rather than to keep
droning on about it,

685
00:34:15,980 --> 00:34:19,449
let me take a particularly
simple computational problem

686
00:34:19,449 --> 00:34:20,409
to check this thing on.

687
00:34:20,409 --> 00:34:21,908
In other words,
what I'm going to do

688
00:34:21,908 --> 00:34:25,469
is take this messy formula over
here and apply it to a case

689
00:34:25,469 --> 00:34:29,929
where the arithmetic happens
to be very, very simple.

690
00:34:29,929 --> 00:34:32,070
I'm going to rig this
very, very nicely.

691
00:34:32,070 --> 00:34:35,130
I'm going to let f of x, y just
be x squared plus y squared,

692
00:34:35,130 --> 00:34:36,489
in this case.

693
00:34:36,489 --> 00:34:38,800
Let w be x squared
plus y squared.

694
00:34:38,800 --> 00:34:41,790
In polar coordinates, notice
that x squared plus y squared

695
00:34:41,790 --> 00:34:45,030
is just r squared.

696
00:34:45,030 --> 00:34:46,820
So w is just r squared.

697
00:34:46,820 --> 00:34:49,870
What is the partial of w
with respect to r, then?

698
00:34:49,870 --> 00:34:54,139
The partial of w with
respect to r is 2r.

699
00:34:54,139 --> 00:34:56,675
And if I now differentiate
that with respect to r,

700
00:34:56,675 --> 00:35:00,010
the second partial of w
with respect to r is 2.

701
00:35:00,010 --> 00:35:03,760
Obviously, one would not use
the chain rule in real life

702
00:35:03,760 --> 00:35:06,090
to find the answer to
this particular problem.

703
00:35:06,090 --> 00:35:10,340
We've chosen this problem
simply to emphasize how

704
00:35:10,340 --> 00:35:12,570
the chain rule would work here.

705
00:35:12,570 --> 00:35:15,350
At any rate, going
back here, notice

706
00:35:15,350 --> 00:35:18,000
that it's very simple to
see from this equation

707
00:35:18,000 --> 00:35:20,910
that the partial of w
with respect to x is 2x.

708
00:35:20,910 --> 00:35:24,480
Therefore, the second partial
of w with respect to x is 2.

709
00:35:24,480 --> 00:35:28,090
The partial of w with
respect to y is 2y.

710
00:35:28,090 --> 00:35:34,400
Therefore, the second partial of
w with respect to y is also 2.

711
00:35:34,400 --> 00:35:37,800
The partial of w with respect
to x is a function of x alone,

712
00:35:37,800 --> 00:35:38,870
in this case.

713
00:35:38,870 --> 00:35:42,450
Consequently, the derivative
with respect to y will be 0.

714
00:35:42,450 --> 00:35:46,510
Similarly, the partial of w with
respect to y is a function of y

715
00:35:46,510 --> 00:35:47,320
alone.

716
00:35:47,320 --> 00:35:48,970
Consequently, when
I differentiate

717
00:35:48,970 --> 00:35:51,930
that with respect to x,
meaning I'm holding y constant,

718
00:35:51,930 --> 00:35:54,190
that derivative will also be 0.

719
00:35:54,190 --> 00:36:00,670
And the interesting point
now is if I take these values

720
00:36:00,670 --> 00:36:06,430
and substitute those into
this equation, what happens?

721
00:36:06,430 --> 00:36:06,990
Look.

722
00:36:06,990 --> 00:36:10,900
The second partial of w
with respect to x is just 2.

723
00:36:10,900 --> 00:36:15,840
The second partial of w
with respect to y is just 2.

724
00:36:15,840 --> 00:36:19,700
The mixed partials are both
0, regardless of which order

725
00:36:19,700 --> 00:36:20,860
you did them in.

726
00:36:20,860 --> 00:36:22,620
That's what we saw over here.

727
00:36:22,620 --> 00:36:25,100
So consequently,
according to this recipe,

728
00:36:25,100 --> 00:36:27,840
the second partial of
w with respect to r

729
00:36:27,840 --> 00:36:32,580
is 2 cosine squared
theta plus 0 plus 2 sine

730
00:36:32,580 --> 00:36:35,390
squared theta plus 0,
where the reason I've

731
00:36:35,390 --> 00:36:37,326
written these 0's
in is simply so

732
00:36:37,326 --> 00:36:39,200
that when you're looking
at your notes later,

733
00:36:39,200 --> 00:36:42,980
that traces the analog
of these terms over here.

734
00:36:42,980 --> 00:36:47,310
At any rate, notice now
that if I add these up,

735
00:36:47,310 --> 00:36:49,740
2 cosine squared
theta plus 2 sine

736
00:36:49,740 --> 00:36:51,900
squared theta,
since sine squared

737
00:36:51,900 --> 00:36:54,330
theta plus cosine
squared theta is 1,

738
00:36:54,330 --> 00:36:57,570
this sum is just-- I'll
write that in white chalk

739
00:36:57,570 --> 00:37:00,020
just so we don't accentuate it.

740
00:37:00,020 --> 00:37:01,830
Let it just be
part of the answer.

741
00:37:01,830 --> 00:37:05,180
This is 2 plus 0, which is 2.

742
00:37:05,180 --> 00:37:09,280
And this certainly does
check with the result

743
00:37:09,280 --> 00:37:11,520
that we got the
so-called easier way.

744
00:37:11,520 --> 00:37:14,990
And again, I don't want
to leave you with the idea

745
00:37:14,990 --> 00:37:18,320
that the second way was
just the hard way of doing

746
00:37:18,320 --> 00:37:22,140
the same problem that we
did easily the first way.

747
00:37:22,140 --> 00:37:25,490
I picked a simple example so
you can see how this works.

748
00:37:25,490 --> 00:37:28,140
I'm going to have a
multitude of exercises

749
00:37:28,140 --> 00:37:31,920
for you to do in the next unit,
simply so that you'll pick up

750
00:37:31,920 --> 00:37:36,160
the kind of know-how that will
allow you to change variables

751
00:37:36,160 --> 00:37:40,750
using the chain rule, with a
minimum degree of difficulty.

752
00:37:40,750 --> 00:37:42,660
In fact, hopefully,
I would like to feel

753
00:37:42,660 --> 00:37:45,930
by the time we're through
with the next two units,

754
00:37:45,930 --> 00:37:49,280
you will be doing this
almost as second nature.

755
00:37:49,280 --> 00:37:51,950
Well, we have other
topics to consider

756
00:37:51,950 --> 00:37:55,780
in terms of our linear
approximations and the like.

757
00:37:55,780 --> 00:37:58,950
We'll talk about that more
as the course unfolds.

758
00:37:58,950 --> 00:38:02,560
For the time being, I would
like you to concentrate simply

759
00:38:02,560 --> 00:38:04,930
on mastering the chain rule.

760
00:38:04,930 --> 00:38:07,620
And so until we meet
next time, goodbye.

761
00:38:12,620 --> 00:38:15,010
Funding for the
publication of this video

762
00:38:15,010 --> 00:38:19,860
was provided by the Gabriella
and Paul Rosenbaum Foundation.

763
00:38:19,860 --> 00:38:24,040
Help OCW continue to provide
free and open access to MIT

764
00:38:24,040 --> 00:38:28,435
courses by making a donation
at ocw.mit.edu/donate.