1
00:00:01,370 --> 00:00:04,250
We will now talk about
conditional expectations of

2
00:00:04,250 --> 00:00:07,280
one random variable
given another.

3
00:00:07,280 --> 00:00:11,240
As we will see, there will be
nothing new here, except for

4
00:00:11,240 --> 00:00:14,560
older results restated
in new notation.

5
00:00:14,560 --> 00:00:17,840
Any PMF has an associated
expectation.

6
00:00:17,840 --> 00:00:23,550
And so conditional PMFs also
have associated expectations,

7
00:00:23,550 --> 00:00:26,420
which we call conditional
expectations.

8
00:00:26,420 --> 00:00:29,630
We have already seen them for
the case where we condition on

9
00:00:29,630 --> 00:00:31,400
an event, A.

10
00:00:31,400 --> 00:00:34,850
The case where we condition
on random variables

11
00:00:34,850 --> 00:00:37,440
is exactly the same.

12
00:00:37,440 --> 00:00:42,660
We let the event, A, be the
event that Y takes on a

13
00:00:42,660 --> 00:00:45,970
specific value.

14
00:00:45,970 --> 00:00:50,780
And then we calculate the
expectation using the relevant

15
00:00:50,780 --> 00:00:53,500
conditional probabilities, those
that are given by the

16
00:00:53,500 --> 00:00:55,010
conditional PMF.

17
00:00:55,010 --> 00:00:59,380
So the conditional expectation
of X given that Y takes on a

18
00:00:59,380 --> 00:01:04,920
certain value is defined as the
usual expectation, except

19
00:01:04,920 --> 00:01:08,710
that we use the conditional
probabilities that apply given

20
00:01:08,710 --> 00:01:14,289
that Y takes on a specific
value little y.
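In symbols, this definition might be written as follows. The notation p_{X|Y} for the conditional PMF is an assumption, since the transcript does not show the slide.
```latex
\mathbf{E}[X \mid Y = y] \;=\; \sum_{x} x \, p_{X \mid Y}(x \mid y)
```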

21
00:01:14,289 --> 00:01:17,460
Recall now the expected value
rule for ordinary

22
00:01:17,460 --> 00:01:19,300
expectations.

23
00:01:19,300 --> 00:01:22,480
And also the expected value
Rule for conditional

24
00:01:22,480 --> 00:01:25,310
expectations given an event,
something that we

25
00:01:25,310 --> 00:01:27,630
have already seen.

26
00:01:27,630 --> 00:01:32,640
Now, in PMF notation, the
expected value rule takes a

27
00:01:32,640 --> 00:01:34,550
similar form.

28
00:01:34,550 --> 00:01:39,259
The event A is replaced by
the specific event that Y

29
00:01:39,259 --> 00:01:41,490
takes on a specific value.

30
00:01:41,490 --> 00:01:44,620
And in that case, the
conditional PMF given the

31
00:01:44,620 --> 00:01:48,310
event A is just the conditional
PMF given that

32
00:01:48,310 --> 00:01:54,550
random variable Y takes on a
specific value, little y.
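The three versions of the expected value rule mentioned here can be sketched side by side. The function g and the PMF symbols are assumed notation, not taken from the slide.
```latex
\begin{align*}
\mathbf{E}[g(X)] &= \sum_{x} g(x)\, p_{X}(x) \\
\mathbf{E}[g(X) \mid A] &= \sum_{x} g(x)\, p_{X \mid A}(x) \\
\mathbf{E}[g(X) \mid Y = y] &= \sum_{x} g(x)\, p_{X \mid Y}(x \mid y)
\end{align*}
```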

33
00:01:54,550 --> 00:01:58,210
For the case where we condition
on events, we also

34
00:01:58,210 --> 00:02:02,000
developed a version of the total
probability theorem and

35
00:02:02,000 --> 00:02:05,910
the total expectation theorem.

36
00:02:05,910 --> 00:02:09,360
We can do the same when we
condition on random variables.

37
00:02:09,360 --> 00:02:12,650
So suppose that the sample space
has been partitioned

38
00:02:12,650 --> 00:02:15,910
into n disjoint scenarios.

39
00:02:15,910 --> 00:02:19,579
The total probability theorem
tells us that the probability

40
00:02:19,579 --> 00:02:26,190
of the event that random
variable X takes on a value

41
00:02:26,190 --> 00:02:30,610
little x, can be found by taking
the probabilities of

42
00:02:30,610 --> 00:02:35,079
this event under each one of
the possible scenarios.

43
00:02:35,079 --> 00:02:38,870
And then weighing those
probabilities according to the

44
00:02:38,870 --> 00:02:43,390
probabilities of the different
scenarios.

45
00:02:43,390 --> 00:02:47,160
Now, suppose that we are dealing
with a random variable

46
00:02:47,160 --> 00:02:52,995
that takes values in a set
consisting of n elements.

47
00:02:57,300 --> 00:03:03,550
And let us consider scenarios
A_i, where the i-th scenario is the

48
00:03:03,550 --> 00:03:06,750
event that the random variable
Y takes on the

49
00:03:06,750 --> 00:03:10,270
i-th possible value.

50
00:03:10,270 --> 00:03:12,070
We can apply the total
probability

51
00:03:12,070 --> 00:03:14,550
theorem to this situation.

52
00:03:14,550 --> 00:03:17,590
We can find the probability
that the random variable X

53
00:03:17,590 --> 00:03:21,680
takes on a certain value, little
x, by considering the

54
00:03:21,680 --> 00:03:25,320
probability of this event
happening under each possible

55
00:03:25,320 --> 00:03:30,640
scenario, where a scenario is
that Y took on a specific

56
00:03:30,640 --> 00:03:35,620
value, and then weigh those
probabilities according to the

57
00:03:35,620 --> 00:03:37,770
probabilities of the different
scenarios.
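Written out, the two forms of the total probability theorem described above would plausibly read as follows. The symbols A_i, p_{X|A_i}, and p_{X|Y} are assumed notation.
```latex
\begin{align*}
p_{X}(x) &= \sum_{i=1}^{n} \mathbf{P}(A_i)\, p_{X \mid A_i}(x) \\
p_{X}(x) &= \sum_{y} p_{Y}(y)\, p_{X \mid Y}(x \mid y)
\end{align*}
```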

58
00:03:40,280 --> 00:03:41,670
The story with the total

59
00:03:41,670 --> 00:03:45,190
expectation theorem is similar.

60
00:03:45,190 --> 00:03:48,579
We know that an expectation
can be found by taking the

61
00:03:48,579 --> 00:03:52,630
conditional expectations under
each one of the scenarios and

62
00:03:52,630 --> 00:03:55,520
weighing them according to
the probabilities of

63
00:03:55,520 --> 00:03:58,280
the different scenarios.

64
00:03:58,280 --> 00:04:03,650
Again, let each event that Y
takes on a specific value be a

65
00:04:03,650 --> 00:04:06,490
different scenario.

66
00:04:06,490 --> 00:04:10,730
And with this correspondence
we obtain the following

67
00:04:10,730 --> 00:04:14,830
version of the total expectation
theorem.

68
00:04:14,830 --> 00:04:16,940
We have a sum of different
terms.

69
00:04:16,940 --> 00:04:20,660
And each term in the sum is
the probability of a given

70
00:04:20,660 --> 00:04:25,200
scenario times the expected
value of X under this

71
00:04:25,200 --> 00:04:26,450
particular scenario.
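As a quick numerical check of this statement, here is a minimal Python sketch. The joint PMF and all function names are hypothetical, chosen only for illustration. It compares the sum over y of p_Y(y) times E[X | Y = y] with E[X] computed directly.
```python
from fractions import Fraction as F
# Hypothetical joint PMF p_{X,Y}(x, y); keys are (x, y) pairs, values sum to 1.
joint = {(1, 0): F(1, 8), (2, 0): F(3, 8), (1, 1): F(1, 4), (3, 1): F(1, 4)}
def marginal_y(joint):
    """Return p_Y(y) as a dict, obtained by summing the joint PMF over x."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    return p_y
def cond_expectation(joint, y):
    """Return E[X | Y = y] = sum over x of x * p_{X|Y}(x | y)."""
    p_y = marginal_y(joint)[y]
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)
def total_expectation(joint):
    """Return the sum over y of p_Y(y) * E[X | Y = y]."""
    return sum(p * cond_expectation(joint, y) for y, p in marginal_y(joint).items())
# E[X] computed directly from the joint PMF, for comparison.
direct = sum(x * p for (x, _), p in joint.items())
```
For this particular PMF, total_expectation(joint) and direct agree, as the theorem predicts. Exact fractions are used so that the comparison does not depend on floating-point rounding.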

72
00:04:28,910 --> 00:04:31,460
At this point, I have to
add a comment of a more

73
00:04:31,460 --> 00:04:33,270
mathematical flavor.

74
00:04:33,270 --> 00:04:36,650
We have been talking about a
partition of the sample space

75
00:04:36,650 --> 00:04:41,520
into finitely many scenarios.

76
00:04:41,520 --> 00:04:46,960
But if Y takes on values in a
discrete but infinite set, for

77
00:04:46,960 --> 00:04:52,420
example, if Y can take on any
integer value, the argument

78
00:04:52,420 --> 00:04:55,580
that we have given is
not quite complete.

79
00:04:55,580 --> 00:04:59,820
Fortunately, the total
probability theorem and the

80
00:04:59,820 --> 00:05:05,150
total expectation theorem
both remain true, even

81
00:05:05,150 --> 00:05:10,710
for the case where Y ranges over
an infinite set as long

82
00:05:10,710 --> 00:05:16,310
as the random variable X has
a well-defined expectation.

83
00:05:16,310 --> 00:05:19,640
For the total probability
theorem, the proof for the

84
00:05:19,640 --> 00:05:23,850
general case can be carried
out without a lot of

85
00:05:23,850 --> 00:05:28,020
difficulty, just using the
countable additivity axiom.

86
00:05:28,020 --> 00:05:31,590
However, for the total
expectation theorem, it takes

87
00:05:31,590 --> 00:05:34,470
some harder mathematical work.

88
00:05:34,470 --> 00:05:36,620
And this is beyond our scope.

89
00:05:36,620 --> 00:05:39,800
But we will just take this fact
for granted, that the

90
00:05:39,800 --> 00:05:43,540
total expectation theorem
carries over to the case where

91
00:05:43,540 --> 00:05:47,680
we're adding over an infinite
sequence of possible values of

92
00:05:47,680 --> 00:05:48,870
Y.

93
00:05:48,870 --> 00:05:51,830
In the rest of the course we
will often use the total

94
00:05:51,830 --> 00:05:56,520
expectation theorem, including
in cases where Y ranges over

95
00:05:56,520 --> 00:05:58,960
an infinite discrete set.

96
00:05:58,960 --> 00:06:03,110
In fact, we will see that this
theorem is an extremely useful

97
00:06:03,110 --> 00:06:05,670
tool that can be used
to divide and

98
00:06:05,670 --> 00:06:07,670
conquer complicated models.