1
00:00:00,380 --> 00:00:04,070
If you remember our discussion
from a long time ago, we said

2
00:00:04,070 --> 00:00:06,790
that much of this class consists
of variations of a

3
00:00:06,790 --> 00:00:10,540
few basic skills and ideas,
one of which is the Bayes

4
00:00:10,540 --> 00:00:13,670
rule, the foundation
of inference.

5
00:00:13,670 --> 00:00:16,580
So let's look here at the
Bayes rule again and its

6
00:00:16,580 --> 00:00:18,670
different incarnations.

7
00:00:18,670 --> 00:00:22,530
In a discrete setting we have a
random variable, X, with a known

8
00:00:22,530 --> 00:00:26,210
PMF but whose value
is not observed.

9
00:00:26,210 --> 00:00:29,450
Instead we observe the value
of another random variable,

10
00:00:29,450 --> 00:00:33,040
call it Y, which has some
relation with X.

11
00:00:33,040 --> 00:00:36,800
And we will use the value of Y
to make some inferences about

12
00:00:36,800 --> 00:00:40,570
X. The relation between the
two random variables is

13
00:00:40,570 --> 00:00:45,390
captured by specifying the
conditional PMF of Y given any

14
00:00:45,390 --> 00:00:49,690
value of X. Think of X as an
unknown state of the world and

15
00:00:49,690 --> 00:00:54,530
of Y as a noisy observation of
X. The conditional PMF tells

16
00:00:54,530 --> 00:00:57,840
us the distribution of
Y under each possible

17
00:00:57,840 --> 00:00:59,970
state of the world.

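As a concrete illustration (with hypothetical numbers, not from the lecture), X could be a binary state and Y a noisy sensor reading that reports X correctly with probability 1 - epsilon:

\[ p_{Y \mid X}(y \mid x) = \begin{cases} 1 - \varepsilon, & \text{if } y = x, \\ \varepsilon, & \text{if } y \neq x. \end{cases} \]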
18
00:00:59,970 --> 00:01:03,420
Once we observe the value of Y
we obtain some information

19
00:01:03,420 --> 00:01:06,990
about X. And we use this
information to make inferences

20
00:01:06,990 --> 00:01:11,070
about the likely values of X.
Mathematically, instead of

21
00:01:11,070 --> 00:01:16,820
relying on the prior for X, we
form some revised beliefs.

22
00:01:16,820 --> 00:01:19,280
That is, we form the
conditional PMF

23
00:01:19,280 --> 00:01:24,130
of X given the particular
observation that we have seen.

24
00:01:24,130 --> 00:01:27,370
All this becomes possible
because of the Bayes rule.

25
00:01:27,370 --> 00:01:29,400
We have seen the Bayes
rule for events.

26
00:01:29,400 --> 00:01:33,090
But it is easy to translate
it into PMF notation.

27
00:01:33,090 --> 00:01:35,140
We take the multiplication
rule.

28
00:01:35,140 --> 00:01:38,990
And we use it twice in different
orders to get two

29
00:01:38,990 --> 00:01:40,610
different forms--

30
00:01:40,610 --> 00:01:42,090
or two different expressions--

31
00:01:42,090 --> 00:01:44,479
for the joint PMF.

32
00:01:44,479 --> 00:01:48,990
We then take one of the terms
involved here and send it to

33
00:01:48,990 --> 00:01:50,710
the other side.

34
00:01:50,710 --> 00:01:54,100
We obtain this expression,
which is the Bayes rule.

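In PMF notation, the two orderings of the multiplication rule read

\[ p_{X,Y}(x, y) = p_X(x)\, p_{Y \mid X}(y \mid x) = p_Y(y)\, p_{X \mid Y}(x \mid y), \]

and dividing both sides by p_Y(y) gives the Bayes rule:

\[ p_{X \mid Y}(x \mid y) = \frac{p_X(x)\, p_{Y \mid X}(y \mid x)}{p_Y(y)}. \]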
35
00:01:54,100 --> 00:01:55,560
What do we have here?

36
00:01:55,560 --> 00:01:59,630
We want to calculate the
conditional distribution of X

37
00:01:59,630 --> 00:02:01,920
which we typically call
the posterior.

38
00:02:07,600 --> 00:02:13,860
And to do this we rely on the
prior of X as well as on the

39
00:02:13,860 --> 00:02:16,740
model that we have for
the observations.

40
00:02:16,740 --> 00:02:21,660
The denominator requires us to
compute the marginal of Y. But

41
00:02:21,660 --> 00:02:25,350
this is something that is easily
done because we have

42
00:02:25,350 --> 00:02:27,250
the joint available.

43
00:02:27,250 --> 00:02:31,190
The numerator, this expression
here, is just the joint PMF.

44
00:02:31,190 --> 00:02:34,790
And using the joint PMF
you can always find

45
00:02:34,790 --> 00:02:36,770
the marginal PMF.

46
00:02:36,770 --> 00:02:39,910
Essentially, we're using here
the total probability theorem.

47
00:02:39,910 --> 00:02:43,160
And we're using the pieces of
information that were given to

48
00:02:43,160 --> 00:02:47,600
us, the prior and the model
of the observations.

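Spelled out, the total probability theorem expresses the denominator as a sum of numerator-type terms over all possible states:

\[ p_Y(y) = \sum_{x'} p_X(x')\, p_{Y \mid X}(y \mid x'). \]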
49
00:02:47,600 --> 00:02:50,020
When we're dealing with
continuous random variables

50
00:02:50,020 --> 00:02:51,690
the story is identical.

51
00:02:51,690 --> 00:02:54,880
We still have two versions of
the multiplication rule.

52
00:02:54,880 --> 00:02:56,810
By sending one term--

53
00:02:56,810 --> 00:02:57,570
this term--

54
00:02:57,570 --> 00:03:00,270
to the other side of
the equation we

55
00:03:00,270 --> 00:03:02,150
get the Bayes rule.

56
00:03:02,150 --> 00:03:05,070
And then we use the total
probability theorem to

57
00:03:05,070 --> 00:03:07,600
calculate the denominator
term.

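In PDF notation, the continuous version reads

\[ f_{X \mid Y}(x \mid y) = \frac{f_X(x)\, f_{Y \mid X}(y \mid x)}{f_Y(y)}, \qquad f_Y(y) = \int f_X(x')\, f_{Y \mid X}(y \mid x')\, dx'. \]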
58
00:03:07,600 --> 00:03:11,860
So as far as mathematics go,
the story is pretty simple.

59
00:03:11,860 --> 00:03:14,500
It is exactly the same
in the discrete and

60
00:03:14,500 --> 00:03:15,740
the continuous case.

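To make the discrete recipe concrete, here is a minimal numerical sketch in Python; the prior, the flip probability eps, and the observed value are all hypothetical choices for illustration, not values from the lecture.

# Bayes rule for a binary state X observed through a noisy channel Y.
# Prior p_X (hypothetical numbers).
prior = {0: 0.7, 1: 0.3}

# Observation model p_{Y|X}(y|x): Y reports X correctly with
# probability 1 - eps and flips it with probability eps (assumed 0.1).
eps = 0.1
def likelihood(y, x):
    return 1 - eps if y == x else eps

y_obs = 1  # the particular value of Y that was observed (hypothetical)

# Denominator p_Y(y_obs), via the total probability theorem.
p_y = sum(prior[x] * likelihood(y_obs, x) for x in prior)

# Posterior p_{X|Y}(x | y_obs) = p_X(x) * p_{Y|X}(y_obs | x) / p_Y(y_obs).
posterior = {x: prior[x] * likelihood(y_obs, x) / p_y for x in prior}
print(posterior)  # {0: 0.2058..., 1: 0.7941...}

Observing y = 1 shifts belief from the prior P(X = 1) = 0.3 up to roughly 0.79, exactly the revision of beliefs described above.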
61
00:03:15,740 --> 00:03:18,440
This story will be our stepping
stone for dealing

62
00:03:18,440 --> 00:03:22,270
with more complex models, and
also for when we go into more

63
00:03:22,270 --> 00:03:24,300
detail on the subject
of inference

64
00:03:24,300 --> 00:03:25,550
later in this course.