1
00:00:01,640 --> 00:00:04,040
The following content is
provided under a Creative

2
00:00:04,040 --> 00:00:05,580
Commons license.

3
00:00:05,580 --> 00:00:07,880
Your support will help
MIT OpenCourseWare

4
00:00:07,880 --> 00:00:12,270
continue to offer high-quality,
educational resources for free.

5
00:00:12,270 --> 00:00:14,870
To make a donation or
view additional materials

6
00:00:14,870 --> 00:00:18,830
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,830 --> 00:00:22,400
at ocw.mit.edu.

8
00:00:22,400 --> 00:00:24,590
TOMER ULLMAN: And today,
with your active help

9
00:00:24,590 --> 00:00:27,470
and participation, I hope to
run a probabilistic programming

10
00:00:27,470 --> 00:00:30,459
tutorial in the time
that we have left.

11
00:00:30,459 --> 00:00:32,000
And we're going to
focus specifically

12
00:00:32,000 --> 00:00:33,874
on a language
called Church, which

13
00:00:33,874 --> 00:00:36,290
is a probabilistic programming
language that was developed

14
00:00:36,290 --> 00:00:39,020
in Josh Tenenbaum's
Group, but it's

15
00:00:39,020 --> 00:00:41,600
now taken on a life of
its own and has set up

16
00:00:41,600 --> 00:00:44,510
shop in other places.

17
00:00:44,510 --> 00:00:46,010
Before I get started,
I should say,

18
00:00:46,010 --> 00:00:47,720
I was, sort of, looking
for a good image.

19
00:00:47,720 --> 00:00:49,400
I didn't like a
blank page, so I was

20
00:00:49,400 --> 00:00:51,237
googling just Church tutorial.

21
00:00:51,237 --> 00:00:52,570
This is the first thing I found.

22
00:00:52,570 --> 00:00:54,290
It's an image for
Minecraft about how

23
00:00:54,290 --> 00:00:56,030
to build a church in Minecraft.

24
00:00:56,030 --> 00:00:56,960
Does any of us--

25
00:00:56,960 --> 00:00:58,375
people have heard of Minecraft?

26
00:00:58,375 --> 00:00:59,000
AUDIENCE: Yeah.

27
00:00:59,000 --> 00:00:59,900
TOMER ULLMAN: They've
played with Minecraft?

28
00:00:59,900 --> 00:01:01,730
OK-- just in case
you don't know,

29
00:01:01,730 --> 00:01:04,959
Minecraft is a sort of
procedurally-generated world

30
00:01:04,959 --> 00:01:06,932
where you get some
building blocks, literally,

31
00:01:06,932 --> 00:01:08,890
building blocks, that
you can build stuff with.

32
00:01:08,890 --> 00:01:11,330
And you can build an infinite
number of things, including

33
00:01:11,330 --> 00:01:13,239
a computer, and a church.

34
00:01:13,239 --> 00:01:14,030
And it's very cool.

35
00:01:14,030 --> 00:01:15,530
And I thought it's
actually not that

36
00:01:15,530 --> 00:01:18,542
bad of an image for a tutorial
about probabilistic programming

37
00:01:18,542 --> 00:01:20,000
language, which is
also about, sort

38
00:01:20,000 --> 00:01:22,160
of, procedurally-generative
things that

39
00:01:22,160 --> 00:01:24,984
use small building blocks
to build up an entire world.

40
00:01:24,984 --> 00:01:26,900
And I thought, OK, that's
the first hit I got.

41
00:01:26,900 --> 00:01:28,010
What's the other hit?

42
00:01:28,010 --> 00:01:29,820
Well, it's just another
church, and another church,

43
00:01:29,820 --> 00:01:31,340
and another church, another
church, another church

44
00:01:31,340 --> 00:01:34,080
that you can build in Minecraft
from many different angles

45
00:01:34,080 --> 00:01:36,620
and many different tutorials,
so maybe, instead of this,

46
00:01:36,620 --> 00:01:38,870
you can just train some
deep-learning algorithm

47
00:01:38,870 --> 00:01:43,110
to, I don't know, learn a
billion churches and do that.

48
00:01:43,110 --> 00:01:45,000
That's not what we're after.

49
00:01:45,000 --> 00:01:47,240
So probabilistic
programming, Josh already

50
00:01:47,240 --> 00:01:48,887
talked a bunch
about this, so I'll

51
00:01:48,887 --> 00:01:50,720
sort of be repeating
him, or channeling him.

52
00:01:50,720 --> 00:01:53,245
It's about combining
the best of both worlds

53
00:01:53,245 --> 00:01:54,620
in the, sort of,
two states of AI

54
00:01:54,620 --> 00:01:58,354
right now, which is
statistical modeling and logic.

55
00:01:58,354 --> 00:01:59,770
And in many models,
you have this,

56
00:01:59,770 --> 00:02:02,854
sort of, dual question of
representation and learning.

57
00:02:02,854 --> 00:02:05,270
And it's really, sort of, a
problem for cognitive science,

58
00:02:05,270 --> 00:02:07,687
going back to the days of
before cognitive science, right?

59
00:02:07,687 --> 00:02:10,020
I mean, this is the sort of
problem that a lot of people

60
00:02:10,020 --> 00:02:12,170
had when they tried to
model the human mind.

61
00:02:12,170 --> 00:02:13,330
This goes back to Turing.

62
00:02:13,330 --> 00:02:15,890
Sort of, when we want
to build a system that

63
00:02:15,890 --> 00:02:18,380
is human-like in
its intelligence,

64
00:02:18,380 --> 00:02:19,880
the two questions
that we face are,

65
00:02:19,880 --> 00:02:21,740
what are the representations
that it will have,

66
00:02:21,740 --> 00:02:23,114
and how is it
going to learn them

67
00:02:23,114 --> 00:02:25,190
or how is it going to
learn anything new?

68
00:02:25,190 --> 00:02:26,210
And you often have
to, sort of-- it's

69
00:02:26,210 --> 00:02:27,440
a short blanket problem, right?

70
00:02:27,440 --> 00:02:28,880
If you try to cover
your head, your feet

71
00:02:28,880 --> 00:02:29,765
are sort of not
getting anything.

72
00:02:29,765 --> 00:02:30,950
If you try to cover
your feet, your head's

73
00:02:30,950 --> 00:02:32,120
not getting anything.

74
00:02:32,120 --> 00:02:34,250
Because oftentimes, you
find that, if you stick

75
00:02:34,250 --> 00:02:36,110
to a particularly easy
representation that's

76
00:02:36,110 --> 00:02:39,080
sort of easy to code
or rather something,

77
00:02:39,080 --> 00:02:41,390
a kind of a representation
that's easy to learn,

78
00:02:41,390 --> 00:02:44,000
like say a vector of weights
that you're just trying

79
00:02:44,000 --> 00:02:47,600
to shift your weights
around, then, yes, that

80
00:02:47,600 --> 00:02:49,520
might be easy, relatively
easy, but you're

81
00:02:49,520 --> 00:02:51,155
sort of stuck with
the representation

82
00:02:51,155 --> 00:02:53,270
that you can learn are weights.

83
00:02:53,270 --> 00:02:55,310
Or Josh was making a
big point about this,

84
00:02:55,310 --> 00:02:57,770
and it is a big point, that
if you try to learn something

85
00:02:57,770 --> 00:03:01,190
like causal Bayes nets,
then you're sort of limited

86
00:03:01,190 --> 00:03:02,242
by that representation.

87
00:03:02,242 --> 00:03:03,950
That is your representation
of these sort

88
00:03:03,950 --> 00:03:07,010
of circles and arrows that
go into other circles.

89
00:03:07,010 --> 00:03:09,302
And that might get
you very, very far.

90
00:03:09,302 --> 00:03:11,510
And you might even have very
good learning algorithms

91
00:03:11,510 --> 00:03:14,000
for those particular
models, for those particular

92
00:03:14,000 --> 00:03:15,560
representations,
that are tailored

93
00:03:15,560 --> 00:03:17,000
for those representations.

94
00:03:17,000 --> 00:03:19,040
Like, in these causal
circles and arrows,

95
00:03:19,040 --> 00:03:22,852
belief propagation might be a
very good learning algorithm,

96
00:03:22,852 --> 00:03:24,560
but if you commit to
that representation,

97
00:03:24,560 --> 00:03:26,750
then you are sort of stuck
with that representation.

98
00:03:26,750 --> 00:03:28,250
And you might not
be flexible enough

99
00:03:28,250 --> 00:03:30,530
to learn all the stuff
that you want to.

100
00:03:30,530 --> 00:03:33,320
And a very flexible
representation, sort of,

101
00:03:33,320 --> 00:03:35,060
one of the more
flexible ones that

102
00:03:35,060 --> 00:03:36,860
have come onto the
scene in the past years

103
00:03:36,860 --> 00:03:39,442
is why don't we try
to learn a program?

104
00:03:39,442 --> 00:03:41,400
I say it's come onto the
scene in recent years.

105
00:03:41,400 --> 00:03:42,400
That's not exactly true.

106
00:03:42,400 --> 00:03:44,705
People have been interested
in learning programs

107
00:03:44,705 --> 00:03:46,580
for many, many years,
for many, many decades,

108
00:03:46,580 --> 00:03:48,350
but they sort of
try to infer them

109
00:03:48,350 --> 00:03:49,880
from kind of a
logical perspective,

110
00:03:49,880 --> 00:03:51,680
not really getting these
probabilistic learning

111
00:03:51,680 --> 00:03:52,070
algorithms.

112
00:03:52,070 --> 00:03:53,611
I'm sort of throwing
words out there,

113
00:03:53,611 --> 00:03:55,600
but it'll make more
sense as I go through it.

114
00:03:55,600 --> 00:03:57,530
And you already have
some of Josh's stuff

115
00:03:57,530 --> 00:03:58,470
to carry you through.

116
00:03:58,470 --> 00:04:00,470
But the point is, there's
always these questions

117
00:04:00,470 --> 00:04:02,134
of learning and representation.

118
00:04:02,134 --> 00:04:03,800
For probabilistic
programming languages,

119
00:04:03,800 --> 00:04:05,930
the representation is
not circles and arrows,

120
00:04:05,930 --> 00:04:09,132
it's not vectors of
weights, it is programs.

121
00:04:09,132 --> 00:04:10,590
That's what you're
trying to learn.

122
00:04:10,590 --> 00:04:12,810
That's what you're trying to
figure out the world with.

123
00:04:12,810 --> 00:04:14,226
And then there's
a question of how

124
00:04:14,226 --> 00:04:18,709
do you learn these programs,
but we'll get to that.

125
00:04:18,709 --> 00:04:22,646
OK, let's see, so we think it's
a good representation for AI

126
00:04:22,646 --> 00:04:24,020
and cognition for
all the reasons

127
00:04:24,020 --> 00:04:25,922
that Josh just talked about.

128
00:04:25,922 --> 00:04:27,380
And there's been
a growing interest

129
00:04:27,380 --> 00:04:29,450
in these things for
the past 10 years,

130
00:04:29,450 --> 00:04:31,460
witnessed both by
the proliferation

131
00:04:31,460 --> 00:04:33,880
of many, many different types
of programming languages--

132
00:04:33,880 --> 00:04:36,690
sorry, probabilistic
programming languages.

133
00:04:36,690 --> 00:04:38,864
I don't know whether to
call them PPL, or people,

134
00:04:38,864 --> 00:04:39,530
or what exactly.

135
00:04:39,530 --> 00:04:41,210
But probabilistic
programming languages,

136
00:04:41,210 --> 00:04:42,980
there's been PyMC
based on Python,

137
00:04:42,980 --> 00:04:44,396
there's Church,
which you're going

138
00:04:44,396 --> 00:04:49,100
to play with right now, but
also BLOG, WinBUGS, ProbLog,

139
00:04:49,100 --> 00:04:51,787
Venture, many others that
I haven't mentioned here.

140
00:04:51,787 --> 00:04:53,370
So first of all,
there's many of them.

141
00:04:53,370 --> 00:04:55,250
And also, DARPA has
started taking interest

142
00:04:55,250 --> 00:04:59,080
and has given a large grant
to advance this field.

143
00:04:59,080 --> 00:05:00,602
They think it might be big.

144
00:05:00,602 --> 00:05:02,810
If you're in, sort of,
probabilistic programming more

145
00:05:02,810 --> 00:05:05,309
generally than Church, you think
it's interesting to follow,

146
00:05:05,309 --> 00:05:07,940
you want to learn more about
it, you should go to this thing,

147
00:05:07,940 --> 00:05:11,222
probabilistic-programming.org
wiki.

148
00:05:11,222 --> 00:05:12,680
It sort of keeps
it up-to-date with

149
00:05:12,680 --> 00:05:15,267
many, many, many, different
types of programming languages.

150
00:05:15,267 --> 00:05:17,600
You don't necessarily have
to write this down right now.

151
00:05:17,600 --> 00:05:18,980
I will send you
the slides later,

152
00:05:18,980 --> 00:05:21,110
but just, sort of,
keep it in mind, form a

153
00:05:21,110 --> 00:05:23,400
link to it in your head.

154
00:05:23,400 --> 00:05:25,860
There's also this, sort of,
nice summary from this DARPA--

155
00:05:25,860 --> 00:05:28,310
so DARPA started funding
this about a year ago.

156
00:05:28,310 --> 00:05:30,102
And someone already
went to a summer school

157
00:05:30,102 --> 00:05:31,851
on probabilistic
programming and, sort of,

158
00:05:31,851 --> 00:05:33,110
wrote the state of the field.

159
00:05:33,110 --> 00:05:34,110
It's six months ago.

160
00:05:34,110 --> 00:05:36,814
It's a bit outdated,
but it also makes

161
00:05:36,814 --> 00:05:38,480
for an interesting
read for those of you

162
00:05:38,480 --> 00:05:40,134
who want to follow that.

163
00:05:40,134 --> 00:05:42,050
OK, so that's about
probabilistic programming,

164
00:05:42,050 --> 00:05:43,800
very, very, very generally.

165
00:05:43,800 --> 00:05:45,920
What about Church very,
very, very generally?

166
00:05:45,920 --> 00:05:48,470
So as I said, Church
is one example

167
00:05:48,470 --> 00:05:50,144
of a probabilistic
programming language.

168
00:05:50,144 --> 00:05:51,560
It was developed
by several people

169
00:05:51,560 --> 00:05:53,840
at MIT who have
since gone on to do

170
00:05:53,840 --> 00:05:55,610
other different
things like continue

171
00:05:55,610 --> 00:05:58,040
to develop Church at Stanford.

172
00:05:58,040 --> 00:05:59,950
That's Professor Noah Goodman.

173
00:05:59,950 --> 00:06:02,640
Although, of course, he's
doing many, many other things.

174
00:06:02,640 --> 00:06:04,460
There's also been
Vikash Mansinghka,

175
00:06:04,460 --> 00:06:06,980
who has gone on to develop
other probabilistic programming

176
00:06:06,980 --> 00:06:11,330
languages like Venture at MIT.

177
00:06:11,330 --> 00:06:14,450
And one thing to say generally
about probabilistic programming

178
00:06:14,450 --> 00:06:16,430
languages is that,
usually they are

179
00:06:16,430 --> 00:06:18,350
based on an already
existing language.

180
00:06:18,350 --> 00:06:21,350
So you take MATLAB and you
try to make it probabilistic.

181
00:06:21,350 --> 00:06:24,920
You take Python and you try
to make it probabilistic.

182
00:06:24,920 --> 00:06:27,580
Julia has a probabilistic
programming implementation.

183
00:06:27,580 --> 00:06:30,050
Church in particular
is based on Scheme,

184
00:06:30,050 --> 00:06:32,510
which is a derivative
of LISP, which is itself

185
00:06:32,510 --> 00:06:35,452
sort of an attempt to capture
lambda calculus, which is not

186
00:06:35,452 --> 00:06:37,160
a programming language,
it is an approach

187
00:06:37,160 --> 00:06:41,750
to trying to think about all
possible functions developed

188
00:06:41,750 --> 00:06:43,400
by Alonzo Church.

189
00:06:43,400 --> 00:06:45,320
And that's why Church
is called Church.

190
00:06:45,320 --> 00:06:48,930
It has nothing to do with
the actual buildings.

191
00:06:48,930 --> 00:06:51,830
So the point about
Scheme which is very nice

192
00:06:51,830 --> 00:06:53,240
is that it's very compositional.

193
00:06:53,240 --> 00:06:54,740
And anything that
you write can then

194
00:06:54,740 --> 00:06:57,230
be passed off into the
other functions as the data.

195
00:06:57,230 --> 00:06:59,390
You'll see some
examples of that.

196
00:06:59,390 --> 00:07:01,001
Church has several
inference engines

197
00:07:01,001 --> 00:07:02,000
that you can try to run.

198
00:07:02,000 --> 00:07:03,260
We'll get into that.

199
00:07:03,260 --> 00:07:06,020
The backbone of it is
Metropolis-Hastings-type

200
00:07:06,020 --> 00:07:08,390
sampling over possible
programs, but it

201
00:07:08,390 --> 00:07:11,815
has other types of inference,
including explicit enumeration.

202
00:07:11,815 --> 00:07:13,190
If your space is
small enough, it

203
00:07:13,190 --> 00:07:16,190
can just look at all the
possible ways to run a program.
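
[Editor's sketch] The enumeration idea mentioned here can be illustrated outside of Church. The Python below is only an assumed toy example, not Church's actual engine: the model (three fair coin flips) and the name enumerate_model are hypothetical. It walks every possible run of the program and adds up the probability of each outcome exactly.

```python
from fractions import Fraction
from itertools import product


def enumerate_model():
    """Exact distribution over the number of heads in three fair coin flips.

    Hypothetical toy model: every possible execution is visited once.
    """
    dist = {}
    for flips in product([True, False], repeat=3):
        prob = Fraction(1, 2) ** 3  # each run of three fair flips is equally likely
        heads = sum(flips)
        dist[heads] = dist.get(heads, Fraction(0)) + prob
    return dist


distribution = enumerate_model()
for heads in sorted(distribution):
    print(heads, distribution[heads])
```

This only works because there are just eight possible runs; with a larger space of executions, sampling-based engines take over.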

204
00:07:16,190 --> 00:07:17,500
It has rejection query.

205
00:07:17,500 --> 00:07:18,547
Again, we'll get to this.

206
00:07:18,547 --> 00:07:20,630
Don't worry about, like,
what is he talking about.

207
00:07:23,427 --> 00:07:25,010
Yeah, so it has a
whole bunch of-- you

208
00:07:25,010 --> 00:07:27,300
know, particle filtering
is one attempt at that.

209
00:07:27,300 --> 00:07:29,270
But the point is there are--

210
00:07:29,270 --> 00:07:30,980
each probabilistic
programming language

211
00:07:30,980 --> 00:07:33,770
has its own set of
inference engines.

212
00:07:33,770 --> 00:07:36,000
Some of them try to go the
Metropolis-Hastings route.

213
00:07:36,000 --> 00:07:37,760
Some of them try
to say, well, it's

214
00:07:37,760 --> 00:07:39,301
a probabilistic
programming language,

215
00:07:39,301 --> 00:07:41,414
but it's really limited
to causal Bayes nets,

216
00:07:41,414 --> 00:07:42,830
so the inference
engines are going

217
00:07:42,830 --> 00:07:45,080
to be stuff that's good
for causal Bayes nets.

218
00:07:45,080 --> 00:07:49,530
But all of them sort
of share this dream of,

219
00:07:49,530 --> 00:07:53,870
it's easier to write the forward
model than the inference.

220
00:07:53,870 --> 00:07:54,950
And it's really annoying.

221
00:07:54,950 --> 00:07:57,650
Those of you who have ever tried
to write an inference engine

222
00:07:57,650 --> 00:07:59,942
or to write inference
over any sort of model,

223
00:07:59,942 --> 00:08:01,400
it's really annoying
to write that.

224
00:08:01,400 --> 00:08:03,290
And it usually sort of only
works for the one thing

225
00:08:03,290 --> 00:08:05,180
that you've built.
And one of the selling

226
00:08:05,180 --> 00:08:07,100
points of probabilistic
programming languages,

227
00:08:07,100 --> 00:08:08,630
one of the reasons
that DARPA took

228
00:08:08,630 --> 00:08:11,150
an interest, beyond the fact
that they can try to capture

229
00:08:11,150 --> 00:08:14,240
the human mind, and flexible
AI, and all that, is they

230
00:08:14,240 --> 00:08:16,760
have this sort of promise,
this pitch that, why don't you

231
00:08:16,760 --> 00:08:18,830
just write down
the forward model,

232
00:08:18,830 --> 00:08:21,530
how you think the world works,
and we'll, kind of, take

233
00:08:21,530 --> 00:08:24,350
care of inference for you.

234
00:08:24,350 --> 00:08:26,090
And in many cases,
it turns out to be

235
00:08:26,090 --> 00:08:28,549
a lot easier to write
the forward model

236
00:08:28,549 --> 00:08:30,590
than to try to write the
inference engine for it.

237
00:08:30,590 --> 00:08:32,659
In fact, you can
very quickly get

238
00:08:32,659 --> 00:08:36,500
to something that's even, like,
five or six lines of code long,

239
00:08:36,500 --> 00:08:39,854
that would be intractable,
would be very hard to write down

240
00:08:39,854 --> 00:08:41,270
the analytic
expression for, would

241
00:08:41,270 --> 00:08:43,070
be very hard to think
about what would

242
00:08:43,070 --> 00:08:46,010
be the inference engine for, but
it's really just easy to write.

243
00:08:46,010 --> 00:08:47,305
I mean, all you have is
a set of assumptions.

244
00:08:47,305 --> 00:08:49,490
And you're trying to figure
out how they work together.

245
00:08:49,490 --> 00:08:51,180
Again, we'll see some
examples of that.

246
00:08:51,180 --> 00:08:53,380
But my point was all
probabilistic programming

247
00:08:53,380 --> 00:08:55,880
languages are about writing the
forward model and then, sort

248
00:08:55,880 --> 00:08:58,942
of, trying to do the
inference for you.

249
00:08:58,942 --> 00:09:00,650
Another point about
Church in particular,

250
00:09:00,650 --> 00:09:02,675
it is under construction,
so you'll notice

251
00:09:02,675 --> 00:09:03,800
this when you write it now.

252
00:09:03,800 --> 00:09:04,580
It will break.

253
00:09:04,580 --> 00:09:05,430
It will freeze.

254
00:09:05,430 --> 00:09:08,810
It will do all sorts
of annoying things,

255
00:09:08,810 --> 00:09:10,560
so it is under construction.

256
00:09:10,560 --> 00:09:12,976
It's not exactly something
that you would then go and work

257
00:09:12,976 --> 00:09:14,330
with like MATLAB.

258
00:09:14,330 --> 00:09:17,570
Let me put some caveats on
that caveat, which is these two

259
00:09:17,570 --> 00:09:18,680
asterisks right here.

260
00:09:18,680 --> 00:09:21,200
First of all, despite being
a, sort of, a toy language,

261
00:09:21,200 --> 00:09:23,660
it's already been used in
several serious scientific

262
00:09:23,660 --> 00:09:25,760
papers, including
a paper in Science,

263
00:09:25,760 --> 00:09:29,370
because it is very easy to make
certain points about cognition

264
00:09:29,370 --> 00:09:31,790
or about computational
cognition in Church that

265
00:09:31,790 --> 00:09:34,400
is very hard to do in
certain other languages.

266
00:09:34,400 --> 00:09:37,370
In particular, things that
require recursion, or inference

267
00:09:37,370 --> 00:09:39,675
over inference, where you
write down sort of the way

268
00:09:39,675 --> 00:09:41,300
that you think about
an agent, then you

269
00:09:41,300 --> 00:09:43,130
put that into
another agent, that

270
00:09:43,130 --> 00:09:45,260
can be very hard to write
in certain languages.

271
00:09:45,260 --> 00:09:49,110
Church can kind of
do that more easily.

272
00:09:49,110 --> 00:09:52,337
Let's see, I had another
caveat, which is--

273
00:09:52,337 --> 00:09:52,920
what was that?

274
00:09:52,920 --> 00:09:54,680
Oh, another caveat
is that, despite it

275
00:09:54,680 --> 00:09:56,600
being under construction, you
sort of think, well, why should

276
00:09:56,600 --> 00:09:57,641
I worry about this thing?

277
00:09:57,641 --> 00:09:59,770
Why should I even
bother hacking with it?

278
00:09:59,770 --> 00:10:03,470
Is because, you'll notice
there's probmods.org.

279
00:10:03,470 --> 00:10:05,530
And there are just a
ton, a ton of examples.

280
00:10:05,530 --> 00:10:08,060
There's a semester
worth of examples

281
00:10:08,060 --> 00:10:11,380
of all sorts of things from
both cognition, and AI,

282
00:10:11,380 --> 00:10:13,190
and interesting
statistical models

283
00:10:13,190 --> 00:10:16,449
that are very easy to
understand in Church.

284
00:10:16,449 --> 00:10:17,990
And for me at least,
it was very much

285
00:10:17,990 --> 00:10:20,990
a process of demystification
that something like this

286
00:10:20,990 --> 00:10:21,650
can help with.

287
00:10:21,650 --> 00:10:23,858
You learn about something
like the Chinese restaurant

288
00:10:23,858 --> 00:10:25,889
process, the Dirichlet
process, nonparametrics,

289
00:10:25,889 --> 00:10:28,430
and it's kind of hard to read
the textbook description of it.

290
00:10:28,430 --> 00:10:29,670
It's hard to wrap
your head around.

291
00:10:29,670 --> 00:10:31,430
And then you go and you
write three lines of code,

292
00:10:31,430 --> 00:10:32,346
or five lines of code.

293
00:10:32,346 --> 00:10:35,187
And you think, oh, that
wasn't so bad, right?

294
00:10:35,187 --> 00:10:36,770
And it's sort of
easy to write a bunch

295
00:10:36,770 --> 00:10:39,050
of these things in Church,
so it's a useful tool

296
00:10:39,050 --> 00:10:40,640
for demystification.

297
00:10:40,640 --> 00:10:42,140
It's a useful tool
to get a handle

298
00:10:42,140 --> 00:10:44,730
on certain models in
cognition and statistics,

299
00:10:44,730 --> 00:10:46,370
so those are the two asterisks.

300
00:10:46,370 --> 00:10:51,592
Be warned, but also, you
know, do play around with it.

301
00:10:51,592 --> 00:10:53,550
Let's see, the founding
paper, for those of you

302
00:10:53,550 --> 00:10:55,841
who are interested, you can
look at this link later on.

303
00:10:55,841 --> 00:10:58,260
It was by Goodman,
Mansinghka, Dan Roy, Bonawitz,

304
00:10:58,260 --> 00:10:59,240
and Tenenbaum.

305
00:10:59,240 --> 00:11:01,220
And for those of
you who, by the way,

306
00:11:01,220 --> 00:11:03,290
have already read
about Church a bit,

307
00:11:03,290 --> 00:11:05,840
you think that this tutorial
is a bit-- maybe it was--

308
00:11:05,840 --> 00:11:08,720
I should say, we'll start
off very, very easy, OK?

309
00:11:08,720 --> 00:11:09,980
We'll do things like addition.

310
00:11:09,980 --> 00:11:11,660
We'll do things like
flipping coins, OK?

311
00:11:11,660 --> 00:11:12,799
If you think that this is--

312
00:11:12,799 --> 00:11:14,590
maybe you've already
read through probmods,

313
00:11:14,590 --> 00:11:17,060
you've already done a few
chapters of that, by all means,

314
00:11:17,060 --> 00:11:19,130
use this time to
continue to think

315
00:11:19,130 --> 00:11:21,230
about probabilistic
programming, for example,

316
00:11:21,230 --> 00:11:22,610
either by talking
to me, and I'll

317
00:11:22,610 --> 00:11:26,930
find something for you, or
by going to forestdb.org.

318
00:11:26,930 --> 00:11:29,540
Again, I'll give you that link
for those of you who want it.

319
00:11:29,540 --> 00:11:31,790
It has a whole repository
of different probabilistic

320
00:11:31,790 --> 00:11:34,610
programming models that you
can play with, think about, see

321
00:11:34,610 --> 00:11:36,690
how you would change them,
and things like that.

322
00:11:36,690 --> 00:11:39,090
Also after this tutorial,
if you're still interested,

323
00:11:39,090 --> 00:11:41,330
you can go to that link.

324
00:11:41,330 --> 00:11:43,170
Oh and one last thing.

325
00:11:43,170 --> 00:11:45,890
There's sort of a-- you
can't see that right there.

326
00:11:45,890 --> 00:11:48,820
One last thing that I
should say about Church,

327
00:11:48,820 --> 00:11:49,820
it's based on Scheme.

328
00:11:49,820 --> 00:11:51,736
But a lot of the people
that have sort of been

329
00:11:51,736 --> 00:11:53,420
doing a lot of work
on it have become

330
00:11:53,420 --> 00:11:55,682
more in love with JavaScript.

331
00:11:55,682 --> 00:11:57,890
In fact, the thing that
you're going to be working on

332
00:11:57,890 --> 00:11:59,750
is sort of a JavaScript
implementation

333
00:11:59,750 --> 00:12:01,240
of Church under the hood.

334
00:12:01,240 --> 00:12:04,730
And they've started to implement
something called WebPPL, so

335
00:12:04,730 --> 00:12:07,040
Web Probabilistic
Programming Language.

336
00:12:07,040 --> 00:12:08,570
It's a language
that's specifically

337
00:12:08,570 --> 00:12:09,699
a derivative of JavaScript.

338
00:12:09,699 --> 00:12:11,240
For those of you
who like JavaScript,

339
00:12:11,240 --> 00:12:12,540
you can play with that.

340
00:12:12,540 --> 00:12:15,112
And if you go to WebPPL.org,
if you search for WebPPL,

341
00:12:15,112 --> 00:12:16,820
again, I can leave
you the link for that.

342
00:12:16,820 --> 00:12:18,950
It's sort of here,
but you can't see it.

343
00:12:18,950 --> 00:12:20,870
There are, again, a lot
of nice examples there

344
00:12:20,870 --> 00:12:22,730
of different programming
language-- programs

345
00:12:22,730 --> 00:12:24,920
that you can write
in JavaScript.

346
00:12:24,920 --> 00:12:29,420
OK, that was a very
long-winded introduction,

347
00:12:29,420 --> 00:12:31,807
caveats, and setting
up different things.

348
00:12:31,807 --> 00:12:33,890
The objectives for this
tutorial are, first of all,

349
00:12:33,890 --> 00:12:35,780
to become familiar
with the Church syntax,

350
00:12:35,780 --> 00:12:38,720
it can be a little wonky, if you
don't know it, at first, to run

351
00:12:38,720 --> 00:12:40,910
forward a few models to
give you an example of just,

352
00:12:40,910 --> 00:12:44,300
before inference, an example
of, here's my forward model,

353
00:12:44,300 --> 00:12:47,032
here's how I describe the world,
now let's try sampling from it.

354
00:12:47,032 --> 00:12:48,740
Let's sample, sample
again, sample again,

355
00:12:48,740 --> 00:12:51,665
sample again, see what
distributions we get.

356
00:12:51,665 --> 00:12:53,767
Get a sense for
the point that I'm

357
00:12:53,767 --> 00:12:55,850
going to make a few times,
which is once you write

358
00:12:55,850 --> 00:12:58,250
your forward model,
that is a representation

359
00:12:58,250 --> 00:12:59,935
of a distribution--

360
00:12:59,935 --> 00:13:01,310
and I'll come back
to this point,

361
00:13:01,310 --> 00:13:02,851
but just, sort of,
keep that in mind.

362
00:13:02,851 --> 00:13:03,950
You write down a program.

363
00:13:03,950 --> 00:13:04,908
And you run it forward.

364
00:13:04,908 --> 00:13:05,870
And you get a sample.

365
00:13:05,870 --> 00:13:07,870
You run it again and you
get a different sample.

366
00:13:07,870 --> 00:13:12,200
You run it in the limit,
you get some distribution.
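
[Editor's sketch] The point that a forward model represents a distribution can be tried in a few lines. This is a Python stand-in rather than Church, and forward_model and estimate_distribution are hypothetical names chosen for illustration: each call is one run of the program, and many runs approximate the distribution.

```python
import random
from collections import Counter


def forward_model(rng):
    """One run of a hypothetical generative program: heads in three fair flips."""
    return sum(rng.random() < 0.5 for _ in range(3))


def estimate_distribution(num_samples, seed=0):
    """Run the forward model repeatedly and return the observed frequencies."""
    rng = random.Random(seed)
    counts = Counter(forward_model(rng) for _ in range(num_samples))
    return {value: count / num_samples for value, count in sorted(counts.items())}


estimate = estimate_distribution(10_000)
print(estimate)
```

With more samples the frequencies settle toward 1/8, 3/8, 3/8, 1/8, which is the distribution the program defines.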

367
00:13:12,200 --> 00:13:15,110
Some other constructs
like memoization--

368
00:13:15,110 --> 00:13:16,790
after we do all
of this, we'll try

369
00:13:16,790 --> 00:13:20,316
to get at sampling, and the
query operator, and really,

370
00:13:20,316 --> 00:13:21,440
conditioning and inference.

371
00:13:21,440 --> 00:13:23,750
So we said we'll try to
run a few models forward.

372
00:13:23,750 --> 00:13:26,910
Once we do that, we'll try
to get the hang of inference.

373
00:13:26,910 --> 00:13:29,840
So you'll try to write down
a forward model about things

374
00:13:29,840 --> 00:13:32,639
like a coin, or goal
inference, or things like that.

375
00:13:32,639 --> 00:13:34,430
And you'll try to
actually infer something,

376
00:13:34,430 --> 00:13:35,930
like what is the
weight of the coin,

377
00:13:35,930 --> 00:13:41,510
from some data, like some coin
flips, some very simple stuff.
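
[Editor's sketch] Inferring a coin's weight from flips can be previewed with rejection sampling. This is a minimal Python sketch, not Church's query operator: the three candidate weights, the observed flips, and the name rejection_query are all assumptions made for illustration. It guesses a weight, simulates flips, and keeps the guess only when the simulation matches the data.

```python
import random


def rejection_query(observed_flips, num_accepted, seed=0):
    """Sample coin weights conditioned on reproducing the observed flips."""
    rng = random.Random(seed)
    candidate_weights = [0.1, 0.5, 0.9]  # assumed uniform prior over three weights
    accepted = []
    while len(accepted) < num_accepted:
        weight = rng.choice(candidate_weights)                      # sample from the prior
        simulated = [rng.random() < weight for _ in observed_flips]  # run the model forward
        if simulated == observed_flips:                             # keep only matching runs
            accepted.append(weight)
    return accepted


observed = [True, True, True, True, False]  # hypothetical data: four heads, one tail
samples = rejection_query(observed, num_accepted=2000)
posterior = {w: samples.count(w) / len(samples) for w in [0.1, 0.5, 0.9]}
print(posterior)
```

Most of the accepted samples have weight 0.9, since that weight makes four heads out of five most likely. Rejection is wasteful when matches are rare, which is one reason other engines exist.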

378
00:13:41,510 --> 00:13:44,180
OK, and we'll go through some
examples, like, as I said,

379
00:13:44,180 --> 00:13:46,520
coin flipping, maybe causal
networks, maybe intuitive

380
00:13:46,520 --> 00:13:48,320
physics and
intuitive psychology.

381
00:13:48,320 --> 00:13:50,150
I do hope to get to
intuitive psychology.

382
00:13:50,150 --> 00:13:52,300
We'll see if we get to that.

383
00:13:52,300 --> 00:13:53,810
So some prerequisites
and set up,

384
00:13:53,810 --> 00:13:55,770
that's what I asked you
to do at the beginning.

385
00:13:55,770 --> 00:13:58,340
If you happen to have
a local implementation,

386
00:13:58,340 --> 00:13:59,690
you can open that now.

387
00:13:59,690 --> 00:14:05,720
If you didn't, just go to
probmods.org/play-space.html

388
00:14:05,720 --> 00:14:08,120
and open that up.

389
00:14:08,120 --> 00:14:11,510
And we're going to play a
game of Noisy Tomer Says.

390
00:14:11,510 --> 00:14:13,280
So now you should also--

391
00:14:13,280 --> 00:14:15,430
open this, open a
browser, go to that,

392
00:14:15,430 --> 00:14:17,270
or open your local
implementation.

393
00:14:17,270 --> 00:14:23,144
Also open up the file
that I sent you of--

394
00:14:23,144 --> 00:14:25,310
it should have been
called, like, student copy,

395
00:14:25,310 --> 00:14:26,630
something like that.

396
00:14:26,630 --> 00:14:28,713
It contains a bunch of
things that we're basically

397
00:14:28,713 --> 00:14:31,887
going to just sort of copy,
paste into the browser.

398
00:14:31,887 --> 00:14:33,470
Now, the nice thing
about this browser

399
00:14:33,470 --> 00:14:35,270
is, it is sort of a working
implementation of Church.

400
00:14:35,270 --> 00:14:36,395
You just paste in the code.

401
00:14:36,395 --> 00:14:37,130
You hit run.

402
00:14:37,130 --> 00:14:38,906
It runs, OK?

403
00:14:38,906 --> 00:14:41,405
So you guys should all more or
less have a screen like this.

404
00:14:44,095 --> 00:14:46,282
I'll take this out so I
don't sit on it right now.

405
00:14:46,282 --> 00:14:47,990
Does everyone have
more or less something

406
00:14:47,990 --> 00:14:49,823
like this, some sort
of browser that you can

407
00:14:49,823 --> 00:14:51,920
type things into and press run?

408
00:14:51,920 --> 00:14:52,980
Over there?

409
00:14:52,980 --> 00:14:55,730
OK, we'll start off with
some very, very simple stuff

410
00:14:55,730 --> 00:15:00,320
that you should already have
in the syntax of the Church

411
00:15:00,320 --> 00:15:02,960
tutorial, so just try
either pasting in or typing

412
00:15:02,960 --> 00:15:06,300
in things like this thing.

413
00:15:06,300 --> 00:15:09,080
So the first thing you'll
notice is that, over here, it's

414
00:15:09,080 --> 00:15:11,040
what's called--

415
00:15:11,040 --> 00:15:13,810
sorry, let me adjust this
screen so it's not actually--

416
00:15:13,810 --> 00:15:15,110
so that you can see it.

417
00:15:18,000 --> 00:15:20,320
So, see over here, you
should be looking--

418
00:15:20,320 --> 00:15:26,977
I've sort of done over here,
plus 2 2, and the result is 4.

419
00:15:26,977 --> 00:15:28,560
So the first thing
to see, some of you

420
00:15:28,560 --> 00:15:31,185
may be familiar with this. Who's
familiar with Polish notation,

421
00:15:31,185 --> 00:15:34,020
where you just go plus 2 2?

422
00:15:34,020 --> 00:15:35,630
Instead of going 2 plus--

423
00:15:35,630 --> 00:15:38,670
who is not familiar
with Polish notation?

424
00:15:38,670 --> 00:15:40,710
OK, good, thank you.

425
00:15:40,710 --> 00:15:43,650
Polish notation just means that,
instead of writing 2 plus 2,

426
00:15:43,650 --> 00:15:46,710
you write plus 2 2, so you write
the thing that operates,

427
00:15:46,710 --> 00:15:49,620
the function, outside, and
you write all the arguments

428
00:15:49,620 --> 00:15:51,296
for the function like that.

429
00:15:51,296 --> 00:15:52,920
In fact, most of the
time, you do this.

430
00:15:52,920 --> 00:15:54,390
When you write down
functions for code,

431
00:15:54,390 --> 00:15:56,430
you usually write the
function then the things

432
00:15:56,430 --> 00:15:57,327
that it operates on.

433
00:15:57,327 --> 00:15:59,160
But here, it's going
to work for everything.

434
00:15:59,160 --> 00:16:00,743
And it can be a bit
confusing at first

435
00:16:00,743 --> 00:16:03,252
when you do things
like plus 2 2.

436
00:16:03,252 --> 00:16:05,460
The second thing is that
you put brackets on anything

437
00:16:05,460 --> 00:16:07,346
that you want to evaluate, OK?

438
00:16:07,346 --> 00:16:08,970
So, for example, here
is an expression.

439
00:16:08,970 --> 00:16:10,740
The expression is plus 2 2.

440
00:16:10,740 --> 00:16:13,350
And you want to evaluate
that expression.

441
00:16:13,350 --> 00:16:17,550
So for example, I wanted to
evaluate the expression--

442
00:16:17,550 --> 00:16:19,737
I think I put some,
like, cursor for--

443
00:16:19,737 --> 00:16:21,570
so you can see what I'm
doing with my thing.

444
00:16:21,570 --> 00:16:24,960
OK, if you want to do something
like, you know, times 2 2,

445
00:16:24,960 --> 00:16:26,230
that would be the same thing.

446
00:16:26,230 --> 00:16:30,520
And I would go to run.

447
00:16:30,520 --> 00:16:32,595
And that would be,
of course, 4 again.
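
The on-screen code is not captured in the transcript. A minimal sketch of what the two expressions described here would look like in Church, assuming the standard `+` and `*` primitives:

```scheme
; Prefix (Polish) notation: the operator comes first, then its arguments.
; The surrounding parentheses tell Church to evaluate the expression.
(+ 2 2)   ; evaluates to 4
(* 2 2)   ; evaluates to 4
```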

448
00:16:32,595 --> 00:16:34,860
Let's do some
other examples from here.

449
00:16:34,860 --> 00:16:36,660
Like there's a bunch
of simple logic,

450
00:16:36,660 --> 00:16:38,640
like you might do display.

451
00:16:38,640 --> 00:16:40,560
Display is just a way
to run it, to-- sorry,

452
00:16:40,560 --> 00:16:43,200
to display the result over here.

453
00:16:43,200 --> 00:16:46,030
You can do a bunch of
logic things, like equal.

454
00:16:46,030 --> 00:16:47,670
So again, the
operator is outside.

455
00:16:47,670 --> 00:16:49,914
And you would do equal
question mark 2 2,

456
00:16:49,914 --> 00:16:51,330
and then evaluate
that expression.

457
00:16:51,330 --> 00:16:53,040
And you can do
bigger than equals,

458
00:16:53,040 --> 00:16:54,165
all these different things.

459
00:16:54,165 --> 00:16:55,080
AUDIENCE: The question mark?

460
00:16:55,080 --> 00:16:56,130
TOMER ULLMAN: The
question mark is just--

461
00:16:56,130 --> 00:16:57,005
I've just named it that way.

462
00:16:57,005 --> 00:16:58,463
It doesn't actually
have any sense.

463
00:16:58,463 --> 00:17:02,754
I could have just called it
equal-- sorry, no, sorry.

464
00:17:02,754 --> 00:17:04,920
There is no particular
meaning to the question mark.

465
00:17:04,920 --> 00:17:06,690
It's just that this
thing, this operator,

466
00:17:06,690 --> 00:17:08,432
is called equal question mark.

467
00:17:08,432 --> 00:17:09,390
That's the name for it.

468
00:17:09,390 --> 00:17:11,069
And it's just-- it is
the equals operator.

469
00:17:11,069 --> 00:17:12,652
That's how you check
if two things are

470
00:17:12,652 --> 00:17:13,661
equal to one another.

471
00:17:13,661 --> 00:17:15,119
In languages like
Python, you would

472
00:17:15,119 --> 00:17:18,119
do, you know, equals
equals, like that.

473
00:17:18,119 --> 00:17:22,170
This is how you do it here, OK?
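
As a sketch of the comparisons described above (the exact lines typed are an assumption; `equal?`, `>`, and `display` are the primitives being named):

```scheme
; equal? is just the name of the equality operator; the ? is part of the name.
(display (equal? 2 2))   ; shows the result of the comparison, which is true
(> 3 2)                  ; "bigger than" works the same way: operator first
```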

474
00:17:22,170 --> 00:17:24,270
Let's see, a few other
simple syntax things.

475
00:17:24,270 --> 00:17:26,880
So you might say, for
example, the statement

476
00:17:26,880 --> 00:17:30,060
for defining variables
is, shockingly enough,

477
00:17:30,060 --> 00:17:33,720
define, so you
would do define x 3.

478
00:17:33,720 --> 00:17:37,220
And now, the next time that
I do x, then hopefully--

479
00:17:37,220 --> 00:17:40,020
and I run that--
then it'll show 3.

480
00:17:40,020 --> 00:17:42,300
There are a few other
basic syntax things,

481
00:17:42,300 --> 00:17:45,030
like lists, that might be
important, like, you know,

482
00:17:45,030 --> 00:17:48,300
define x to be a list of 1 2 3.

483
00:17:48,300 --> 00:17:51,150
And if you run that,
then you'll get 1 2 3.
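
A small sketch of the two definitions just described. The name `xs` is a hypothetical choice, used here only to avoid defining `x` twice in one program:

```scheme
(define x 3)
x                         ; evaluates to 3

(define xs (list 1 2 3))  ; xs is a hypothetical name for the list
xs                        ; evaluates to the list (1 2 3)
```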

484
00:17:51,150 --> 00:17:53,490
Again, we're starting
out very, very slow,

485
00:17:53,490 --> 00:17:56,070
but we'll hopefully
get soon to more things

486
00:17:56,070 --> 00:17:58,510
like Gaussian processes.

487
00:17:58,510 --> 00:18:01,740
Some simple things like
if-then statements--

488
00:18:01,740 --> 00:18:04,680
OK, I'm just copying and
pasting off of this document

489
00:18:04,680 --> 00:18:06,330
that you should
all have, so that's

490
00:18:06,330 --> 00:18:07,871
why I'm, sort of,
running through it.

491
00:18:07,871 --> 00:18:09,900
But the point is that
you would do-- the syntax

492
00:18:09,900 --> 00:18:17,280
for doing an if-then conditional
statement is like this.

493
00:18:17,280 --> 00:18:21,940
You write down if, and then
you write down the condition

494
00:18:21,940 --> 00:18:24,935
that either evaluates
to true or to false.

495
00:18:24,935 --> 00:18:29,910
So it's if this condition,
do the first thing.

496
00:18:29,910 --> 00:18:33,206
If it's false, do
the second thing.

497
00:18:33,206 --> 00:18:34,830
In this particular
case, I have defined

498
00:18:34,830 --> 00:18:36,330
a variable called socrates.

499
00:18:36,330 --> 00:18:38,400
I've defined it as drunk.

500
00:18:38,400 --> 00:18:43,110
And then I run the condition
equal socrates drunk,

501
00:18:43,110 --> 00:18:45,462
if that's true, then
return the answer true.

502
00:18:45,462 --> 00:18:47,670
Or, you know, I could have
written return the answer,

503
00:18:47,670 --> 00:18:48,600
Socrates is a drunk.

504
00:18:48,600 --> 00:18:50,730
If it's false, return
the answer false.

505
00:18:50,730 --> 00:18:53,430
Did everyone more or
less get the conditional?

506
00:18:53,430 --> 00:18:56,160
It just says, if condition,
return the first thing

507
00:18:56,160 --> 00:18:58,025
otherwise, the thing
on the second line.
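
The conditional described here might look roughly like the following. This is a reconstruction from the spoken description, assuming `socrates` is bound to the symbol `'drunk`:

```scheme
(define socrates 'drunk)

; (if condition value-if-true value-if-false)
(if (equal? socrates 'drunk)
    true      ; returned when the condition holds
    false)    ; returned otherwise
```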

508
00:18:58,025 --> 00:18:59,400
Another important
thing before we

509
00:18:59,400 --> 00:19:02,467
start getting at more
things like recursion

510
00:19:02,467 --> 00:19:04,050
and forward sampling
is the notion of,

511
00:19:04,050 --> 00:19:06,300
how would I define a function?

512
00:19:06,300 --> 00:19:08,160
So, so far we've defined
variables, right?

513
00:19:08,160 --> 00:19:12,124
I could have defined something
like define x 2, right?

514
00:19:12,124 --> 00:19:13,790
And then that would
have just been that.

515
00:19:13,790 --> 00:19:15,998
But I want to define,
probably, functions, so I might

516
00:19:15,998 --> 00:19:17,860
define something like define--

517
00:19:17,860 --> 00:19:19,050
and now I have two options.

518
00:19:19,050 --> 00:19:23,112
There are two ways of
defining functions in Church.

519
00:19:23,112 --> 00:19:24,570
One of them is to
do the following.

520
00:19:24,570 --> 00:19:26,860
You define square.

521
00:19:26,860 --> 00:19:31,230
And then you say, well, square
is, itself, a procedure.

522
00:19:31,230 --> 00:19:31,982
It is a lambda.

523
00:19:31,982 --> 00:19:33,690
And I'll explain this
as I go along, just

524
00:19:33,690 --> 00:19:35,190
watch me, sort of, type it.

525
00:19:35,190 --> 00:19:38,060
It takes in a particular
argument, say, x.

526
00:19:38,060 --> 00:19:41,794
And then what it does to it
is, it multiplies x by x.

527
00:19:41,794 --> 00:19:46,180
So the point is, you say, well,
here, x is a particular thing.

528
00:19:46,180 --> 00:19:46,920
It is an object.

529
00:19:46,920 --> 00:19:47,419
What is it?

530
00:19:47,419 --> 00:19:48,460
It is just 2.

531
00:19:48,460 --> 00:19:50,940
Here, square is a thing.

532
00:19:50,940 --> 00:19:52,290
What sort of thing is it?

533
00:19:52,290 --> 00:19:53,830
It is this thing.

534
00:19:53,830 --> 00:19:55,770
Ah, what is this thing?

535
00:19:55,770 --> 00:19:59,940
This thing is a procedure that--
this is the only thing that you

536
00:19:59,940 --> 00:20:01,500
need to know about functions.

537
00:20:01,500 --> 00:20:04,290
Lambda is the thing that
actually defines functions, OK?

538
00:20:04,290 --> 00:20:06,870
It is a procedure that takes
in some number of arguments,

539
00:20:06,870 --> 00:20:08,310
in this case, just one argument.

540
00:20:08,310 --> 00:20:09,270
You could have
called it anything.

541
00:20:09,270 --> 00:20:09,990
I just called it x.

542
00:20:09,990 --> 00:20:11,430
You could have
called it argument1.

543
00:20:11,430 --> 00:20:12,540
You could have
called it socrates.

544
00:20:12,540 --> 00:20:14,010
You could have called it fubar.

545
00:20:14,010 --> 00:20:15,540
But the point is, it
takes in this argument.

546
00:20:15,540 --> 00:20:17,760
And then what it does
with it is the next thing.

547
00:20:17,760 --> 00:20:20,327
So you say, lambda, number of
arguments that you take in.

548
00:20:20,327 --> 00:20:21,660
And then what do you do with it?

549
00:20:21,660 --> 00:20:24,320
In this case, you
just do times x x.

550
00:20:24,320 --> 00:20:27,240
So this is a function called
square, very basic stuff.

551
00:20:27,240 --> 00:20:30,250
It takes in an argument and
it multiplies it by itself,

552
00:20:30,250 --> 00:20:34,986
so it is the square of x,
x times x, very simple.
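
Written out, the lambda form of `square` described above would be something like this sketch (the call on the last line is an added usage example, not from the lecture):

```scheme
; square is bound to a procedure: lambda, then the argument list, then the body.
(define square (lambda (x) (* x x)))

(square 4)   ; evaluates to 16
```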

553
00:20:34,986 --> 00:20:37,110
There's another way of
doing that if you don't want

554
00:20:37,110 --> 00:20:38,880
to type out lambdas,
if you don't want

555
00:20:38,880 --> 00:20:41,100
to start doing lambda
this, lambda that,

556
00:20:41,100 --> 00:20:43,117
it's sort of annoying.

557
00:20:43,117 --> 00:20:44,700
Let me just give you
one more example.

558
00:20:44,700 --> 00:20:46,260
Like, if I wanted something
with two arguments,

559
00:20:46,260 --> 00:20:48,843
I could have done-- you know, I
could have called it something

560
00:20:48,843 --> 00:20:51,570
like my-proc lambda x y.

561
00:20:51,570 --> 00:20:53,950
And now, what it does
is, it multiplies x and y.

562
00:20:53,950 --> 00:20:56,604
OK, this is an
example of a thing.

563
00:20:56,604 --> 00:20:57,645
What sort of thing is it?

564
00:20:57,645 --> 00:20:58,650
It is a procedure.

565
00:20:58,650 --> 00:21:00,900
I know it's a procedure
because it starts with lambda.

566
00:21:00,900 --> 00:21:02,535
It takes in two arguments.

567
00:21:02,535 --> 00:21:04,427
Here they're called x and y.

568
00:21:04,427 --> 00:21:05,510
What does that do with it?

569
00:21:05,510 --> 00:21:06,900
It multiplies x times y.

570
00:21:06,900 --> 00:21:08,460
Really, this is
just multiplication.

571
00:21:08,460 --> 00:21:09,930
So after I define
this procedure,

572
00:21:09,930 --> 00:21:12,120
I could then do,
like, my-proc-- sorry,

573
00:21:12,120 --> 00:21:13,470
I should have explained that.

574
00:21:13,470 --> 00:21:15,895
Then you do my-proc, say, 2
8, or something like that.

575
00:21:15,895 --> 00:21:16,770
AUDIENCE: [INAUDIBLE]

576
00:21:16,770 --> 00:21:17,894
TOMER ULLMAN: Yeah, sorry--

577
00:21:17,894 --> 00:21:19,845
that's a very good question.

578
00:21:19,845 --> 00:21:20,970
And it would bring back 16.

579
00:21:20,970 --> 00:21:25,590
Sorry, once I define my thing,
this is an operator now.

580
00:21:25,590 --> 00:21:28,470
This is an operator that
can be applied to arguments.

581
00:21:28,470 --> 00:21:33,060
And you apply it by doing that
parentheses that we just saw.

582
00:21:33,060 --> 00:21:35,381
If I just tried, by the way,
like, without applying it,

583
00:21:35,381 --> 00:21:37,880
if I just tried something like
this, what you would get back

584
00:21:37,880 --> 00:21:44,050
is, it would say, this is a
function, because it just says,

585
00:21:44,050 --> 00:21:44,980
what is this thing?

586
00:21:44,980 --> 00:21:46,030
You try to evaluate it.

587
00:21:46,030 --> 00:21:48,310
You're not evaluating on
anything, so it just returns,

588
00:21:48,310 --> 00:21:48,830
what is this thing?

589
00:21:48,830 --> 00:21:49,520
It's a function.

590
00:21:49,520 --> 00:21:53,594
It's a function that expects
x y and then multiplies them.
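
A sketch of the two-argument example, reconstructed from the description. It shows both the applied form and the bare name:

```scheme
(define my-proc (lambda (x y) (* x y)))

(my-proc 2 8)   ; applied to two arguments: evaluates to 16
my-proc         ; not applied: evaluates to the procedure itself, not a number
```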

591
00:21:53,594 --> 00:21:55,510
If you actually want to
apply it on something,

592
00:21:55,510 --> 00:21:57,676
you would need to provide it
with some input arguments.

593
00:21:57,676 --> 00:22:02,220
So I said, let's try to define
square as a lambda of x.

594
00:22:02,220 --> 00:22:05,890
That does-- it takes in an
x and multiplies x by x.

595
00:22:05,890 --> 00:22:09,400
There's one more way to
define a function, which,

596
00:22:09,400 --> 00:22:11,860
it sort of gets rid of
this lambda type thing.

597
00:22:11,860 --> 00:22:14,152
It's exactly equivalent to
the thing I just showed you,

598
00:22:14,152 --> 00:22:16,234
it just takes a bit less
writing, which is to say,

599
00:22:16,234 --> 00:22:16,780
define--

600
00:22:16,780 --> 00:22:19,090
I just misspelled
square, didn't I?

601
00:22:19,090 --> 00:22:20,080
Yes.

602
00:22:20,080 --> 00:22:23,530
Define square x-- like that--

603
00:22:23,530 --> 00:22:25,870
times x x.

604
00:22:25,870 --> 00:22:28,570
Now what this is
saying, so this just

605
00:22:28,570 --> 00:22:30,940
goes straight to saying,
like, before I would say,

606
00:22:30,940 --> 00:22:33,730
define this thing 2.

607
00:22:33,730 --> 00:22:36,280
OK, and then I said, define
this thing, the square,

608
00:22:36,280 --> 00:22:38,830
as this procedure.

609
00:22:38,830 --> 00:22:41,770
Here you can say, I want to
directly define a procedure.

610
00:22:41,770 --> 00:22:43,750
I'm not going to bother
with this lambda stuff.

611
00:22:43,750 --> 00:22:45,190
I want to directly
define a function.

612
00:22:45,190 --> 00:22:46,773
I want to directly
define a procedure.

613
00:22:46,773 --> 00:22:47,440
Can I do that?

614
00:22:47,440 --> 00:22:49,060
Yes, you could if you wanted to.

615
00:22:49,060 --> 00:22:51,370
You would just directly put
these brackets right there.

616
00:22:51,370 --> 00:22:52,690
You would say define.

617
00:22:52,690 --> 00:22:55,480
And if the next thing is
some brackets, then it says,

618
00:22:55,480 --> 00:22:57,190
OK, I'm going to
define a procedure

619
00:22:57,190 --> 00:22:59,470
where the name of the
procedure is square.

620
00:22:59,470 --> 00:23:02,020
And it takes in one
argument, which is x.

621
00:23:02,020 --> 00:23:04,562
And what it does
to it is times x x.

622
00:23:04,562 --> 00:23:06,520
And if you do it that
way, then under the hood,

623
00:23:06,520 --> 00:23:08,940
what Scheme does is actually
writes it out like this.

624
00:23:08,940 --> 00:23:10,600
It puts in the lambda
where it expects,

625
00:23:10,600 --> 00:23:13,050
but again, this is not
terribly important stuff.
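
The shorthand just described can be sketched as follows. It behaves the same as binding `square` to a lambda; the call at the end is an added usage example:

```scheme
; (define (name args...) body) is shorthand for
; (define name (lambda (args...) body)).
(define (square x) (* x x))

(square 5)   ; evaluates to 25
```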

626
00:23:13,050 --> 00:23:14,800
And those of you who are,
sort of, tuning out,

627
00:23:14,800 --> 00:23:16,150
and saying, well, fine.

628
00:23:16,150 --> 00:23:18,108
And you just wanted to
learn about-- a bit more

629
00:23:18,108 --> 00:23:20,990
about how probabilistic
programming works, don't worry.

630
00:23:20,990 --> 00:23:23,254
We'll get to some examples
in about 10 minutes.

631
00:23:23,254 --> 00:23:25,420
Here's another very useful
thing that you might want

632
00:23:25,420 --> 00:23:27,220
to do in many of your things.

633
00:23:27,220 --> 00:23:28,540
This is called the map.

634
00:23:28,540 --> 00:23:31,420
And the way map works
is, you map a function

635
00:23:31,420 --> 00:23:32,780
to a bunch of arguments.

636
00:23:32,780 --> 00:23:34,750
So you would say--

637
00:23:34,750 --> 00:23:36,610
map is just a
higher-order function

638
00:23:36,610 --> 00:23:38,470
which takes in a
particular procedure.

639
00:23:38,470 --> 00:23:42,640
Then it applies it to each one
of these things individually,

640
00:23:42,640 --> 00:23:43,270
OK?

641
00:23:43,270 --> 00:23:44,830
So square, in this
case, as we said,

642
00:23:44,830 --> 00:23:47,270
it is a thing that
takes in one argument.

643
00:23:47,270 --> 00:23:49,520
So this is now going to take
square and apply it to 1.

644
00:23:49,520 --> 00:23:50,845
So then I'm going to take
square and apply it to 2,

645
00:23:50,845 --> 00:23:52,121
take square and apply it to 3.

646
00:23:52,121 --> 00:23:53,620
And the result of
this is just going

647
00:23:53,620 --> 00:24:01,660
to be a list of squares, 1,
4, 9, 16, 25, simple enough?

648
00:24:01,660 --> 00:24:02,830
Yes.

649
00:24:02,830 --> 00:24:04,160
But map is very useful.

650
00:24:04,160 --> 00:24:05,840
You should probably
know about it.
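
A minimal sketch of the `map` call described above, assuming the list on screen was 1 through 5 (which matches the result read out):

```scheme
(define (square x) (* x x))

; map applies square to each element in turn and collects the results.
(map square (list 1 2 3 4 5))   ; evaluates to (1 4 9 16 25)
```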

651
00:24:05,840 --> 00:24:08,710
OK, some simple things
like, recursion, OK,

652
00:24:08,710 --> 00:24:11,110
so suppose I wanted to
apply square to the list

653
00:24:11,110 --> 00:24:15,730
from 1 to 100, and suppose I
didn't have the range 1 to 100.

654
00:24:15,730 --> 00:24:17,560
Most languages-- and
Scheme actually

655
00:24:17,560 --> 00:24:20,110
does have something called
range, which gives you

656
00:24:20,110 --> 00:24:21,850
all the numbers from 1 to 100.

657
00:24:21,850 --> 00:24:22,750
Suppose I didn't.

658
00:24:22,750 --> 00:24:24,700
Suppose I want to construct
all the numbers 1 to 100.

659
00:24:24,700 --> 00:24:26,449
I don't want to actually
write them down--

660
00:24:26,449 --> 00:24:29,830
1, 2, 3, 4, 5, 6, all
the way up to 100.

661
00:24:29,830 --> 00:24:31,700
I can write down
something that does that.

662
00:24:31,700 --> 00:24:33,580
And it uses a little
bit of recursion.

663
00:24:33,580 --> 00:24:35,140
And the way it does it is this.

664
00:24:35,140 --> 00:24:37,610
This is just to get
you used to recursion,

665
00:24:37,610 --> 00:24:40,960
because we'll be seeing
it a little bit later.

666
00:24:40,960 --> 00:24:46,754
And this says, OK, I'm going to
define something called range,

667
00:24:46,754 --> 00:24:49,170
which takes in an argument--
you should now be used to it,

668
00:24:49,170 --> 00:24:51,390
this is the same thing
that we defined over here.

669
00:24:51,390 --> 00:24:53,235
We're going to call
something a procedure.

670
00:24:53,235 --> 00:24:54,474
And we're going to call--

671
00:24:54,474 --> 00:24:55,890
we're going to
define a procedure.

672
00:24:55,890 --> 00:24:57,000
It's called range.

673
00:24:57,000 --> 00:24:59,470
It takes in an argument,
n, one argument.

674
00:24:59,470 --> 00:25:00,460
What does it do?

675
00:25:00,460 --> 00:25:01,920
Well, it depends.

676
00:25:01,920 --> 00:25:03,160
It does a conditional.

677
00:25:03,160 --> 00:25:06,720
A conditional, it depends,
let's see, is n equal to 0?

678
00:25:06,720 --> 00:25:10,190
If it's 0, just give
me back an empty list.

679
00:25:10,190 --> 00:25:13,080
Does everyone sort of
see that, if equal n 0,

680
00:25:13,080 --> 00:25:14,220
give me back an empty list.

681
00:25:14,220 --> 00:25:15,090
What if it's not 0?

682
00:25:15,090 --> 00:25:16,650
What if I did range 10?

683
00:25:16,650 --> 00:25:18,739
Oh, well, in that case, append--

684
00:25:18,739 --> 00:25:21,030
another thing that you might
want to know, so it's just

685
00:25:21,030 --> 00:25:24,690
combine these two things--
append what with what?

686
00:25:24,690 --> 00:25:31,680
Append range again with
n minus 1 and with n.

687
00:25:31,680 --> 00:25:33,300
The point here is
to say, OK, how

688
00:25:33,300 --> 00:25:35,434
do I get the numbers 1 to 100?

689
00:25:35,434 --> 00:25:36,600
I just, sort of, say range--

690
00:25:36,600 --> 00:25:41,430
I want the range 1 to 100, so
I say, 100-- am I at 0 yet?

691
00:25:41,430 --> 00:25:46,140
No, so take 100 and
append it with range 99.

692
00:25:46,140 --> 00:25:47,740
What does range 99 do?

693
00:25:47,740 --> 00:25:49,110
Well, is 99 0?

694
00:25:49,110 --> 00:25:51,810
No, so give me
back 99 plus what?

695
00:25:51,810 --> 00:25:53,060
Plus range 98.

696
00:25:53,060 --> 00:25:54,210
Is 98 0?

697
00:25:54,210 --> 00:25:57,290
No, keep going, so it's
basically recursing-- range

698
00:25:57,290 --> 00:26:00,720
is a recursive function
that calls itself until it

699
00:26:00,720 --> 00:26:04,110
hits 0, very simple recursion.

700
00:26:04,110 --> 00:26:05,610
And now you can do
this to write out

701
00:26:05,610 --> 00:26:07,980
all the numbers from 1 to 100.

702
00:26:07,980 --> 00:26:09,750
And then you, if you
were so inclined,

703
00:26:09,750 --> 00:26:14,440
you could do map
square to that.

704
00:26:14,440 --> 00:26:15,670
OK, and we run that.

705
00:26:15,670 --> 00:26:18,010
And it gives me all the
numbers from-- the squares

706
00:26:18,010 --> 00:26:19,690
of the numbers from 1 to 100.
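
The recursive definition walked through above can be sketched like this. The name `my-range` is a hypothetical substitute for the speaker's `range`, chosen so the sketch does not clash with any built-in of that name:

```scheme
; Base case: n = 0 gives the empty list.
; Otherwise: build the list for n - 1, then append n to the end.
(define (my-range n)
  (if (= n 0)
      '()
      (append (my-range (- n 1)) (list n))))

(define (square x) (* x x))

(my-range 5)                  ; evaluates to (1 2 3 4 5)
(map square (my-range 100))   ; the squares of the numbers 1 to 100
```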

707
00:26:19,690 --> 00:26:23,740
So far we've talked just
about very basic stuff.

708
00:26:23,740 --> 00:26:26,290
This is no different
from Scheme.

709
00:26:26,290 --> 00:26:28,817
You are all experts in Scheme
notation and things like that.

710
00:26:28,817 --> 00:26:31,150
Let's move on to something a
little bit more interesting

711
00:26:31,150 --> 00:26:33,940
that Church can do, which
is, for example, take

712
00:26:33,940 --> 00:26:39,340
random sequences, and
it can take random--

713
00:26:39,340 --> 00:26:41,050
how should I put this?

714
00:26:41,050 --> 00:26:45,430
Kind of like plus is a basic
thing in certain programming

715
00:26:45,430 --> 00:26:48,082
languages, it's a
primitive, right?

716
00:26:48,082 --> 00:26:49,540
It's written into
the language what

717
00:26:49,540 --> 00:26:52,030
plus means, what times means.

718
00:26:52,030 --> 00:26:53,470
You don't have to define that.

719
00:26:53,470 --> 00:26:55,619
The way most languages
work is that they

720
00:26:55,619 --> 00:26:57,160
have this sort of
long list of things

721
00:26:57,160 --> 00:26:58,329
that they need to evaluate.

722
00:26:58,329 --> 00:26:59,620
And they start evaluating them.

723
00:26:59,620 --> 00:27:02,650
And they're, sort of, OK, did
I hit an expression I know,

724
00:27:02,650 --> 00:27:04,575
like a number or not?

725
00:27:04,575 --> 00:27:06,450
And it's, sort of, no,
you didn't hit it yet.

726
00:27:06,450 --> 00:27:08,040
OK, fine, keep evaluating,
keep evaluating,

727
00:27:08,040 --> 00:27:10,090
keep evaluating until you
get some sort of primitive.

728
00:27:10,090 --> 00:27:12,070
And a primitive procedure
could be something

729
00:27:12,070 --> 00:27:13,992
like plus or a number.

730
00:27:13,992 --> 00:27:15,700
In Church, there are
primitive procedures

731
00:27:15,700 --> 00:27:18,067
which are random
primitive procedures.

732
00:27:18,067 --> 00:27:19,900
They are procedures
that, when you hit them,

733
00:27:19,900 --> 00:27:23,650
what you do is, you just return
a value, a sampled value,

734
00:27:23,650 --> 00:27:26,480
from this expression, from
this probability distribution.

735
00:27:26,480 --> 00:27:30,202
So the most basic
random primitive,

736
00:27:30,202 --> 00:27:32,410
the most basic distribution
that you can do in Church

737
00:27:32,410 --> 00:27:34,120
is something called flip.

738
00:27:34,120 --> 00:27:38,300
And if you just write
down flip in Church,

739
00:27:38,300 --> 00:27:41,730
what you'll get, if you run
it like that, is it tells you,

740
00:27:41,730 --> 00:27:43,250
well, it's a function.

741
00:27:43,250 --> 00:27:45,050
And it depends on
certain arguments.

742
00:27:45,050 --> 00:27:46,400
And it tells you many,
many things about it,

743
00:27:46,400 --> 00:27:47,566
but that's not what we want.

744
00:27:47,566 --> 00:27:50,450
We want to evaluate it, so put
some parentheses around it.

745
00:27:50,450 --> 00:27:51,490
And we'll run it.

746
00:27:51,490 --> 00:27:54,440
And it will give us back false.

747
00:27:54,440 --> 00:27:55,575
OK, let's try that again.

748
00:27:55,575 --> 00:27:57,020
So let's run that again.

749
00:27:57,020 --> 00:27:59,629
It gave us back
true, OK, interesting.

750
00:27:59,629 --> 00:28:01,670
And if we run that again,
you know, we get false.

751
00:28:01,670 --> 00:28:04,010
We run it again, and we get
maybe true, maybe false.

752
00:28:04,010 --> 00:28:07,610
You could do repeat
1,000 times flip.

753
00:28:07,610 --> 00:28:09,410
OK, repeat is another
important thing

754
00:28:09,410 --> 00:28:10,580
that you would need to know.

755
00:28:10,580 --> 00:28:12,860
It just says repeat
as many times

756
00:28:12,860 --> 00:28:14,970
as you want to repeat
some sort of function.

757
00:28:14,970 --> 00:28:16,970
In this case, the
function is flip.

758
00:28:16,970 --> 00:28:19,290
OK, so repeat flip 1,000 times.

759
00:28:19,290 --> 00:28:21,540
I hope you guys are trying
this while I'm saying this.

760
00:28:21,540 --> 00:28:23,039
Are people trying
this more or less?

761
00:28:23,039 --> 00:28:24,110
OK, cool.

762
00:28:24,110 --> 00:28:25,451
So repeat 1,000 times flip.

763
00:28:25,451 --> 00:28:27,200
And what you'll get
back is this long list

764
00:28:27,200 --> 00:28:29,670
of true, false, true, false,
false, true, false, true.

765
00:28:29,670 --> 00:28:31,253
And they're independent
of one another,

766
00:28:31,253 --> 00:28:33,627
because it's an exchangeable
random sequence.

767
00:28:33,627 --> 00:28:35,460
And if you want to see
what this looks like,

768
00:28:35,460 --> 00:28:37,293
well, you could just
do something like hist.

769
00:28:38,992 --> 00:28:39,950
And you would run that.

770
00:28:39,950 --> 00:28:42,110
And you would get, you
know, more or less 50-50.

771
00:28:42,110 --> 00:28:46,640
Not exactly 50-50, because
I only ran it 1,000 times.

772
00:28:46,640 --> 00:28:48,950
If I had run this in the
limit, what I would get

773
00:28:48,950 --> 00:28:52,550
is 50-50 on true-false.
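
As a sketch, the sequence of steps just described would look roughly like this in Church. The histogram title string is an assumption added for illustration:

```scheme
(flip)                              ; one sample: true or false, each with probability 0.5

(repeat 1000 flip)                  ; call flip 1,000 times and collect the results in a list

(hist (repeat 1000 flip) "flips")   ; plot the samples: close to 50-50, but not exactly
```

Note that `repeat` is handed the procedure `flip` itself, without parentheses, so that it can call it afresh each time.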

774
00:28:52,550 --> 00:28:54,896
Now, what's nice about this
is that this sort of gets

775
00:28:54,896 --> 00:28:57,437
at this thing that I was talking
about earlier, where there's

776
00:28:57,437 --> 00:29:00,860
dual representation for any sort
of probability distribution.

777
00:29:00,860 --> 00:29:02,990
You could either write the
probability distribution

778
00:29:02,990 --> 00:29:04,050
in math.

779
00:29:04,050 --> 00:29:08,310
You could sort of say, well,
the probability of true is 0.5.

780
00:29:08,310 --> 00:29:11,600
And the probability
of false is 0.5.

781
00:29:11,600 --> 00:29:13,885
Now I've defined a
distribution in math.

782
00:29:13,885 --> 00:29:15,740
And now you can
say, well, what's

783
00:29:15,740 --> 00:29:18,650
conditioned on this, what can
you do, and things like that.

784
00:29:18,650 --> 00:29:22,310
Or what you can do is, you
can write a program such that,

785
00:29:22,310 --> 00:29:26,840
when you run it, it will
sample one of these values.

786
00:29:26,840 --> 00:29:28,820
And in the limit,
it samples such

787
00:29:28,820 --> 00:29:31,827
that it approximates the thing
that we just defined in math.

788
00:29:31,827 --> 00:29:34,160
And you might say, well, why
not just define it in math?

789
00:29:34,160 --> 00:29:37,160
Because oftentimes, it gets
very, very hairy very, very

790
00:29:37,160 --> 00:29:38,090
fast.

791
00:29:38,090 --> 00:29:40,200
And in fact, any sort of
probability distribution

792
00:29:40,200 --> 00:29:42,110
that's well-defined
and well-behaved,

793
00:29:42,110 --> 00:29:45,020
you can write as a program.

794
00:29:45,020 --> 00:29:49,140
A program which, if you run
it many times, its sampling

795
00:29:49,140 --> 00:29:52,020
profile, the thing it will give
you back if you sample it many,

796
00:29:52,020 --> 00:29:53,810
many different times,
will give you back

797
00:29:53,810 --> 00:29:55,880
that probability distribution.

798
00:29:55,880 --> 00:29:57,440
Or you could
equivalently say that,

799
00:29:57,440 --> 00:29:59,390
what it means for a
probability distribution

800
00:29:59,390 --> 00:30:02,540
to be a probability distribution
is to be some sort of program,

801
00:30:02,540 --> 00:30:05,627
to be some sort of procedure
that gives you back a sample.

802
00:30:05,627 --> 00:30:07,460
And in the limit, you
get some sort of thing

803
00:30:07,460 --> 00:30:09,590
that we're going to call the
probability distribution.

804
00:30:09,590 --> 00:30:11,780
Actually, that's the way
we define the probability

805
00:30:11,780 --> 00:30:14,550
distribution.

806
00:30:14,550 --> 00:30:16,250
And again, this
gets in-- so one way

807
00:30:16,250 --> 00:30:18,860
to think about Church programs
is that any Church program

808
00:30:18,860 --> 00:30:21,590
that you write-- if you
just write plus 2 2,

809
00:30:21,590 --> 00:30:22,850
you'll get back 4.

810
00:30:22,850 --> 00:30:25,320
That's, in a way, a
deterministic program, right?

811
00:30:25,320 --> 00:30:29,420
The probability of getting back
4 on this execution equals 1,

812
00:30:29,420 --> 00:30:32,210
but there are many other
things that you could write

813
00:30:32,210 --> 00:30:34,350
and you could get back
interesting things for them.

814
00:30:34,350 --> 00:30:35,808
And the point is
to write something

815
00:30:35,808 --> 00:30:39,710
like a generative model that
describes some sort of thing

816
00:30:39,710 --> 00:30:40,550
about the world.

817
00:30:40,550 --> 00:30:43,005
And when you run it forward,
you get a certain sample,

818
00:30:43,005 --> 00:30:44,880
but if you run it many,
many different times,

819
00:30:44,880 --> 00:30:47,090
it gives you the
probability distribution

820
00:30:47,090 --> 00:30:48,819
that this model describes.

821
00:30:48,819 --> 00:30:51,110
And now, if you-- and again,
I'm getting slightly ahead

822
00:30:51,110 --> 00:30:51,680
of myself.

823
00:30:51,680 --> 00:30:53,690
If you change that model,
if you, for example,

824
00:30:53,690 --> 00:30:56,690
condition on something,
you'll get a different model.

825
00:30:56,690 --> 00:30:58,181
You'll get a different program.

826
00:30:58,181 --> 00:30:59,930
And you're trying to
find the program such

827
00:30:59,930 --> 00:31:02,280
that its output
will match the data.

828
00:31:02,280 --> 00:31:04,380
OK, but let's back
up a little bit.

829
00:31:04,380 --> 00:31:06,740
And we're still in flip land.

830
00:31:06,740 --> 00:31:08,660
So we have here
something which is flip.

831
00:31:08,660 --> 00:31:11,110
That's very, very basic.

832
00:31:11,110 --> 00:31:11,926
Flip can also be--

833
00:31:11,926 --> 00:31:12,800
AUDIENCE: [INAUDIBLE]

834
00:31:12,800 --> 00:31:13,740
TOMER ULLMAN: OK.

835
00:31:13,740 --> 00:31:15,260
Flip can also be a biased coin.

836
00:31:15,260 --> 00:31:16,990
So for example, if I do--

837
00:31:16,990 --> 00:31:19,745
I define something
like, you know, define--

838
00:31:22,970 --> 00:31:24,850
let's do this
slightly differently.

839
00:31:24,850 --> 00:31:26,900
Let's call this
lambda something.

840
00:31:26,900 --> 00:31:32,450
And what it does is flip 0.9.

841
00:31:32,450 --> 00:31:34,700
So if you run this forward,
what you'll get now

842
00:31:34,700 --> 00:31:36,962
is that flip can actually
take in some arguments.

843
00:31:36,962 --> 00:31:38,420
If you don't give
it any arguments,

844
00:31:38,420 --> 00:31:39,861
it'll just do flip 50-50.

845
00:31:39,861 --> 00:31:41,360
If you give it some
arguments, it'll

846
00:31:41,360 --> 00:31:45,920
do flip a biased coin, where
the coin is biased towards 0.9.

847
00:31:45,920 --> 00:31:48,500
And you can see that, after
I repeated that 1,000 times,

848
00:31:48,500 --> 00:31:52,700
I get, you know, it's
approximately 90% heads,

849
00:31:52,700 --> 00:31:54,432
or true, and about 10% tails.

850
00:31:54,432 --> 00:31:56,390
AUDIENCE: Why did you
make the lambda in there?

851
00:31:56,390 --> 00:31:58,730
TOMER ULLMAN: Ah,
perfect, I'm glad somebody

852
00:31:58,730 --> 00:31:59,840
has asked that question.

853
00:31:59,840 --> 00:32:03,050
So if I were just to do
the following-- suppose

854
00:32:03,050 --> 00:32:09,140
that I were to just do
repeat flip 0.9 like that,

855
00:32:09,140 --> 00:32:10,490
think about what would happen.

856
00:32:10,490 --> 00:32:13,910
What would happen is, I would
first evaluate flip 0.9.

857
00:32:13,910 --> 00:32:17,370
OK, that would give me back a
value, either true or false.

858
00:32:17,370 --> 00:32:20,320
And then this would say,
repeat that 1,000 times.

859
00:32:20,320 --> 00:32:24,050
You would get, like, 1,000
trues, or 1,000 falses,

860
00:32:24,050 --> 00:32:25,480
or whatever it was
that was first.

861
00:32:25,480 --> 00:32:27,470
In fact, it's going to
fail, because repeat

862
00:32:27,470 --> 00:32:28,419
expects a function.

863
00:32:28,419 --> 00:32:30,710
But the point is, the reason
that this is going to fail

864
00:32:30,710 --> 00:32:32,790
is because it wants a
particular function.

865
00:32:32,790 --> 00:32:35,780
This is not a function,
this is a value.

866
00:32:35,780 --> 00:32:36,964
You evaluate this first.

867
00:32:36,964 --> 00:32:38,630
It gives you a value
like true or false.

868
00:32:38,630 --> 00:32:40,520
And then you repeat
that value 1,000 times.

869
00:32:40,520 --> 00:32:41,561
That's not what you want.

870
00:32:41,561 --> 00:32:42,860
What you want is a procedure.

871
00:32:42,860 --> 00:32:46,800
A procedure, or a distribution,
or something like that,

872
00:32:46,800 --> 00:32:49,470
some sort of function
that, when you run it,

873
00:32:49,470 --> 00:32:52,580
you get a biased sample, so
what would that look like?

874
00:32:52,580 --> 00:32:53,800
That would look like this.

875
00:32:53,800 --> 00:32:55,799
It would be-- or I could
do something like this.

876
00:32:55,799 --> 00:32:59,990
Define my-coin weight--

877
00:32:59,990 --> 00:33:03,360
OK, something like this.

878
00:33:03,360 --> 00:33:07,800
And what it does is this.

879
00:33:07,800 --> 00:33:11,430
Now what I've defined is,
I've defined a procedure that

880
00:33:11,430 --> 00:33:13,919
takes in a particular weight.

881
00:33:13,919 --> 00:33:15,460
And what it does is
that it gives you

882
00:33:15,460 --> 00:33:16,980
back a flip on that weight.

883
00:33:16,980 --> 00:33:19,196
AUDIENCE: [INAUDIBLE]

884
00:33:19,196 --> 00:33:21,070
TOMER ULLMAN: Yes,
although you might, again,

885
00:33:21,070 --> 00:33:24,200
run into some problems, but
we can get to that, because--

886
00:33:24,200 --> 00:33:26,550
well, OK.

887
00:33:26,550 --> 00:33:29,992
So let's see--

888
00:33:29,992 --> 00:33:32,860
AUDIENCE: How would you define
it with the lambda calculus?

889
00:33:32,860 --> 00:33:35,360
TOMER ULLMAN: OK, so how you
would define it with the lambda

890
00:33:35,360 --> 00:33:39,130
calculus is, you would
say my-coin lambda

891
00:33:39,130 --> 00:33:42,860
weight this thing.

892
00:33:42,860 --> 00:33:45,530
OK, now we're saying, what
sort of thing is coin?

893
00:33:45,530 --> 00:33:46,700
Coin is a procedure.

894
00:33:46,700 --> 00:33:48,033
How do we know it's a procedure?

895
00:33:48,033 --> 00:33:49,934
Because we have this
lambda right here.

896
00:33:49,934 --> 00:33:51,350
How many arguments
does it expect?

897
00:33:51,350 --> 00:33:52,790
One, it's called weight.

898
00:33:52,790 --> 00:33:53,825
What does it do?

899
00:33:53,825 --> 00:33:55,140
It flips a coin.

900
00:33:55,140 --> 00:33:56,545
It gives you back that sample.

901
00:33:56,545 --> 00:33:57,410
AUDIENCE: Can I do--

902
00:33:57,410 --> 00:33:59,118
TOMER ULLMAN: The
equivalent way of doing

903
00:33:59,118 --> 00:34:02,300
that is by writing this
thing without any lambdas.

904
00:34:02,300 --> 00:34:07,400
You would just write
define my-coin--

905
00:34:07,400 --> 00:34:09,199
notice the brackets
there, right?

906
00:34:09,199 --> 00:34:11,840
Before we didn't have brackets
around that-- define my-coin

907
00:34:11,840 --> 00:34:15,409
weight flip weight, like that.

908
00:34:15,409 --> 00:34:17,810
And now you're sort of saying,
like, this is a procedure.

909
00:34:17,810 --> 00:34:19,310
You should know it's a
procedure, because it's

910
00:34:19,310 --> 00:34:21,643
the first thing that you're
hitting after define because

911
00:34:21,643 --> 00:34:23,022
of the parentheses.

912
00:34:23,022 --> 00:34:24,230
What sort of procedure is it?

913
00:34:24,230 --> 00:34:25,070
It's called my-coin.

914
00:34:25,070 --> 00:34:26,785
It takes in weight.

915
00:34:26,785 --> 00:34:28,089
Again, these are equivalent.
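The two equivalent ways of defining my-coin can be sketched in Python. This is only an analogue of the Church code on screen, not the code itself; the names flip, my_coin_lambda, and my_coin are illustrative assumptions.

```python
import random

def flip(weight=0.5):
    """Return True with probability `weight` (a stand-in for Church's flip)."""
    return random.random() < weight

# A name bound to an anonymous procedure,
# in the spirit of (define my-coin (lambda (weight) (flip weight))).
my_coin_lambda = lambda weight: flip(weight)

# A procedure defined directly,
# in the spirit of (define (my-coin weight) (flip weight)).
def my_coin(weight):
    return flip(weight)

# With the same seed, both forms produce the same sequence of samples.
random.seed(1)
from_lambda = [my_coin_lambda(0.9) for _ in range(100)]
random.seed(1)
from_def = [my_coin(0.9) for _ in range(100)]
print(from_lambda == from_def)
```

Either form yields a procedure of one argument; the choice between them is a matter of notation.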

916
00:34:28,089 --> 00:34:30,380
And to answer Nori's question
about how would I just do

917
00:34:30,380 --> 00:34:32,110
that without having
to define things,

918
00:34:32,110 --> 00:34:35,550
I would say something
like, hist repeat 1,000.

919
00:34:35,550 --> 00:34:36,800
Now, what do I want to repeat?

920
00:34:36,800 --> 00:34:40,370
I want to repeat some sort of
procedure that samples things.

921
00:34:40,370 --> 00:34:41,969
So it's-- I'll call it lambda.

922
00:34:41,969 --> 00:34:43,070
It's an empty lambda.

923
00:34:43,070 --> 00:34:44,480
It doesn't take
in any arguments.

924
00:34:44,480 --> 00:34:46,429
It's just the procedure.

925
00:34:46,429 --> 00:34:51,889
And what it does is,
it flips a coin 0.9.

926
00:34:51,889 --> 00:34:54,219
And if I run that,
I'll get that.
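The difference between repeating a value and repeating a procedure can be sketched in Python. This is an analogue under stated assumptions, not the Church code shown in the lecture: flip and repeat here are hypothetical helpers that mimic the Church primitives of the same name.

```python
import random
from collections import Counter

def flip(weight=0.5):
    """Return True with probability `weight` (a stand-in for Church's flip)."""
    return random.random() < weight

def repeat(n, thunk):
    """Call the zero-argument procedure `thunk` n times and collect the results."""
    return [thunk() for _ in range(n)]

random.seed(0)

# Not what we want: flip(0.9) is evaluated once, so there is only one value to copy.
one_value = flip(0.9)
copies = [one_value] * 1000

# What we want: pass a procedure, so every repetition draws a fresh sample.
samples = repeat(1000, lambda: flip(0.9))

print(Counter(copies))   # a single key, either True or False
print(Counter(samples))  # mostly True, with a minority of False
```

The empty lambda plays the same role as the empty lambda in the Church call: it delays evaluation until repeat asks for a sample.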

927
00:34:54,219 --> 00:34:56,434
OK, yes, no?

928
00:34:56,434 --> 00:34:58,290
OK, good.

929
00:34:58,290 --> 00:35:02,121
OK, so let's see, there
are many other primitives

930
00:35:02,121 --> 00:35:02,995
that we could get to.

931
00:35:02,995 --> 00:35:04,600
There is uniform-draw.

932
00:35:04,600 --> 00:35:06,640
You can look at this
online, but there's--

933
00:35:06,640 --> 00:35:08,139
the basic primitives
are things like

934
00:35:08,139 --> 00:35:10,710
multinomial, uniform, random
integer, beta, Dirichlet,

935
00:35:10,710 --> 00:35:13,040
there's also the Chinese
restaurant process.

936
00:35:13,040 --> 00:35:16,450
So let's see, we can build in
our own little distribution.

937
00:35:16,450 --> 00:35:17,860
OK, let's try doing that.

938
00:35:17,860 --> 00:35:21,939
So here I've defined something
which, under the hood,

939
00:35:21,939 --> 00:35:23,980
it's actually-- it's an
interesting distribution.

940
00:35:23,980 --> 00:35:25,000
You all probably know it.

941
00:35:25,000 --> 00:35:26,416
But the way I'm
going to define it

942
00:35:26,416 --> 00:35:31,060
is, I'm going to call it
countsTillHeads.

943
00:35:31,060 --> 00:35:33,490
This is a procedure that's
going to flip a coin.

944
00:35:33,490 --> 00:35:35,870
And if it comes up--

945
00:35:35,870 --> 00:35:38,200
it's going to flip a coin
with a particular weight.

946
00:35:38,200 --> 00:35:40,570
If it comes up true,
if it comes up heads,

947
00:35:40,570 --> 00:35:41,780
then it's just going to stop.

948
00:35:41,780 --> 00:35:43,420
It's going to give you back 0.

949
00:35:43,420 --> 00:35:46,094
If it doesn't stop, if
it comes back tails,

950
00:35:46,094 --> 00:35:47,260
it's going to tell you that.

951
00:35:47,260 --> 00:35:49,870
It's going to write
down somewhere, like, 1.

952
00:35:49,870 --> 00:35:51,640
And it's going to keep going.

953
00:35:51,640 --> 00:35:55,460
It's going to recurse somehow,
call itself, and then keep

954
00:35:55,460 --> 00:35:55,960
going.

955
00:35:55,960 --> 00:35:59,050
So this is for you, this
is an exercise for you.

956
00:35:59,050 --> 00:36:03,070
You have it under
the files, under 3.4,

957
00:36:03,070 --> 00:36:04,690
build your own distribution.

958
00:36:04,690 --> 00:36:06,370
I've left this open.

959
00:36:06,370 --> 00:36:08,320
Why don't you take two minutes.

960
00:36:08,320 --> 00:36:10,600
We're trying to build a
procedure that gives me

961
00:36:10,600 --> 00:36:13,960
the amount of times that I
need to flip a coin before I

962
00:36:13,960 --> 00:36:15,100
get back heads, OK?

963
00:36:15,100 --> 00:36:16,330
If I take a particular coin--

964
00:36:16,330 --> 00:36:19,090
I guess I don't want
to have one handy--

965
00:36:19,090 --> 00:36:20,230
but I flip a coin.

966
00:36:20,230 --> 00:36:21,910
And I just-- you
know, I flip it.

967
00:36:21,910 --> 00:36:24,670
If it comes back heads, I
write down 0 and I'm done.

968
00:36:24,670 --> 00:36:26,920
If it comes back tails, I'm
going to keep flipping it,

969
00:36:26,920 --> 00:36:28,720
so I flip it again.

970
00:36:28,720 --> 00:36:31,739
And you know, I might flip it
10 times until I get heads,

971
00:36:31,739 --> 00:36:34,030
so the point is that this
procedure will, in that case,

972
00:36:34,030 --> 00:36:34,875
return 10.

973
00:36:34,875 --> 00:36:36,375
That would be one
particular sample.

974
00:36:36,375 --> 00:36:38,916
Now, of course, if I take the
coin again and I flip it again,

975
00:36:38,916 --> 00:36:42,130
sometimes I get 10 times
until heads, sometimes once,

976
00:36:42,130 --> 00:36:45,280
sometimes 5,
sometimes 20, so I'm

977
00:36:45,280 --> 00:36:47,680
going to get a
particular distribution

978
00:36:47,680 --> 00:36:50,770
on the number of times I
need until I hit heads.

979
00:36:50,770 --> 00:36:53,020
And the thing that we're
trying to implement right now

980
00:36:53,020 --> 00:36:54,770
is just a procedure
that, what it does is,

981
00:36:54,770 --> 00:36:57,040
it implements this
counting thing that I just

982
00:36:57,040 --> 00:36:59,834
said by literally
flipping a coin-- well,

983
00:36:59,834 --> 00:37:01,750
I don't know if literally,
but under the hood,

984
00:37:01,750 --> 00:37:03,084
flipping a coin.

985
00:37:03,084 --> 00:37:05,500
If the coin comes back heads,
because this thing evaluates

986
00:37:05,500 --> 00:37:07,150
to true, give back 0.

987
00:37:07,150 --> 00:37:11,200
If it doesn't, give
back plus 1 plus what?

988
00:37:11,200 --> 00:37:12,790
So fill in those
dots-- it shouldn't

989
00:37:12,790 --> 00:37:15,070
be a long expression--
such that you'll get

990
00:37:15,070 --> 00:37:16,490
what I was just talking about.

991
00:37:16,490 --> 00:37:19,120
So, guys, let me tell
you what I was going for.

992
00:37:19,120 --> 00:37:27,590
It's plus 1,
countsTillHeads coinweight.

993
00:37:27,590 --> 00:37:31,890
OK, and now if you do
something like countsTillHeads,

994
00:37:31,890 --> 00:37:35,940
I don't know, 0.1 or something
like that, and you run it.

995
00:37:35,940 --> 00:37:37,830
And it gets saved--

996
00:37:37,830 --> 00:37:40,260
so let's read through
this for a second.

997
00:37:40,260 --> 00:37:42,304
What happens is, you
defined a procedure.

998
00:37:42,304 --> 00:37:43,470
It's called countsTillHeads.

999
00:37:43,470 --> 00:37:45,390
It takes in a coin weight.

1000
00:37:45,390 --> 00:37:47,110
It flips a coin.

1001
00:37:47,110 --> 00:37:49,440
If it comes back heads,
it gives you back 0.

1002
00:37:49,440 --> 00:37:51,780
If it didn't come back heads,
then you just do plus 1.

1003
00:37:51,780 --> 00:37:53,760
And then you just
call that thing again.

1004
00:37:53,760 --> 00:37:57,750
You do countsTillHeads
coinweight again and again.

1005
00:37:57,750 --> 00:38:02,790
If it comes back 0, then this
time, you'll have plus 1 plus 0

1006
00:38:02,790 --> 00:38:05,880
if it came back heads in here.

1007
00:38:05,880 --> 00:38:09,550
But if it didn't, then this
will be plus 1 plus something.
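The recursion just described can be sketched in Python as follows. It is an analogue of the Church procedure rather than the original code, and the names flip and counts_till_heads are assumptions chosen to mirror the lecture.

```python
import random

def flip(weight=0.5):
    """Return True with probability `weight` (a stand-in for Church's flip)."""
    return random.random() < weight

def counts_till_heads(coin_weight):
    """Count the tails seen before the first heads."""
    if flip(coin_weight):
        return 0  # heads: stop and report zero further flips
    # tails: count this flip, then start over with a fresh coin flip
    return 1 + counts_till_heads(coin_weight)

random.seed(0)
print(counts_till_heads(0.1))
print(counts_till_heads(0.1))
```

Each call returns one sample, and a different one on each run. For very small weights a Python version could hit the interpreter's recursion limit, so a loop would be the safer choice there.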

1008
00:38:09,550 --> 00:38:11,400
In effect, what
we've defined here--

1009
00:38:11,400 --> 00:38:12,630
those of you that have
defined it, and if not,

1010
00:38:12,630 --> 00:38:13,421
just look at this--

1011
00:38:13,421 --> 00:38:16,080
what you've defined here
is sort of a procedure that

1012
00:38:16,080 --> 00:38:18,330
might give us back
infinity in some way,

1013
00:38:18,330 --> 00:38:21,240
except it's becoming
extremely unlikely to do so

1014
00:38:21,240 --> 00:38:23,560
with each particular
flip of the coin.

1015
00:38:23,560 --> 00:38:25,230
Now, I run it once with 0.1.

1016
00:38:25,230 --> 00:38:26,190
I get 15.

1017
00:38:26,190 --> 00:38:28,824
I can run it again and
I'll get, you know, 8.

1018
00:38:28,824 --> 00:38:30,240
That just means
that, on that run,

1019
00:38:30,240 --> 00:38:32,910
I flipped it eight times
before I got heads.

1020
00:38:32,910 --> 00:38:35,400
And again, I can do this
many, many different times.

1021
00:38:35,400 --> 00:38:41,880
Like, I can do hist repeat
1,000 and then this thing,

1022
00:38:41,880 --> 00:38:44,610
some empty procedure
that does that.

1023
00:38:44,610 --> 00:38:49,620
And what you get is
this, which, in case it

1024
00:38:49,620 --> 00:38:51,205
doesn't look
familiar-- sorry, it's

1025
00:38:51,205 --> 00:38:52,830
just the way these
things usually look.

1026
00:38:52,830 --> 00:38:54,630
This is sort of flipping
the x- and y-axis.

1027
00:38:54,630 --> 00:38:56,827
But the point is,
how many times did I

1028
00:38:56,827 --> 00:38:59,160
have to flip it to get, you
know-- how many times did it

1029
00:38:59,160 --> 00:38:59,659
happen?

1030
00:38:59,659 --> 00:39:02,550
Did I flip it three times,
or one, or two, three times?

1031
00:39:02,550 --> 00:39:04,950
That's about 24%.

1032
00:39:04,950 --> 00:39:07,139
And it sort of goes
down, and down, and down,

1033
00:39:07,139 --> 00:39:09,180
because it becomes much,
much, much more unlikely

1034
00:39:09,180 --> 00:39:11,886
that I'll flip it 40
times until I get heads.

1035
00:39:11,886 --> 00:39:14,010
It could be that I'll keep
flipping it to infinity,

1036
00:39:14,010 --> 00:39:16,410
but it's not going to happen.

1037
00:39:16,410 --> 00:39:18,900
This, in case you didn't
know, falls off geometrically.

1038
00:39:18,900 --> 00:39:21,360
It's the geometric distribution.

1039
00:39:21,360 --> 00:39:24,180
That's a very fundamental,
simple distribution.

1040
00:39:24,180 --> 00:39:25,840
And one way to
write it is to say,

1041
00:39:25,840 --> 00:39:27,870
what's the probability of k?

1042
00:39:27,870 --> 00:39:30,090
The probability of k is--

1043
00:39:30,090 --> 00:39:33,195
let's say, we have a
coin which has the--

1044
00:39:33,195 --> 00:39:36,750
its probability of
coming up heads is p.

1045
00:39:36,750 --> 00:39:38,370
Then we say the
probability of k is

1046
00:39:38,370 --> 00:39:44,010
1 minus p to the k,
times p, yes?

1047
00:39:44,010 --> 00:39:48,680
I get tails k times, each with probability
1 minus p, and then heads once.

1048
00:39:48,680 --> 00:39:51,746
The point is, you can define the
geometric distribution by sort

1049
00:39:51,746 --> 00:39:53,120
of saying, what's
the probability

1050
00:39:53,120 --> 00:39:55,130
of any particular number?

1051
00:39:55,130 --> 00:39:59,072
Or you can define the
procedure for it, OK?

1052
00:39:59,072 --> 00:40:00,530
Instead of writing
down what should

1053
00:40:00,530 --> 00:40:04,580
be the probability of
any particular sequence,

1054
00:40:04,580 --> 00:40:06,980
you can just write down the
procedure that it describes.

1055
00:40:06,980 --> 00:40:07,920
This is the procedure.

1056
00:40:07,920 --> 00:40:09,628
The procedure doesn't
explicitly tell you

1057
00:40:09,628 --> 00:40:11,930
what the distribution
is, it just samples it.

1058
00:40:11,930 --> 00:40:13,910
You've built a procedure
for flipping a coin.

1059
00:40:13,910 --> 00:40:15,993
And if you do it many,
many, many different times,

1060
00:40:15,993 --> 00:40:19,070
what you'll get is the
geometric distribution.

1061
00:40:19,070 --> 00:40:22,610
This will approach the
geometric distribution.
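The claim that repeated sampling approaches the geometric distribution can be checked with a short Python sketch. It assumes the convention used by the procedure above: the sample is the number of tails before the first heads, so the probability of k is (1 - p)^k * p. The function names and the sample size are illustrative choices.

```python
import random
from collections import Counter

def counts_till_heads(p):
    """Sample the number of tails before the first heads (loop version)."""
    k = 0
    while random.random() >= p:  # tails happens with probability 1 - p
        k += 1
    return k

def geometric_pmf(k, p):
    """Analytic probability of seeing exactly k tails before the first heads."""
    return (1 - p) ** k * p

random.seed(0)
p, n = 0.1, 100_000
counts = Counter(counts_till_heads(p) for _ in range(n))

# Empirical frequency next to the analytic value for the first few k.
for k in range(5):
    print(k, counts[k] / n, geometric_pmf(k, p))
```

The sampler never mentions the formula, yet its frequencies track it closely. That is the equivalence between writing down a distribution and writing down a procedure that generates it.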

1062
00:40:22,610 --> 00:40:25,080
I can probably also do
density, and then it'll

1063
00:40:25,080 --> 00:40:26,510
show you it like that.

1064
00:40:28,912 --> 00:40:31,120
So that's what I was talking
about before with, like,

1065
00:40:31,120 --> 00:40:32,350
trying to wrap your
head around something

1066
00:40:32,350 --> 00:40:34,641
like the equivalence between
a probability distribution

1067
00:40:34,641 --> 00:40:37,540
that you can write down in math
or as an analytical expression

1068
00:40:37,540 --> 00:40:40,390
and writing down the equivalent
procedure for generating

1069
00:40:40,390 --> 00:40:42,527
that probability distribution.

1070
00:40:42,527 --> 00:40:44,860
Let's move on to something a
little bit more interesting

1071
00:40:44,860 --> 00:40:46,992
like Gaussian sampling.

1072
00:40:46,992 --> 00:40:48,700
If you're not with
us, you can look at it

1073
00:40:48,700 --> 00:40:51,640
in 3.5, Gaussian Samples.

1074
00:40:51,640 --> 00:40:53,770
What I've done here
is, basically, I'm

1075
00:40:53,770 --> 00:40:56,399
defining a particular center.

1076
00:40:56,399 --> 00:40:57,940
Let's walk through
this for a second.

1077
00:40:57,940 --> 00:41:00,340
I'm defining a
two-dimensional Gaussian.

1078
00:41:00,340 --> 00:41:02,470
What it does is, it takes
a particular center.

1079
00:41:02,470 --> 00:41:05,050
A center is just an x-y point.

1080
00:41:05,050 --> 00:41:10,040
And it does, you know,
Gaussian around the first one.

1081
00:41:10,040 --> 00:41:12,400
I'm trying to define a
two-dimensional Gaussian.

1082
00:41:12,400 --> 00:41:14,140
The way I do it
is, I take a point

1083
00:41:14,140 --> 00:41:17,350
around-- a one-dimensional
Gaussian around this point.

1084
00:41:17,350 --> 00:41:19,120
And I take a
one-dimensional Gaussian

1085
00:41:19,120 --> 00:41:20,320
around the second point.

1086
00:41:20,320 --> 00:41:21,492
And then I just draw it.

1087
00:41:21,492 --> 00:41:23,950
So in this particular case,
I'm going to define my Gaussian

1088
00:41:23,950 --> 00:41:26,200
center as 3, 2.

1089
00:41:26,200 --> 00:41:29,710
OK, I'm going to take it
x equals 3, y equals 2.

1090
00:41:29,710 --> 00:41:32,830
And I want to sample a
Gaussian around 3, 2.

1091
00:41:32,830 --> 00:41:37,070
So I'm going to sample
a Gaussian around 3

1092
00:41:37,070 --> 00:41:39,030
and a Gaussian around 2.

1093
00:41:39,030 --> 00:41:40,960
And I'm going to
give you that back.

1094
00:41:40,960 --> 00:41:44,530
And if I repeat this 1,000
times, then-- and I scatter it,

1095
00:41:44,530 --> 00:41:47,830
I'll end up with a plot
that looks a bit like this.

1096
00:41:47,830 --> 00:41:51,340
And you can see on
the x-axis, this is 3.

1097
00:41:51,340 --> 00:41:52,360
And this is 2.

1098
00:41:52,360 --> 00:41:54,790
And it's basically a
Gaussian with sampling points

1099
00:41:54,790 --> 00:41:57,250
from around this thing,
another forward procedure

1100
00:41:57,250 --> 00:41:59,060
that I can sample.

1101
00:41:59,060 --> 00:42:01,350
OK, is everyone more or
less on board with this?

1102
00:42:01,350 --> 00:42:04,610
Let's take two seconds
to read this again.

1103
00:42:04,610 --> 00:42:08,640
A basic procedure in
Church is Gaussian.

1104
00:42:08,640 --> 00:42:09,980
What I do is I basically--

1105
00:42:09,980 --> 00:42:13,100
I try to call Gaussian
on some number.

1106
00:42:13,100 --> 00:42:16,280
Gaussian takes in two arguments.

1107
00:42:16,280 --> 00:42:20,387
Gaussian takes in a
mean and a variance.

1108
00:42:20,387 --> 00:42:22,220
In particular, I'm going
to take a Gaussian.

1109
00:42:22,220 --> 00:42:26,930
And its mean is going to be
the first argument of center.

1110
00:42:26,930 --> 00:42:28,970
Its variance is going to be 1.

1111
00:42:28,970 --> 00:42:30,620
I'm going to take
a Gaussian sampled

1112
00:42:30,620 --> 00:42:35,250
from the second argument,
the y, and a variance of 1.

1113
00:42:35,250 --> 00:42:37,920
And then I'm going to just
give you back that point.

1114
00:42:37,920 --> 00:42:42,240
So this is a procedure that
takes in a center point.

1115
00:42:42,240 --> 00:42:44,940
And each time you sample it,
it will give you a sample

1116
00:42:44,940 --> 00:42:48,420
from around the mean 3, 2.

1117
00:42:48,420 --> 00:42:49,884
And if I run that--

1118
00:42:49,884 --> 00:42:51,550
so now I've defined
a particular center.

1119
00:42:51,550 --> 00:42:52,910
You know, I've defined it 3, 2.

1120
00:42:52,910 --> 00:42:55,114
I could have done many
other different things.

1121
00:42:55,114 --> 00:42:56,280
And I repeat that 100 times.

1122
00:42:56,280 --> 00:43:03,416
I've basically drawn a sample
from something around 3, 2.
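A Python analogue of this two-dimensional sampler might look like the sketch below. The name sample_gaussian_2d is an assumption. Note that Python's random.gauss takes a standard deviation rather than a variance; with a spread of 1 the two coincide.

```python
import random

def sample_gaussian_2d(center, sd=1.0):
    """Draw one point: an independent 1-D Gaussian around each coordinate."""
    x, y = center
    return (random.gauss(x, sd), random.gauss(y, sd))

random.seed(0)
center = (3, 2)
points = [sample_gaussian_2d(center) for _ in range(1000)]

# The sample means should sit near the chosen center.
mean_x = sum(px for px, _ in points) / len(points)
mean_y = sum(py for _, py in points) / len(points)
print(round(mean_x, 2), round(mean_y, 2))
```

Plotting points as a scatter would give a cloud centered near x = 3, y = 2, as in the plot described above.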

1123
00:43:03,416 --> 00:43:05,790
This can quickly get more
interesting if you do something

1124
00:43:05,790 --> 00:43:07,189
like a mixture of Gaussians.

1125
00:43:07,189 --> 00:43:09,480
So a Gaussian mixture model
is usually just saying, OK,

1126
00:43:09,480 --> 00:43:10,680
I have some particular space.

1127
00:43:10,680 --> 00:43:12,638
And I'm trying to figure
out how many Gaussians

1128
00:43:12,638 --> 00:43:15,600
are in this scene, so let's
write down the forward model

1129
00:43:15,600 --> 00:43:16,360
for that thing.

1130
00:43:16,360 --> 00:43:18,270
What's the forward model
for a mixture model?

1131
00:43:18,270 --> 00:43:20,310
The forward model is saying,
I'm going to draw out

1132
00:43:20,310 --> 00:43:22,080
some number of Gaussians.

1133
00:43:22,080 --> 00:43:23,580
I don't know how many.

1134
00:43:23,580 --> 00:43:25,050
And I don't
necessarily know what

1135
00:43:25,050 --> 00:43:27,019
their center point is, right?

1136
00:43:27,019 --> 00:43:28,560
And from each one
of these, I'm going

1137
00:43:28,560 --> 00:43:29,857
to draw some number of samples.

1138
00:43:29,857 --> 00:43:32,190
Does everyone understand,
more or less, that description

1139
00:43:32,190 --> 00:43:33,300
that I just gave?

1140
00:43:33,300 --> 00:43:34,780
We're going to write it out now.

1141
00:43:34,780 --> 00:43:35,940
But the point is,
the generative model

1142
00:43:35,940 --> 00:43:37,530
in your head for a
mixture of Gaussians

1143
00:43:37,530 --> 00:43:39,450
should be, there are
some number of Gaussians.

1144
00:43:39,450 --> 00:43:40,509
I don't know what it is.

1145
00:43:40,509 --> 00:43:42,300
Each one of them is
centered on some point.

1146
00:43:42,300 --> 00:43:43,440
I don't know what it is.

1147
00:43:43,440 --> 00:43:46,080
Let's say I know the
variance just for simplicity,

1148
00:43:46,080 --> 00:43:49,320
but I could obviously
put a prior on that.

1149
00:43:49,320 --> 00:43:51,030
And then I just
sample from that.

1150
00:43:51,030 --> 00:43:53,100
And I'll get some distribution.

1151
00:43:53,100 --> 00:43:54,965
And then you could use--
we'll later on see,

1152
00:43:54,965 --> 00:43:56,590
once you write down
that forward model,

1153
00:43:56,590 --> 00:43:58,940
it's pretty simple to then
just invert it and say,

1154
00:43:58,940 --> 00:44:00,960
OK, I see some number of points.

1155
00:44:00,960 --> 00:44:02,730
How many Gaussians
are there actually?

1156
00:44:02,730 --> 00:44:05,610
But let's write down
the forward model.

1157
00:44:05,610 --> 00:44:08,610
So I have already done
this ahead of time.

1158
00:44:08,610 --> 00:44:13,300
And I'll do it here.

1159
00:44:13,300 --> 00:44:16,470
So what I've done here,
minus the typo, thanks,

1160
00:44:16,470 --> 00:44:20,096
is to say something like, I
want a sample of Gaussian center

1161
00:44:20,096 --> 00:44:21,720
where I don't know
where it is, but I'm

1162
00:44:21,720 --> 00:44:24,360
going to say that it's in this
two-dimensional space between 0

1163
00:44:24,360 --> 00:44:29,882
and 10, a box that's
10 wide and 10 tall.

1164
00:44:29,882 --> 00:44:32,340
So for each new Gaussian, I
don't know where its center is,

1165
00:44:32,340 --> 00:44:34,131
but I'm assuming it's
somewhere in this box

1166
00:44:34,131 --> 00:44:35,280
that we're looking at.

1167
00:44:35,280 --> 00:44:37,390
And the way I do
that is, I say, OK, I

1168
00:44:37,390 --> 00:44:38,940
define some sort of procedure.

1169
00:44:38,940 --> 00:44:40,830
Each time you evaluate
this procedure, what

1170
00:44:40,830 --> 00:44:43,205
it's going to give
you back is a pair,

1171
00:44:43,205 --> 00:44:44,580
where the first
thing in the pair

1172
00:44:44,580 --> 00:44:47,890
is a uniform between 0 and 10,
the second thing in the pair

1173
00:44:47,890 --> 00:44:49,740
is a uniform between 0 and 10.

1174
00:44:49,740 --> 00:44:51,930
If all you were to do is
sample Gaussian center,

1175
00:44:51,930 --> 00:44:57,300
you would get back some point
uniformly distributed in the 10-by-10

1176
00:44:57,300 --> 00:45:00,960
box, where the first
one is, let's say,

1177
00:45:00,960 --> 00:45:03,649
x, and the second one is y.

1178
00:45:03,649 --> 00:45:05,190
And the next thing
I do is, let's say

1179
00:45:05,190 --> 00:45:08,550
I want to define some
number of Gaussians

1180
00:45:08,550 --> 00:45:12,230
and I don't know
how many there are.

1181
00:45:12,230 --> 00:45:17,120
Let's say, for
example, that I want

1182
00:45:17,120 --> 00:45:21,560
to put some sort of ignorance
prior on Gaussians between--

1183
00:45:21,560 --> 00:45:24,440
there might be one, there might
be two, there might be 10.

1184
00:45:24,440 --> 00:45:28,282
Let's say I stop it at 10
or something like that.

1185
00:45:28,282 --> 00:45:30,740
So in this case, I just say,
sample the number of Gaussians

1186
00:45:30,740 --> 00:45:33,200
from something like random
integer 10, since this

1187
00:45:33,200 --> 00:45:34,700
can return 0, and you
don't want 0, I'm

1188
00:45:34,700 --> 00:45:38,430
just adding the number 1 here.

1189
00:45:38,430 --> 00:45:41,267
But what I also could
have done, and I

1190
00:45:41,267 --> 00:45:43,100
think I was going to
do this as an exercise,

1191
00:45:43,100 --> 00:45:45,266
but since we want to get
to physics, and psychology,

1192
00:45:45,266 --> 00:45:47,840
and some more interesting stuff,
what I could have done here

1193
00:45:47,840 --> 00:45:50,060
is define number of Gaussians--

1194
00:45:50,060 --> 00:45:52,250
suppose I wanted to
put a prior on there

1195
00:45:52,250 --> 00:45:54,710
being potentially an
infinite number of Gaussians,

1196
00:45:54,710 --> 00:45:56,134
what would I do?

1197
00:45:56,134 --> 00:45:58,055
AUDIENCE: Dirichlet.

1198
00:45:58,055 --> 00:45:59,430
TOMER ULLMAN: A
Dirichlet, right?

1199
00:45:59,430 --> 00:46:01,471
Or what else can I do that
we've already learned?

1200
00:46:05,155 --> 00:46:06,530
We could do the
geometric, right?

1201
00:46:06,530 --> 00:46:08,321
We just defined the
geometric a second ago.

1202
00:46:08,321 --> 00:46:10,310
The geometric gives
us a probability

1203
00:46:10,310 --> 00:46:13,284
on numbers basically
going from 0 to infinity.

1204
00:46:13,284 --> 00:46:15,200
And it dies off very
quickly, so this gives us

1205
00:46:15,200 --> 00:46:17,402
sort of a natural prior
of some sort to say,

1206
00:46:17,402 --> 00:46:19,610
I think that there are some
number of Gaussians here.

1207
00:46:19,610 --> 00:46:21,290
I don't know what it is.

1208
00:46:21,290 --> 00:46:22,880
I'm pretty sure it dies off.

1209
00:46:22,880 --> 00:46:25,790
Like, I don't think 100 is
as likely as 10.

1210
00:46:25,790 --> 00:46:27,830
I don't think 10 is
as likely as 1.

1211
00:46:27,830 --> 00:46:29,990
So I could have said,
define number of Gaussians,

1212
00:46:29,990 --> 00:46:31,340
just draw from geometric.

1213
00:46:31,340 --> 00:46:33,830
And then I would have gotten
some number, potentially

1214
00:46:33,830 --> 00:46:34,700
infinite.

1215
00:46:34,700 --> 00:46:38,270
You've just defined an infinite
Gaussian mixture model.

1216
00:46:38,270 --> 00:46:40,610
And then I draw some
number of centers

1217
00:46:40,610 --> 00:46:44,090
by basically repeating
this procedure.

1218
00:46:44,090 --> 00:46:46,520
I sample the Gaussians.

1219
00:46:46,520 --> 00:46:48,732
And then I scatter the points.
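The forward model for the mixture can be sketched in Python as below. This is an analogue under assumptions, not the Church code from the tutorial files: the function names, the 50 points per Gaussian, and the unit spread are all illustrative choices. The prior on the number of Gaussians here is the bounded one (a random integer below 10, plus 1).

```python
import random

def sample_gaussian_center():
    """Pick a center uniformly in the box from 0 to 10 on each axis."""
    return (random.uniform(0, 10), random.uniform(0, 10))

def sample_gaussian_2d(center, sd=1.0):
    """Draw one point around `center` with an assumed, known spread."""
    x, y = center
    return (random.gauss(x, sd), random.gauss(y, sd))

def sample_mixture(points_per_gaussian=50):
    """Run the forward model once: how many Gaussians, where, then the points."""
    num_gaussians = random.randrange(10) + 1  # randrange(10) is 0..9, so add 1
    centers = [sample_gaussian_center() for _ in range(num_gaussians)]
    clusters = [
        [sample_gaussian_2d(c) for _ in range(points_per_gaussian)]
        for c in centers
    ]
    return centers, clusters

random.seed(0)
centers, clusters = sample_mixture()
print(len(centers), "Gaussians,", sum(len(c) for c in clusters), "points")
```

Replacing the line that sets num_gaussians with a draw from a geometric sampler would give the unbounded prior discussed above.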

1220
00:46:48,732 --> 00:46:50,690
Let's see, and then you
can look at the points.

1221
00:46:50,690 --> 00:46:51,981
And this is a fun game to play.

1222
00:46:51,981 --> 00:46:55,080
It's basically recapturing a
bit of what Josh said before,

1223
00:46:55,080 --> 00:46:57,950
which is to say,
how many Gaussians

1224
00:46:57,950 --> 00:46:59,242
do you think are in this image?

1225
00:46:59,242 --> 00:47:01,033
And you can sort of
play that with yourself

1226
00:47:01,033 --> 00:47:02,150
to get a sense of it.

1227
00:47:02,150 --> 00:47:03,816
You know, you've
defined some procedure.

1228
00:47:03,816 --> 00:47:06,237
You don't know how many
Gaussians you actually created.

1229
00:47:06,237 --> 00:47:07,820
You don't know exactly
where they are,

1230
00:47:07,820 --> 00:47:09,230
but you can run it forward.

1231
00:47:09,230 --> 00:47:11,930
And you can look at it
and say, well, here I

1232
00:47:11,930 --> 00:47:13,334
think it's pretty obvious.

1233
00:47:13,334 --> 00:47:14,750
I think there's
sort of a Gaussian

1234
00:47:14,750 --> 00:47:17,120
here, maybe a Gaussian here.

1235
00:47:17,120 --> 00:47:19,670
So I guess the number
here is 2, but here it's

1236
00:47:19,670 --> 00:47:20,600
a bit less obvious.

1237
00:47:20,600 --> 00:47:25,100
And again, you can
play with this.

1238
00:47:25,100 --> 00:47:27,259
So those of you who've
written this down,

1239
00:47:27,259 --> 00:47:29,050
and assuming you've
done either a Dirichlet

1240
00:47:29,050 --> 00:47:31,780
or a geometric distribution,
what you've basically done

1241
00:47:31,780 --> 00:47:36,520
is written down the forward
model for an infinite Gaussian

1242
00:47:36,520 --> 00:47:38,080
mixture model.

1243
00:47:38,080 --> 00:47:41,410
And you did it in, more or
less, five lines of code.

1244
00:47:41,410 --> 00:47:42,006
Yeah?

1245
00:47:42,006 --> 00:47:43,910
AUDIENCE: What is
the fold there?

1246
00:47:46,042 --> 00:47:47,750
TOMER ULLMAN: Where
do you see fold here?

1247
00:47:47,750 --> 00:47:50,580
AUDIENCE: Visualize
scatter fold append

1248
00:47:50,580 --> 00:47:53,680
TOMER ULLMAN: Ah,
yes, so fold is

1249
00:47:53,680 --> 00:47:55,837
another high-level procedure.

1250
00:47:55,837 --> 00:47:58,420
It's not terribly important for
the purposes of this tutorial,

1251
00:47:58,420 --> 00:48:02,390
but what it does is, it
basically takes in a function.

1252
00:48:02,390 --> 00:48:03,880
It takes in a list of stuff.

1253
00:48:03,880 --> 00:48:07,210
And it basically applies
it to the first argument.

1254
00:48:07,210 --> 00:48:09,370
Then it takes it and
applies it to whatever

1255
00:48:09,370 --> 00:48:11,085
the result was plus
the next item--

1256
00:48:11,085 --> 00:48:11,710
AUDIENCE: Plus?

1257
00:48:11,710 --> 00:48:13,202
TOMER ULLMAN: --in the list.

1258
00:48:13,202 --> 00:48:14,570
Well, not exactly plus--

1259
00:48:14,570 --> 00:48:14,830
AUDIENCE: In addition?

1260
00:48:14,830 --> 00:48:16,413
TOMER ULLMAN: --but,
yes, in addition,

1261
00:48:16,413 --> 00:48:17,980
so you can have
a fold which has,

1262
00:48:17,980 --> 00:48:19,720
for example, two arguments.

1263
00:48:19,720 --> 00:48:22,100
And what it does
is it multiplies.

1264
00:48:22,100 --> 00:48:23,350
So then you would take a list.

1265
00:48:23,350 --> 00:48:25,420
And you would basically do--

1266
00:48:25,420 --> 00:48:26,980
or rather, think about what sum is.

1267
00:48:26,980 --> 00:48:30,980
What sum basically is, is
a fold of plus over a list,

1268
00:48:30,980 --> 00:48:32,667
because it takes
the first number,

1269
00:48:32,667 --> 00:48:34,750
sums it up with the second
one, takes that result,

1270
00:48:34,750 --> 00:48:35,980
sums it up with a third one--

1271
00:48:35,980 --> 00:48:37,650
AUDIENCE: [INAUDIBLE]

1272
00:48:37,650 --> 00:48:39,370
TOMER ULLMAN: Fold
needs three arguments.

1273
00:48:39,370 --> 00:48:43,060
Fold needs a particular--
well, it needs the function

1274
00:48:43,060 --> 00:48:44,410
that you're going to apply.

1275
00:48:44,410 --> 00:48:47,560
It needs a starting
point to start from.

1276
00:48:47,560 --> 00:48:50,490
And it needs a list that
it's going to work on,

1277
00:48:50,490 --> 00:48:53,410
again, not terribly
important for--

1278
00:48:53,410 --> 00:48:54,852
AUDIENCE: So why do this?

1279
00:48:54,852 --> 00:48:56,560
TOMER ULLMAN: So in
this particular case,

1280
00:48:56,560 --> 00:48:59,590
what I'm trying to
do in the background

1281
00:48:59,590 --> 00:49:01,810
is, I'm going to get
a lot of Gaussians.

1282
00:49:01,810 --> 00:49:02,860
I don't know how many.

1283
00:49:02,860 --> 00:49:05,140
I'm going to get
basically a list of lists.

1284
00:49:05,140 --> 00:49:06,520
It could be one.

1285
00:49:06,520 --> 00:49:07,480
It could be three.

1286
00:49:07,480 --> 00:49:08,724
It could be 10.

1287
00:49:08,724 --> 00:49:11,140
Each one of them is going to
define some number of points.

1288
00:49:11,140 --> 00:49:12,700
And I just want to scatter them.

1289
00:49:12,700 --> 00:49:15,749
But scatter works by
taking in one list,

1290
00:49:15,749 --> 00:49:17,540
so it's basically just
a way of collapsing.

1291
00:49:17,540 --> 00:49:19,498
Say I have three, or 10,
I don't know how many.

1292
00:49:19,498 --> 00:49:21,340
I'm trying to collapse
some number of lists

1293
00:49:21,340 --> 00:49:23,410
into a single list.
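
The fold described above can be sketched roughly as follows. This is an illustration in Python, not the Church code on screen: `functools.reduce` stands in for `fold` (it takes the starting value as its last argument), and `point_lists` is made-up sample data.

```python
from functools import reduce
import operator

# reduce(function, list, starting_point) plays the role of fold here.

# Sum is a fold of plus over a list, starting from 0.
total = reduce(operator.add, [1, 2, 3, 4], 0)

# A fold that multiplies instead, starting from 1.
product = reduce(operator.mul, [1, 2, 3, 4], 1)

# Collapsing some number of lists into a single list:
# fold "append" (list concatenation) starting from the empty list.
# Hypothetical point lists, one per Gaussian.
point_lists = [[(0.1, 0.2)], [(3.0, 4.0), (3.1, 4.2)], [(7.5, 8.0)]]
all_points = reduce(operator.add, point_lists, [])

print(total, product, all_points)
```

The same call works whether there is one list, three, or ten, which is the reason for using a fold before handing the points to a scatter plot.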

1294
00:49:23,410 --> 00:49:26,080
We've defined some
number of Gaussians.

1295
00:49:26,080 --> 00:49:27,850
This is a London Blitz example.

1296
00:49:27,850 --> 00:49:30,070
Josh was talking about
this a little bit.

1297
00:49:30,070 --> 00:49:32,620
Those of you who want to,
sort of, jump back in again,

1298
00:49:32,620 --> 00:49:38,590
you can go to 3.5.2 in
the student document.

1299
00:49:38,590 --> 00:49:41,272
You can copy whatever
is under that and paste it.

1300
00:49:41,272 --> 00:49:43,230
And let's talk about that
example for a second.

1301
00:49:45,880 --> 00:49:49,720
What this thing is doing is,
it's sort of Josh's example--

1302
00:49:49,720 --> 00:49:52,390
do you remember his example
of, we have some sort of grid.

1303
00:49:52,390 --> 00:49:53,920
And we're trying
to say, is there

1304
00:49:53,920 --> 00:49:58,660
a suspicious cluster
somewhere, a disease cluster?

1305
00:49:58,660 --> 00:49:59,410
We have some dots.

1306
00:49:59,410 --> 00:50:01,430
And we're trying to
figure out is there

1307
00:50:01,430 --> 00:50:02,530
something going on here?

1308
00:50:02,530 --> 00:50:04,571
You know, there's sort of
a faulty, I don't know,

1309
00:50:04,571 --> 00:50:07,140
whatever, asbestos or
something like that.

1310
00:50:07,140 --> 00:50:08,390
And I want to figure that out.

1311
00:50:08,390 --> 00:50:10,897
So what you're going to
get is sort of a 2D map.

1312
00:50:10,897 --> 00:50:12,730
You're going to get
some dots from that map.

1313
00:50:12,730 --> 00:50:15,430
And you're trying to figure
out-- your hypothesis is either

1314
00:50:15,430 --> 00:50:17,110
this is sort of
randomly distributed,

1315
00:50:17,110 --> 00:50:21,730
it's a uniform, or there's
some sort of center here.

1316
00:50:21,730 --> 00:50:24,190
So how do we write down the
forward model for something

1317
00:50:24,190 --> 00:50:25,490
like that?

1318
00:50:25,490 --> 00:50:26,800
We would write down either--

1319
00:50:26,800 --> 00:50:27,970
the particular
example I'm doing
example, I'm doing

1320
00:50:27,970 --> 00:50:29,980
here is another example
that Tom Griffiths did,

1321
00:50:29,980 --> 00:50:34,090
which is, during the Blitz,
during the London bombing--

1322
00:50:34,090 --> 00:50:38,030
this is actually a very old
example of finding patterns.

1323
00:50:38,030 --> 00:50:40,450
Some of the British,
the people of London,

1324
00:50:40,450 --> 00:50:43,270
were convinced that there
were spies in London that

1325
00:50:43,270 --> 00:50:46,355
were telling the Germans where
to bomb during the Blitz.

1326
00:50:46,355 --> 00:50:47,980
And the way that they
reasoned this is,

1327
00:50:47,980 --> 00:50:50,380
they looked at the
pattern of bombings.

1328
00:50:50,380 --> 00:50:52,540
And they said, there's no
way that this is random.

1329
00:50:52,540 --> 00:50:54,820
They just looked at,
like, dots on a map.

1330
00:50:54,820 --> 00:50:56,830
And to them, it looked
a bit like Gaussians,

1331
00:50:56,830 --> 00:50:58,210
or things like that.

1332
00:50:58,210 --> 00:51:00,301
They were working from,
sort of, few examples.

1333
00:51:00,301 --> 00:51:02,800
When you look at, there's, sort
of, these nice web-- "nice,"

1334
00:51:02,800 --> 00:51:05,258
I don't know if it's nice--
but there's these websites that

1335
00:51:05,258 --> 00:51:08,980
show you the entire
Blitz from when

1336
00:51:08,980 --> 00:51:10,300
it started to when it ended.

1337
00:51:10,300 --> 00:51:13,230
And it's basically a
random distribution.

1338
00:51:13,230 --> 00:51:14,860
If you run statistical
tests on it,

1339
00:51:14,860 --> 00:51:17,230
it's no different from
a random distribution.

1340
00:51:17,230 --> 00:51:19,630
How would you run
such a test on it?

1341
00:51:19,630 --> 00:51:21,420
What you would do,
for example, is

1342
00:51:21,420 --> 00:51:24,340
you would write a forward model
that says it's either random,

1343
00:51:24,340 --> 00:51:25,690
uniform, or it's not.

1344
00:51:25,690 --> 00:51:28,510
Now, tell me which
one is more likely.

1345
00:51:28,510 --> 00:51:30,590
And that's what people
have, kind of, done.

1346
00:51:30,590 --> 00:51:32,436
That's a nice data set
to play around with.

1347
00:51:32,436 --> 00:51:34,060
The way that we've
written it over here

1348
00:51:34,060 --> 00:51:35,860
is to say, look, we
have two options.

1349
00:51:35,860 --> 00:51:41,500
Either it's a uniform bombing
or it's some targeted bombing.

1350
00:51:41,500 --> 00:51:43,630
The uniform bombing is
basically going to give us

1351
00:51:43,630 --> 00:51:46,810
just some point between 0--

1352
00:51:46,810 --> 00:51:49,360
between this box of 0 to
10, just this thing that we

1353
00:51:49,360 --> 00:51:50,620
were talking about before.

1354
00:51:50,620 --> 00:51:52,960
It's going to sample
uniformly from this box.

1355
00:51:52,960 --> 00:51:55,754
The targeted bombing is going
to sample some Gaussians,

1356
00:51:55,754 --> 00:51:56,920
just like we defined before.

1357
00:51:56,920 --> 00:51:57,920
You don't know how many.

1358
00:51:57,920 --> 00:51:59,980
You don't know
where the center is.

1359
00:51:59,980 --> 00:52:03,430
And it's going to then
sample from those Gaussians.

1360
00:52:03,430 --> 00:52:05,817
And it's going to give you
back some sort of scatter.

1361
00:52:05,817 --> 00:52:07,400
And you're basically
going to say, OK,

1362
00:52:07,400 --> 00:52:09,880
I don't know if it's
random, uniform,

1363
00:52:09,880 --> 00:52:12,219
or if there's some targeted
bombing going on here,

1364
00:52:12,219 --> 00:52:14,260
so I'm going to place,
basically, some inference.

1365
00:52:14,260 --> 00:52:15,730
I'm going to flip a coin.

1366
00:52:15,730 --> 00:52:17,980
If it comes up heads, I'm
going to do uniform bombing.

1367
00:52:17,980 --> 00:52:21,132
If it comes up tails, I'm
going to do targeted bombing.
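
A minimal sketch of the forward model just described, in Python rather than Church. The function names, the cap of three targets, the spread of 0.5, and the 20 points per run are all assumptions made for illustration; only the 0-to-10 box and the coin flip come from the talk.

```python
import random

def uniform_bombing(n_points, rng):
    # Each point is sampled uniformly from the 0-to-10 box.
    return [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_points)]

def targeted_bombing(n_points, rng, max_targets=3, spread=0.5):
    # We don't know how many Gaussians there are or where their centers are,
    # so both are sampled (max_targets and spread are assumed values).
    n_targets = rng.randint(1, max_targets)
    centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_targets)]
    points = []
    for _ in range(n_points):
        cx, cy = rng.choice(centers)
        points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points

def bombing_pattern(n_points=20, rng=None):
    rng = rng or random.Random()
    # Flip a fair coin: heads gives uniform bombing, tails gives targeted.
    if rng.random() < 0.5:
        return "uniform", uniform_bombing(n_points, rng)
    return "targeted", targeted_bombing(n_points, rng)

label, points = bombing_pattern(rng=random.Random(0))
print(label, points[:3])
```

Running this forward several times and plotting `points` without looking at `label` is one way to play the guessing game: decide from the scatter alone which branch produced it.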

1368
00:52:21,132 --> 00:52:23,090
And then you could look
at something like this.

1369
00:52:23,090 --> 00:52:25,950
And you can say,
well, I don't know.

1370
00:52:25,950 --> 00:52:27,020
That's kind of odd.

1371
00:52:27,020 --> 00:52:29,980
I mean, it doesn't exactly
look like a uniform bombing.

1372
00:52:29,980 --> 00:52:33,836
There's all this missing
empty space over here, right?

1373
00:52:33,836 --> 00:52:35,960
It doesn't exactly look
like one particular target.

1374
00:52:35,960 --> 00:52:37,840
And again, you can
sort of play with this.

1375
00:52:37,840 --> 00:52:40,160
And we'll get into the inference
about how to invert this thing.

1376
00:52:40,160 --> 00:52:42,326
But just as a forward model,
you can play with this,

1377
00:52:42,326 --> 00:52:46,028
run it forward, and try
to see if you can guess.