1
00:00:00,000 --> 00:00:02,430
The following content is
provided under a Creative

2
00:00:02,430 --> 00:00:03,880
Commons license.

3
00:00:03,880 --> 00:00:06,870
Your support will help MIT
OpenCourseWare continue to

4
00:00:06,870 --> 00:00:10,590
offer high quality educational
resources for free.

5
00:00:10,590 --> 00:00:14,115
To make a donation or view
additional materials from

6
00:00:14,115 --> 00:00:16,360
hundreds of MIT courses, visit
MIT OpenCourseWare at ocw.mit.edu.

7
00:00:21,330 --> 00:00:23,380
PROFESSOR: So let's get started
with the second

8
00:00:23,380 --> 00:00:25,640
lecture for today.

9
00:00:25,640 --> 00:00:30,210
So I guess one thing multicores
did, is really

10
00:00:30,210 --> 00:00:33,910
shatter this nice view of
writing your programs and

11
00:00:33,910 --> 00:00:37,040
having the hardware take care of
giving you performance.

12
00:00:37,040 --> 00:00:41,690
So hardware just kind of
completely gave that up.

13
00:00:41,690 --> 00:00:46,010
But so what you're doing in this
class, is you're trying

14
00:00:46,010 --> 00:00:46,590
to do it by yourself.

15
00:00:46,590 --> 00:00:51,560
Give all the responsibility
back to the programmer.

16
00:00:51,560 --> 00:00:54,720
And you realize as you go,
it's a much harder job.

17
00:00:54,720 --> 00:00:57,520
I mean, this is not simple
programming.

18
00:00:57,520 --> 00:01:01,860
So you need to have, you don't
have MIT class students at

19
00:01:01,860 --> 00:01:05,625
every company to do this, so we
need to have some kind of

20
00:01:05,625 --> 00:01:05,670
middle ground.

21
00:01:05,670 --> 00:01:10,240
And so some of the stuff we have
been doing is trying to

22
00:01:10,240 --> 00:01:11,650
figure out whether there is
any middle ground.

23
00:01:11,650 --> 00:01:15,910
Can you actually take some of
that load away from the user

24
00:01:15,910 --> 00:01:18,640
into things like languages
and compilers.

25
00:01:18,640 --> 00:01:22,340
So we will talk about
some of those.

26
00:01:22,340 --> 00:01:25,140
So right now we are kind of
switching from directly doing

27
00:01:25,140 --> 00:01:29,920
what's necessary to do the
Cell project to going for

28
00:01:29,920 --> 00:01:35,107
breadth. So this lecture, and
then we will sit back and do a

29
00:01:35,107 --> 00:01:35,367
little bit of debugging and
performance work and that will

30
00:01:35,367 --> 00:01:36,570
be directly helpful.

31
00:01:36,570 --> 00:01:39,736
And then next week we'll have
lots of guest lectures to kind

32
00:01:39,736 --> 00:01:41,770
of give you breadth in there.

33
00:01:41,770 --> 00:01:46,090
So you'll understand not just
Cell programming but parallel

34
00:01:46,090 --> 00:01:49,640
programming and parallel
processing, what the world is

35
00:01:49,640 --> 00:01:50,320
like beyond that.

36
00:01:50,320 --> 00:01:56,300
So today we're going to have
Bill talk about streams.

37
00:01:56,300 --> 00:01:57,210
BILL THIES: OK very good.

38
00:01:57,210 --> 00:01:58,470
So my name is Bill Thies.

39
00:01:58,470 --> 00:02:00,720
I'm a graduate student working
with Saman and Rodric, and

40
00:02:00,720 --> 00:02:01,450
others here.

41
00:02:01,450 --> 00:02:03,710
And I'll talk about the
StreamIt language.

42
00:02:03,710 --> 00:02:06,230
So why do we need a new
programming language?

43
00:02:06,230 --> 00:02:08,490
Well we think that languages
haven't kept up with the

44
00:02:08,490 --> 00:02:09,520
architectures.

45
00:02:09,520 --> 00:02:12,860
So one way to look at this is
that if you look back at

46
00:02:12,860 --> 00:02:17,110
previous languages, look at C
with the von Neumann machine.

47
00:02:17,110 --> 00:02:19,870
Now I grew up in rural
Pennsylvania not too far from

48
00:02:19,870 --> 00:02:20,630
Amish Country.

49
00:02:20,630 --> 00:02:23,640
And so to me these go together
just like a horse and buggy.

50
00:02:23,640 --> 00:02:26,110
OK they're perfectly made
for each other.

51
00:02:26,110 --> 00:02:27,360
They basically go at
the same rate.

52
00:02:27,360 --> 00:02:28,720
Everything is fine.

53
00:02:28,720 --> 00:02:31,310
But the problem is, in comes
the modern architecture.

54
00:02:31,310 --> 00:02:32,360
OK this is an F-16.

55
00:02:32,360 --> 00:02:34,610
You have a lot more that you can
do with it than with

56
00:02:34,610 --> 00:02:35,700
the horse and buggy.

57
00:02:35,700 --> 00:02:38,590
So how do you program these
new architectures?

58
00:02:38,590 --> 00:02:41,840
Well architecture makers these
days are basically faced with

59
00:02:41,840 --> 00:02:43,100
a really hard choice.

60
00:02:43,100 --> 00:02:46,060
On the one hand, you could get a
really cool architecture and

61
00:02:46,060 --> 00:02:48,950
develop an ad hoc programming
technique where you're really

62
00:02:48,950 --> 00:02:51,130
just leaving it to the
programmer to do something

63
00:02:51,130 --> 00:02:52,900
complicated to get
performance.

64
00:02:52,900 --> 00:02:54,570
And unfortunately I
think that's the

65
00:02:54,570 --> 00:02:55,100
route that they took.

66
00:02:55,100 --> 00:02:57,490
I mean fortunately for the
industry, but unfortunately

67
00:02:57,490 --> 00:03:00,010
for you, I think that's the
route they took with cell

68
00:03:00,010 --> 00:03:01,700
which means all of you are
going to become basically

69
00:03:01,700 --> 00:03:02,800
fighter pilots.

70
00:03:02,800 --> 00:03:04,160
You have to learn how
to fly the plane.

71
00:03:04,160 --> 00:03:05,100
You have to become an expert.

72
00:03:05,100 --> 00:03:07,140
You're going to become the best
people at programming

73
00:03:07,140 --> 00:03:08,350
these architectures.

74
00:03:08,350 --> 00:03:11,810
And unfortunately the only other
option is to really bend

75
00:03:11,810 --> 00:03:15,010
over backwards to support the
previous era of languages

76
00:03:15,010 --> 00:03:16,670
like C and C++.

77
00:03:16,670 --> 00:03:19,500
And you can see what's coming
here, it's just hard to get

78
00:03:19,500 --> 00:03:20,850
off the runway.

79
00:03:20,850 --> 00:03:25,400
So you don't want
this situation.

80
00:03:25,400 --> 00:03:27,640
Out of consideration for whoever
is in the buggy,

81
00:03:27,640 --> 00:03:29,790
hopefully you'll
never take off.

82
00:03:29,790 --> 00:03:33,190
So looking from a more academic
perspective, why do

83
00:03:33,190 --> 00:03:35,060
we need a new language
right now?

84
00:03:35,060 --> 00:03:37,880
So if you look back over the
past 30 years, I know you've

85
00:03:37,880 --> 00:03:39,330
seen this graph before.

86
00:03:39,330 --> 00:03:41,760
We were dealing with just
one core in the machine

87
00:03:41,760 --> 00:03:42,950
for all this time.

88
00:03:42,950 --> 00:03:45,110
And now we have this plethora
of multicores

89
00:03:45,110 --> 00:03:46,610
coming across the board.

90
00:03:46,610 --> 00:03:49,120
So how did we program
these old machines?

91
00:03:49,120 --> 00:03:52,670
Well we had languages like C and
FORTRAN that really have a

92
00:03:52,670 --> 00:03:55,230
lot of nice properties across
these architectures.

93
00:03:55,230 --> 00:03:58,540
So it was portable,
high-performance, composable--

94
00:03:58,540 --> 00:04:00,870
you could have really good
software development--

95
00:04:00,870 --> 00:04:03,180
malleable, maintainable, all the
nice things you'd like to

96
00:04:03,180 --> 00:04:05,250
see from a software engineering
perspective.

97
00:04:05,250 --> 00:04:08,450
And really if you wrote a
program back in 1970, you

98
00:04:08,450 --> 00:04:12,170
could keep it in C and have it
continue to leverage all the

99
00:04:12,170 --> 00:04:14,620
new properties of machines
over the past 30 years.

100
00:04:14,620 --> 00:04:16,760
So it just ran fine
out of the box.

101
00:04:16,760 --> 00:04:19,620
And looking forward, that's
not going to be true.

102
00:04:19,620 --> 00:04:21,660
So for example, we could
say that C was the

103
00:04:21,660 --> 00:04:23,060
common machine language.

104
00:04:23,060 --> 00:04:25,760
That's what we'd say, for the
past 30 years, was common

105
00:04:25,760 --> 00:04:27,520
across all the machines.

106
00:04:27,520 --> 00:04:29,120
But now looking forward,
that's not

107
00:04:29,120 --> 00:04:30,210
going to be true anymore.

108
00:04:30,210 --> 00:04:32,800
Because you have to program
every core separately.

109
00:04:32,800 --> 00:04:35,270
So what's the common machine
language for multicores?

110
00:04:35,270 --> 00:04:37,240
We really think you need
something where you can write

111
00:04:37,240 --> 00:04:41,370
a program once today, and have
it scale for the next 30 years

112
00:04:41,370 --> 00:04:43,560
without having to modify
the program.

113
00:04:43,560 --> 00:04:45,930
So what kind of language do you
need to get that kind of

114
00:04:45,930 --> 00:04:47,620
performance?

115
00:04:47,620 --> 00:04:49,520
Well let's look a little deeper
into this notion of a

116
00:04:49,520 --> 00:04:51,020
common machine language.

117
00:04:51,020 --> 00:04:53,820
So why did it work so well
for the past 30 years?

118
00:04:53,820 --> 00:04:57,670
Well on uniprocessors, things
like C and FORTRAN really

119
00:04:57,670 --> 00:04:59,800
encapsulated the common
properties.

120
00:04:59,800 --> 00:05:02,820
So things like a single flow of
control in the machine, a

121
00:05:02,820 --> 00:05:06,260
single memory image, are both
properties of the language.

122
00:05:06,260 --> 00:05:08,260
But they also hid certain
properties from the

123
00:05:08,260 --> 00:05:09,070
programmer.

124
00:05:09,070 --> 00:05:11,550
So they hid the things that
were different between one

125
00:05:11,550 --> 00:05:12,620
machine and another.

126
00:05:12,620 --> 00:05:16,320
So for example, the register
file, the ISA, the functional

127
00:05:16,320 --> 00:05:17,540
units and so on.

128
00:05:17,540 --> 00:05:20,200
These things could change from
one architecture to another.

129
00:05:20,200 --> 00:05:22,330
And you didn't have to change
your program because those

130
00:05:22,330 --> 00:05:24,820
aspects weren't in the
programming language.

131
00:05:24,820 --> 00:05:27,140
So that's why these languages
were succeeding.

132
00:05:27,140 --> 00:05:30,070
And what do we need to succeed
in the multicore era from a

133
00:05:30,070 --> 00:05:31,580
language perspective?

134
00:05:31,580 --> 00:05:34,370
Well you need to encapsulate the
common properties again.

135
00:05:34,370 --> 00:05:37,830
And this time it's multiple
flows of control that you have

136
00:05:37,830 --> 00:05:40,860
for all the different cores, and
multiple local memories.

137
00:05:40,860 --> 00:05:43,710
There's no more monolithic
memory anymore, that everyone

138
00:05:43,710 --> 00:05:46,370
can read and write to.

139
00:05:46,370 --> 00:05:48,530
Also you need to hide
some of the

140
00:05:48,530 --> 00:05:50,190
differences between the machines.

141
00:05:50,190 --> 00:05:51,990
So some cores have different
capabilities.

142
00:05:51,990 --> 00:05:54,570
On Cell there's a heterogeneous
system between

143
00:05:54,570 --> 00:05:56,560
the SPEs and the PPE.

144
00:05:56,560 --> 00:05:59,090
Different communication models
on different architectures,

145
00:05:59,090 --> 00:06:01,010
different synchronization
models.

146
00:06:01,010 --> 00:06:02,970
So whatever common machine
language we come up with,

147
00:06:02,970 --> 00:06:04,600
we'll have to keep these
things hidden from the

148
00:06:04,600 --> 00:06:06,130
programmer.

149
00:06:06,130 --> 00:06:07,960
Now a lot of different
researchers are taking

150
00:06:07,960 --> 00:06:10,390
different tacks for how you want
to invent the next common

151
00:06:10,390 --> 00:06:11,380
machine language.

152
00:06:11,380 --> 00:06:13,920
And the thrust that we're really
excited about is this

153
00:06:13,920 --> 00:06:15,510
notion of streaming.

154
00:06:15,510 --> 00:06:17,460
So what is a stream program?

155
00:06:17,460 --> 00:06:20,130
Well if you look at a lot of the
high-performance systems

156
00:06:20,130 --> 00:06:22,780
today-- including PowerPoint
which is running this awesome

157
00:06:22,780 --> 00:06:24,730
animation--

158
00:06:24,730 --> 00:06:26,880
you can basically see that
they're based around some

159
00:06:26,880 --> 00:06:27,930
stream of data.

160
00:06:27,930 --> 00:06:33,290
So audio, video, like HDTV,
video editing, graphic stuff.

161
00:06:33,290 --> 00:06:35,570
I think actually, a lot of the
projects in this class that I

162
00:06:35,570 --> 00:06:38,160
looked at, would fit into
the streaming mold.

163
00:06:38,160 --> 00:06:41,790
Things like the software
radio, ray tracing, I

164
00:06:41,790 --> 00:06:42,750
probably don't remember
them all.

165
00:06:42,750 --> 00:06:44,460
But when I looked at them, they
all looked like they had

166
00:06:44,460 --> 00:06:46,670
a streaming component
somewhere in there.

167
00:06:46,670 --> 00:06:49,560
So what's special about a stream
program compared to

168
00:06:49,560 --> 00:06:51,440
just a normal program?

169
00:06:51,440 --> 00:06:53,830
Well they have a lot of
attractive properties.

170
00:06:53,830 --> 00:06:56,380
If you look at their structure,
you can usually see

171
00:06:56,380 --> 00:07:00,220
that the computation pattern
remains relatively constant

172
00:07:00,220 --> 00:07:01,850
across the lifetime
of the program.

173
00:07:01,850 --> 00:07:04,600
So they have some well-defined
units that are communicating

174
00:07:04,600 --> 00:07:05,810
with each other.

175
00:07:05,810 --> 00:07:08,790
And they continue that pattern
of communication throughout.

176
00:07:08,790 --> 00:07:11,130
And this really exposes a lot
of opportunities for the

177
00:07:11,130 --> 00:07:13,670
compiler to do some
optimizations that it couldn't

178
00:07:13,670 --> 00:07:16,640
do on just an arbitrary general
purpose program.

179
00:07:16,640 --> 00:07:19,020
And as you saw before,
basically all the types of

180
00:07:19,020 --> 00:07:22,380
parallelism are really exposed
in a stream program.

181
00:07:22,380 --> 00:07:25,000
There's the pipeline parallelism
between different

182
00:07:25,000 --> 00:07:26,650
producers and consumers.

183
00:07:26,650 --> 00:07:29,340
There's the task parallelism
basically going

184
00:07:29,340 --> 00:07:30,420
from left to right.

185
00:07:30,420 --> 00:07:33,220
And also data parallelism which
means that a single one

186
00:07:33,220 --> 00:07:36,900
of these stages can sometimes be
split to apply to multiple

187
00:07:36,900 --> 00:07:39,960
elements in the data stream.

188
00:07:39,960 --> 00:07:42,380
So when you're thinking about
stream programming, there's a

189
00:07:42,380 --> 00:07:43,840
lot of different ways
you can actually

190
00:07:43,840 --> 00:07:45,500
represent the program.

191
00:07:45,500 --> 00:07:47,640
So whenever you have a
programming model, you have to

192
00:07:47,640 --> 00:07:49,530
answer these kinds
of questions.

193
00:07:49,530 --> 00:07:52,150
For example do the senders and
the receivers block when they

194
00:07:52,150 --> 00:07:53,700
try to communicate?

195
00:07:53,700 --> 00:07:55,470
How much buffering is allowed?

196
00:07:55,470 --> 00:07:57,300
Is the computation
deterministic?

197
00:07:57,300 --> 00:07:59,440
What kind of model do
you have in there?

198
00:07:59,440 --> 00:08:00,580
Can you avoid deadlock?

199
00:08:00,580 --> 00:08:03,880
Questions like these, and we
could spend a whole lecture

200
00:08:03,880 --> 00:08:05,790
answering these questions,
putting them in different

201
00:08:05,790 --> 00:08:06,980
categories.

202
00:08:06,980 --> 00:08:09,170
But what I want to do, just
to give you a feel, is

203
00:08:09,170 --> 00:08:12,010
touch on kind of three of the
major models that you might

204
00:08:12,010 --> 00:08:14,410
see come up in different kinds
of programming models.

205
00:08:14,410 --> 00:08:17,550
And I'll just touch on Kahn process
networks, synchronous

206
00:08:17,550 --> 00:08:22,800
dataflow, and communicating
sequential processes, or CSP.

207
00:08:22,800 --> 00:08:24,930
So just one slide
on these models.

208
00:08:24,930 --> 00:08:27,300
So let's compare them
a little bit.

209
00:08:27,300 --> 00:08:29,650
First there's the Kahn
process networks.

210
00:08:29,650 --> 00:08:31,910
So this is kind of the
simplest model.

211
00:08:31,910 --> 00:08:32,940
It's very intuitive.

212
00:08:32,940 --> 00:08:34,610
You just have different
processes that are

213
00:08:34,610 --> 00:08:36,460
communicating over FIFOs.

214
00:08:36,460 --> 00:08:40,700
And the FIFO size is
conceptually unbounded.

215
00:08:40,700 --> 00:08:44,840
So to a first approximation,
it's kind of like a Unix pipe.

216
00:08:44,840 --> 00:08:47,630
These processes can just read
from the input, and they can

217
00:08:47,630 --> 00:08:50,310
push onto their outputs
without blocking.

218
00:08:50,310 --> 00:08:53,170
But if they try to read from an
input they do block until

219
00:08:53,170 --> 00:08:54,850
an input is available.

220
00:08:54,850 --> 00:08:58,090
And the interesting thing is
that the communication pattern

221
00:08:58,090 --> 00:09:00,520
can actually be dependent
on the data.

222
00:09:00,520 --> 00:09:04,410
So for example I could pop an
index off of one channel, and

223
00:09:04,410 --> 00:09:07,300
then use that index to determine
which other channel

224
00:09:07,300 --> 00:09:09,830
I'll read from on the
next time step.

225
00:09:09,830 --> 00:09:12,560
But at the same time it
is deterministic.

226
00:09:12,560 --> 00:09:16,450
So for a given series of input
values on the stream, I'll

227
00:09:16,450 --> 00:09:18,540
always have the same
communication pattern that I'm

228
00:09:18,540 --> 00:09:20,630
reading from the other inputs.

229
00:09:20,630 --> 00:09:24,580
So if it's a deterministic
model, that's a nice property.
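
[Editor's sketch, not from the lecture] The Kahn process network behavior described above can be illustrated with a small Python sketch. It assumes threads stand in for processes and `queue.Queue()` with no size limit stands in for the conceptually unbounded FIFOs; the names `producer`, `selector`, and `run_network` are hypothetical. Writes never block, reads block until data arrives, and the selector picks its next input channel from an index it popped, yet the result does not depend on thread timing.

```python
import queue
import threading


def producer(out, values):
    # Writes never block: the FIFO is conceptually unbounded.
    for v in values:
        out.put(v)


def selector(control, data_channels, out, steps):
    for _ in range(steps):
        index = control.get()                # blocking read of an index
        value = data_channels[index].get()   # data-dependent blocking read
        out.put(value)


def run_network():
    control, a, b, result = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=producer, args=(control, [0, 1, 1, 0])),
        threading.Thread(target=producer, args=(a, ["a0", "a1"])),
        threading.Thread(target=producer, args=(b, ["b0", "b1"])),
        threading.Thread(target=selector, args=(control, [a, b], result, 4)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The same inputs always yield the same output sequence.
    return [result.get() for _ in range(4)]
```

Calling `run_network()` repeatedly should give the same list each time, which is the determinism property the speaker points to.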

230
00:09:24,580 --> 00:09:26,320
Let's see, what else
to say here?

231
00:09:26,320 --> 00:09:29,670
There's actually a few recent
ventures that are using Kahn

232
00:09:29,670 --> 00:09:30,710
process networks.

233
00:09:30,710 --> 00:09:33,190
So there's commercial interest.
For example Ambric

234
00:09:33,190 --> 00:09:36,920
is a startup that I think will
be based on a Kahn process

235
00:09:36,920 --> 00:09:40,750
network for the programming
model.

236
00:09:40,750 --> 00:09:43,410
Looking at another model called
synchronous dataflow,

237
00:09:43,410 --> 00:09:45,930
this is actually what we use
in the StreamIt system.

238
00:09:45,930 --> 00:09:47,760
And compared to Kahn
process networks,

239
00:09:47,760 --> 00:09:48,830
it's kind of a subset.

240
00:09:48,830 --> 00:09:50,600
It's a little bit more
restrictive.

241
00:09:50,600 --> 00:09:53,040
So if you look at the space
of all possible program

242
00:09:53,040 --> 00:09:55,770
behaviors, Kahn process networks
are a pretty big

243
00:09:55,770 --> 00:09:57,110
piece of the space.

244
00:09:57,110 --> 00:09:59,580
And then synchronous dataflow
is kind of a subset of that

245
00:09:59,580 --> 00:10:02,490
space where you know more
about the communication

246
00:10:02,490 --> 00:10:04,390
pattern at compile time.

247
00:10:04,390 --> 00:10:07,070
So for example, in synchronous
dataflow, the programmer

248
00:10:07,070 --> 00:10:10,810
actually declares how many items
it will consume from

249
00:10:10,810 --> 00:10:14,090
each of its in put channels
on a given execution step.

250
00:10:14,090 --> 00:10:16,770
So there's no more data
dependence regarding the

251
00:10:16,770 --> 00:10:18,000
communication pattern.

252
00:10:18,000 --> 00:10:21,290
It'll always input some items
from some of the channels and

253
00:10:21,290 --> 00:10:23,990
produce some number of items
to other channels.

254
00:10:23,990 --> 00:10:26,300
And this is a really nice
property because it lets the

255
00:10:26,300 --> 00:10:28,510
compiler do the scheduling
for you.

256
00:10:28,510 --> 00:10:31,020
So the compiler can see who's
communicating with whom and in

257
00:10:31,020 --> 00:10:32,630
exactly what pattern.

258
00:10:32,630 --> 00:10:35,720
And it can statically interleave
the filters to

259
00:10:35,720 --> 00:10:39,400
guarantee that everyone has
enough data to complete their

260
00:10:39,400 --> 00:10:40,750
computation.

261
00:10:40,750 --> 00:10:42,490
So there's a lot of interesting
optimizations you

262
00:10:42,490 --> 00:10:42,950
can do here.

263
00:10:42,950 --> 00:10:45,820
That's why it's very attractive
for StreamIt.

264
00:10:45,820 --> 00:10:47,820
And you can statically guarantee
freedom from

265
00:10:47,820 --> 00:10:51,560
deadlock, which is a nice
property to have.
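
[Editor's sketch, not from the lecture] One way to see why declared rates allow static scheduling is to solve the balance equations for a pipeline. The Python sketch below is an illustration only and is not taken from the StreamIt compiler; the function name `repetitions` and the `(pop, push)` tuple layout are assumptions. For each pair of neighboring stages it requires that the items pushed by the upstream stage equal the items popped by the downstream stage over one steady-state cycle.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def repetitions(rates):
    """rates: list of (pop, push) pairs, one per pipeline stage, in order.

    Returns the smallest positive integer firing counts such that every
    channel holds as many items after the cycle as it did before it.
    """
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        # Balance equation: reps[upstream] * push == reps[downstream] * pop
        reps.append(reps[-1] * push / pop)
    scale = reduce(lcm, (r.denominator for r in reps), 1)
    return [int(r * scale) for r in reps]
```

For example, a source that pushes 2 items, a middle stage that pops 3 and pushes 1, and a sink that pops 2 would be written as `repetitions([(0, 2), (3, 1), (2, 0)])`. A compiler that knows these counts can lay out the firings ahead of time and size the buffers, which is why deadlock freedom can be checked statically.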

266
00:10:51,560 --> 00:10:53,850
The last one I want to touch on
is communicating sequential

267
00:10:53,850 --> 00:10:56,440
processes or CSP.

268
00:10:56,440 --> 00:10:59,080
And in the space of program
behaviors, it kind of

269
00:10:59,080 --> 00:11:02,790
overlaps with Kahn
process networks, and adds

270
00:11:02,790 --> 00:11:05,170
a few new semantic behaviors.

271
00:11:05,170 --> 00:11:08,030
So the buffering model is
basically rendezvous

272
00:11:08,030 --> 00:11:09,300
communication now.

273
00:11:09,300 --> 00:11:11,950
So there's no buffering
in the system.

274
00:11:11,950 --> 00:11:15,390
Basically anytime you send a
value to another process, you

275
00:11:15,390 --> 00:11:18,380
have to block and wait until
that process will actually

276
00:11:18,380 --> 00:11:20,060
receive that value from you.

277
00:11:20,060 --> 00:11:24,570
So everyone is rendezvousing at
every communication step.
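
[Editor's sketch, not from the lecture] Rendezvous communication can be sketched in Python with two semaphores. This is a minimal illustration under the assumption of a single receiver; `RendezvousChannel` and `demo` are hypothetical names and are not part of occam or any CSP library. The point it shows is that `send` does not return until a receiver has actually taken the value, so nothing is ever buffered.

```python
import threading


class RendezvousChannel:
    def __init__(self):
        self._item = None
        self._offered = threading.Semaphore(0)  # a sender has placed a value
        self._taken = threading.Semaphore(0)    # the receiver has taken it
        self._send_lock = threading.Lock()      # one sender at a time

    def send(self, value):
        with self._send_lock:
            self._item = value
            self._offered.release()
            self._taken.acquire()   # block until the receiver has the value

    def receive(self):
        self._offered.acquire()     # block until a sender offers a value
        value = self._item
        self._taken.release()
        return value


def demo():
    channel = RendezvousChannel()
    log = []

    def sender():
        for i in range(3):
            channel.send(i)
            log.append(("sent", i))

    t = threading.Thread(target=sender)
    t.start()
    t.join(timeout=0.1)
    # With no receiver yet, the first send is still blocked.
    blocked_without_receiver = t.is_alive() and log == []
    received = [channel.receive() for _ in range(3)]
    t.join()
    return blocked_without_receiver, received, log
```

In `demo`, the sender thread makes no progress until the main thread starts receiving, which is the blocking behavior described above.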

278
00:11:24,570 --> 00:11:27,200
In addition to that, they
have some sophisticated

279
00:11:27,200 --> 00:11:28,730
synchronization primitives.

280
00:11:28,730 --> 00:11:32,760
So you can for example, discuss
alternative behaviors

281
00:11:32,760 --> 00:11:35,850
that you have. You can either do
one thing or another, which

282
00:11:35,850 --> 00:11:39,510
will introduce the
nondeterminism in the model.

283
00:11:39,510 --> 00:11:42,150
Which could be a good or a bad
thing depending on the program

284
00:11:42,150 --> 00:11:43,550
you're trying to express.

285
00:11:43,550 --> 00:11:47,090
And pretty much the most
well-known encapsulation of

286
00:11:47,090 --> 00:11:50,550
CSP is this occam programming
language invented

287
00:11:50,550 --> 00:11:51,550
quite a while ago.

288
00:11:51,550 --> 00:11:54,420
And some people are still
using that today.

289
00:11:54,420 --> 00:11:56,030
Any questions on the
models of computation?

290
00:11:59,760 --> 00:12:00,720
OK.

291
00:12:00,720 --> 00:12:03,570
So now let me get into
what StreamIt is.

292
00:12:03,570 --> 00:12:06,190
So StreamIt is a
great language.

293
00:12:06,190 --> 00:12:07,380
It's a high-level

294
00:12:07,380 --> 00:12:08,930
architecture-independent language.

295
00:12:08,930 --> 00:12:11,660
Oh, question in the back.

296
00:12:11,660 --> 00:12:15,024
AUDIENCE: With the CSP I'm
trying to understand exactly

297
00:12:15,024 --> 00:12:19,660
what that means or how
it's different.

298
00:12:19,660 --> 00:12:22,150
Is basically what it's saying
that you have a bunch of

299
00:12:22,150 --> 00:12:24,940
processes and they can send
messages to each other.

300
00:12:24,940 --> 00:12:29,390
BILL THIES: So all these models
have that property.

301
00:12:29,390 --> 00:12:31,230
AUDIENCE: They all
fit into that.

302
00:12:31,230 --> 00:12:35,220
But it seems like from your
explanation of CSP, that that

303
00:12:35,220 --> 00:12:41,620
was just sort of the essence
of CSP. Is it more specific?

304
00:12:41,620 --> 00:12:43,820
BILL THIES: So CSP is usually
associated with rendezvous

305
00:12:43,820 --> 00:12:45,360
communication.

306
00:12:45,360 --> 00:12:47,270
That's the set of programs
that fit inside

307
00:12:47,270 --> 00:12:48,780
Kahn process networks.

308
00:12:48,780 --> 00:12:51,860
It's any communicating model
where you basically have no

309
00:12:51,860 --> 00:12:54,310
buffering between
the processes.

310
00:12:54,310 --> 00:12:57,520
Now the piece that sits outside
is usually lumped with

311
00:12:57,520 --> 00:13:01,000
CSP, or especially with occam.
They have a set of primitives

312
00:13:01,000 --> 00:13:04,500
that are richer in terms
of synchronization.

313
00:13:04,500 --> 00:13:07,810
So for example, you can have
guards on your communication.

314
00:13:07,810 --> 00:13:11,040
Don't execute this consumption
from this channel until I see

315
00:13:11,040 --> 00:13:12,550
a certain value.

316
00:13:12,550 --> 00:13:15,580
So there's some more rich
semantics there.

317
00:13:15,580 --> 00:13:17,720
And so that's the things that
are usually outside.

318
00:13:17,720 --> 00:13:19,710
They're outside the
other models.

319
00:13:19,710 --> 00:13:23,940
Does that make sense?

320
00:13:23,940 --> 00:13:25,190
Other questions?

321
00:13:28,530 --> 00:13:32,210
OK so StreamIt.

322
00:13:32,210 --> 00:13:34,480
OK so StreamIt is
architecture-independent.

323
00:13:34,480 --> 00:13:39,190
It's basically a really nice
syntactic model for

324
00:13:39,190 --> 00:13:42,480
interfacing with these lower
level models of computation

325
00:13:42,480 --> 00:13:43,530
for streaming.

326
00:13:43,530 --> 00:13:46,460
And really we have two goals
in the StreamIt project.

327
00:13:46,460 --> 00:13:49,470
And the first is from the
programmer's side.

328
00:13:49,470 --> 00:13:52,000
So we want to improve the
programmer's life when you're

329
00:13:52,000 --> 00:13:53,360
writing a parallel program.

330
00:13:53,360 --> 00:13:55,840
We want to make it easier for
you to write a parallel

331
00:13:55,840 --> 00:13:59,060
program than you would have to
do in C or a language like

332
00:13:59,060 --> 00:14:01,510
Java, or any other language
that you know.

333
00:14:01,510 --> 00:14:03,960
And at the same time, we want
scalable and portable

334
00:14:03,960 --> 00:14:06,100
performance across
the multicores.

335
00:14:06,100 --> 00:14:08,820
So an interesting thing these
days is you'll find, is it's

336
00:14:08,820 --> 00:14:13,570
often very hard to tempt the
programmer to switch to your

337
00:14:13,570 --> 00:14:16,240
favorite language based
solely on performance.

338
00:14:16,240 --> 00:14:18,770
Or at least this has been the
story in the past. It may

339
00:14:18,770 --> 00:14:20,210
change, looking forward.

340
00:14:20,210 --> 00:14:22,640
Because it's a lot harder to
get performance these days.

341
00:14:22,640 --> 00:14:24,730
But usually you have to offer
them some other carrot to get

342
00:14:24,730 --> 00:14:25,720
them on board.

343
00:14:25,720 --> 00:14:28,660
And you know the carrot here
is that it's really nice to

344
00:14:28,660 --> 00:14:29,630
program in.

345
00:14:29,630 --> 00:14:30,880
It's fun to program in.

346
00:14:30,880 --> 00:14:32,200
It's beautiful.

347
00:14:32,200 --> 00:14:34,820
It's a lot easier to program in
StreamIt than it would be

348
00:14:34,820 --> 00:14:37,230
in something like C or Java
for a certain class of

349
00:14:37,230 --> 00:14:40,380
programs. So that's how we get
them on board, and then we

350
00:14:40,380 --> 00:14:43,010
also provide the performance.

351
00:14:43,010 --> 00:14:45,420
We're mostly based on the
synchronous dataflow model.

352
00:14:45,420 --> 00:14:49,380
In that when there are static
communication patterns, we

353
00:14:49,380 --> 00:14:51,560
leverage that from the
compiler side.

354
00:14:51,560 --> 00:14:53,850
So I'll also tell you about some
dynamic extensions that

355
00:14:53,850 --> 00:14:58,410
we have, which give a much richer
model of communication.

356
00:14:58,410 --> 00:15:00,690
So what have we been doing
in the StreamIt project?

357
00:15:00,690 --> 00:15:03,860
We have kind of a dual thrust
within our group building on

358
00:15:03,860 --> 00:15:04,800
this language.

359
00:15:04,800 --> 00:15:07,470
So the first thrust is from
the programmability side,

360
00:15:07,470 --> 00:15:10,140
looking at applications and
programmability. What can we

361
00:15:10,140 --> 00:15:12,030
fit into the streaming model?

362
00:15:12,030 --> 00:15:14,260
And we're also really pushing
the optimizations.

363
00:15:14,260 --> 00:15:17,660
So what can you do from both a
domain specific optimization

364
00:15:17,660 --> 00:15:21,930
standpoint, as kind of emulating
a DSP engineer or a

365
00:15:21,930 --> 00:15:24,770
signal processing expert
in the design flow.

366
00:15:24,770 --> 00:15:26,860
And also architecture specific
optimizations.

367
00:15:26,860 --> 00:15:29,520
So we've been compiling for a
lot of parallel machines.

368
00:15:29,520 --> 00:15:31,860
And we were hoping we could
have a full system for you

369
00:15:31,860 --> 00:15:34,735
guys, this IAP, so you could
write it in StreamIt and

370
00:15:34,735 --> 00:15:35,520
then hit the button.

371
00:15:35,520 --> 00:15:37,900
And it would work the whole
way down to Cell.

372
00:15:37,900 --> 00:15:39,570
Unfortunately we're not
quite there yet.

373
00:15:39,570 --> 00:15:42,510
But we do have a pretty robust
compiler infrastructure.

374
00:15:42,510 --> 00:15:45,010
And you can download this off
the web and play with it if

375
00:15:45,010 --> 00:15:46,190
you want to.

376
00:15:46,190 --> 00:15:49,270
One of our backends that we've
released so far actually does

377
00:15:49,270 --> 00:15:52,260
go to a cluster of
workstations.

378
00:15:52,260 --> 00:15:56,550
So it's kind of an MPI-like
version of C. It uses Pthreads

379
00:15:56,550 --> 00:15:58,120
for the parallelism model.

380
00:15:58,120 --> 00:16:00,630
And I mean, depending on what
kind of a hacker you are, you

381
00:16:00,630 --> 00:16:02,950
actually might be able to lower
that down onto Cell.

382
00:16:02,950 --> 00:16:07,840
So some of the stuff you might
be able to use if you have

383
00:16:07,840 --> 00:16:09,050
some initiative in there.

384
00:16:09,050 --> 00:16:10,400
And of course we'd be
willing to work with

385
00:16:10,400 --> 00:16:12,260
you on this as well.

386
00:16:12,260 --> 00:16:14,460
So we have lots of optimizations
in the tool flow.

387
00:16:14,460 --> 00:16:17,280
And actually Saman will spend
another lecture focusing on

388
00:16:17,280 --> 00:16:19,240
the StreamIt compiler,
and how we get

389
00:16:19,240 --> 00:16:22,010
performance out of the model.

390
00:16:22,010 --> 00:16:25,330
OK, so let's just jump right in
and do the analog of Hello

391
00:16:25,330 --> 00:16:26,080
World in StreamIt.

392
00:16:26,080 --> 00:16:28,470
I'm going to kind of walk you
through the language and show

393
00:16:28,470 --> 00:16:30,950
you the interesting pieces
from an intellectual

394
00:16:30,950 --> 00:16:31,380
standpoint.

395
00:16:31,380 --> 00:16:33,720
What's interesting about
a streaming model that

396
00:16:33,720 --> 00:16:35,280
you can take away.

397
00:16:35,280 --> 00:16:38,100
So instead of Hello World,
we have a counter.

398
00:16:38,100 --> 00:16:40,320
Since we're dealing with stream
programs here, you're

399
00:16:40,320 --> 00:16:42,200
not usually doing
text processing.

400
00:16:42,200 --> 00:16:43,820
So how do you write counter?

401
00:16:43,820 --> 00:16:46,040
Well there are two pieces
to the program.

402
00:16:46,040 --> 00:16:48,430
The first is kind of the
interconnect between the

403
00:16:48,430 --> 00:16:49,270
different components.

404
00:16:49,270 --> 00:16:50,950
That's what we have up here.

405
00:16:50,950 --> 00:16:54,210
We're saying the program is a
pipeline with two stages, it

406
00:16:54,210 --> 00:16:56,690
has a source, and it
has a printer.

407
00:16:56,690 --> 00:16:58,120
And then we can write
the source and

408
00:16:58,120 --> 00:16:59,720
the printer as filters.

409
00:16:59,720 --> 00:17:03,190
We call those basic building
blocks filters in StreamIt.

410
00:17:03,190 --> 00:17:05,430
So the source will just have
a variable x that it

411
00:17:05,430 --> 00:17:07,130
initializes to zero.

412
00:17:07,130 --> 00:17:09,450
And then we have a work
function which is

413
00:17:09,450 --> 00:17:12,570
automatically called by our
runtime system every time

414
00:17:12,570 --> 00:17:13,820
through the steady state.

415
00:17:13,820 --> 00:17:16,400
So this work function will push
one item on to the output

416
00:17:16,400 --> 00:17:20,130
channel, and it'll increment
the value afterward.

417
00:17:20,130 --> 00:17:22,870
Whereas the intPrinter
at the bottom here,

418
00:17:22,870 --> 00:17:24,720
will input one value.

419
00:17:24,720 --> 00:17:27,080
And its work function here just
pops that value off the

420
00:17:27,080 --> 00:17:29,940
input tape, and prints
it to the output.

421
00:17:29,940 --> 00:17:31,180
Now how do we run this thing?

422
00:17:31,180 --> 00:17:33,490
Well there's no main function
here like you

423
00:17:33,490 --> 00:17:34,730
see in Hello World.

424
00:17:34,730 --> 00:17:36,190
Is there a comment?

425
00:17:36,190 --> 00:17:39,207
Oh, sorry.

426
00:17:39,207 --> 00:17:42,180
AUDIENCE: The two meanings of
push and two meanings of pop.

427
00:17:42,180 --> 00:17:42,670
BILL THIES: Two meanings?

428
00:17:42,670 --> 00:17:44,120
AUDIENCE: Push 1, 2, 3.

429
00:17:44,120 --> 00:17:44,970
BILL THIES: Yeah, yeah.

430
00:17:44,970 --> 00:17:48,490
So the first push here is just
declaring that this work

431
00:17:48,490 --> 00:17:51,300
function will push one item
to the output tape.

432
00:17:51,300 --> 00:17:53,450
So this is the synchronous
dataflow aspect.

433
00:17:53,450 --> 00:17:56,510
We're associating an output rate
with this work function.

434
00:17:56,510 --> 00:17:58,480
So that's a declaration here.

435
00:17:58,480 --> 00:18:02,190
Whereas this push is just
actually executing the push

436
00:18:02,190 --> 00:18:03,760
onto the output.

437
00:18:03,760 --> 00:18:06,000
So how do we run this thing?

438
00:18:06,000 --> 00:18:08,770
Well we compile it with the
StreamIt compiler, strc,

439
00:18:08,770 --> 00:18:09,980
into a binary.

440
00:18:09,980 --> 00:18:12,170
And then when we run, we run
for a given number of

441
00:18:12,170 --> 00:18:13,390
iterations.

442
00:18:13,390 --> 00:18:15,100
So you don't just
call it once.

443
00:18:15,100 --> 00:18:18,060
Our model is that this is a
continuous stream of data

444
00:18:18,060 --> 00:18:19,420
going through the program.

445
00:18:19,420 --> 00:18:22,110
And so when you run it, you
run it for some number of

446
00:18:22,110 --> 00:18:25,570
iterations, or basically input
or output items, is what

447
00:18:25,570 --> 00:18:26,590
you're running it for.

448
00:18:26,590 --> 00:18:28,470
So if you run this for four
iterations, it would

449
00:18:28,470 --> 00:18:31,060
print 0, 1, 2, 3.
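The counter walkthrough above can be mimicked in plain Python. This is only a rough simulation, not StreamIt code: the class names IntSource and IntPrinter, the run_counter helper, and the deque standing in for the tape are assumptions made for illustration.

```python
from collections import deque


class IntSource:
    """Source filter; its declared rate would be 'push 1' per firing."""

    def __init__(self):
        self.x = 0  # state kept from one firing to the next

    def work(self, out_tape):
        out_tape.append(self.x)  # push the current value...
        self.x += 1              # ...then increment (postfix behaviour)


class IntPrinter:
    """Sink filter; its declared rate would be 'pop 1' per firing."""

    def __init__(self):
        self.printed = []  # record of what was printed, kept for inspection

    def work(self, in_tape):
        value = in_tape.popleft()  # pop one item off the input tape
        print(value)
        self.printed.append(value)


def run_counter(iterations):
    """Run the two-stage pipeline for a given number of iterations."""
    tape = deque()  # the channel connecting the two stages
    source, printer = IntSource(), IntPrinter()
    for _ in range(iterations):
        source.work(tape)
        printer.work(tape)
    return printer.printed


run_counter(4)
```

The rates are only mentioned in the docstrings here; in the model described in the lecture they are declarations that the compiler can see.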

450
00:18:31,060 --> 00:18:33,540
So we can leverage this
steady flow of data.

451
00:18:33,540 --> 00:18:35,050
Yeah Amir?

452
00:18:35,050 --> 00:18:38,500
AUDIENCE: 1, 2, 3, 4, pushing
x plus plus?

453
00:18:38,500 --> 00:18:40,320
BILL THIES: I think
the plus, plus is

454
00:18:40,320 --> 00:18:41,140
executed after the push.

455
00:18:41,140 --> 00:18:43,010
AUDIENCE: Push plus plus?

456
00:18:43,010 --> 00:18:46,570
BILL THIES: So it
starts at zero.

457
00:18:46,570 --> 00:18:50,670
So I think a postfix expression
executes after the

458
00:18:50,670 --> 00:18:51,520
actual expression.

459
00:18:51,520 --> 00:18:53,780
Yeah.

460
00:18:53,780 --> 00:18:56,220
Yeah.

461
00:18:56,220 --> 00:18:57,470
Other questions?

462
00:18:59,680 --> 00:19:02,200
OK so let's step up a
level and look at

463
00:19:02,200 --> 00:19:03,890
what we have in StreamIt.

464
00:19:03,890 --> 00:19:05,940
So the first question is, how
do you represent this

465
00:19:05,940 --> 00:19:08,350
connectivity between different
building blocks?

466
00:19:08,350 --> 00:19:10,200
How do you represent streams?

467
00:19:10,200 --> 00:19:13,160
And if you look at traditional
programming models, kind of

468
00:19:13,160 --> 00:19:15,140
the conventional wisdom
is that a stream

469
00:19:15,140 --> 00:19:16,490
program is a graph.

470
00:19:16,490 --> 00:19:17,560
You have different
nodes that are

471
00:19:17,560 --> 00:19:19,060
communicating to each other.

472
00:19:19,060 --> 00:19:21,350
And graphs are actually kind
of hard to analyze.

473
00:19:21,350 --> 00:19:22,420
They're hard to represent.

474
00:19:22,420 --> 00:19:24,090
They're a little
bit confusing.

475
00:19:24,090 --> 00:19:26,900
So the approach we decided to
take in StreamIt is one of a

476
00:19:26,900 --> 00:19:29,170
structured computation graph.

477
00:19:29,170 --> 00:19:32,145
So instead of having arbitrary
interconnections between the

478
00:19:32,145 --> 00:19:35,280
stages, we have a
hierarchical description in

479
00:19:35,280 --> 00:19:38,150
which every individual stage
has a single input and a

480
00:19:38,150 --> 00:19:39,340
single output.

481
00:19:39,340 --> 00:19:41,760
And you can compose these
together into

482
00:19:41,760 --> 00:19:42,990
higher level stages.

483
00:19:42,990 --> 00:19:45,600
Of course there are some stages
that do split and join with

484
00:19:45,600 --> 00:19:46,770
multiple inputs.

485
00:19:46,770 --> 00:19:48,520
We'll get to that.

486
00:19:48,520 --> 00:19:52,400
So the analog here is kind of
analogous to structured

487
00:19:52,400 --> 00:19:54,160
control flow, in your favorite

488
00:19:54,160 --> 00:19:55,780
imperative programming language.

489
00:19:55,780 --> 00:19:59,060
Of course there was a day when
everyone used goto statements

490
00:19:59,060 --> 00:20:01,060
instead of having structured
control flow.

491
00:20:01,060 --> 00:20:03,400
We've got a fan of goto
statements in the audience?

492
00:20:03,400 --> 00:20:06,550
OK, I'll get to you later.

493
00:20:06,550 --> 00:20:09,160
But the problem was, it's really
hard to understand the

494
00:20:09,160 --> 00:20:12,110
program that's jumping all over
the place because there's

495
00:20:12,110 --> 00:20:14,810
no local reasoning you can have.
You know you're jumping

496
00:20:14,810 --> 00:20:17,150
to this location, you're coming
back a different way.

497
00:20:17,150 --> 00:20:19,950
It's hard to reason about
program components.

498
00:20:19,950 --> 00:20:21,960
So when people went to
structured control flow,

499
00:20:21,960 --> 00:20:26,410
there's just if-else and for
loop statements.

500
00:20:26,410 --> 00:20:28,740
Those are the two basic
constructs.

501
00:20:28,740 --> 00:20:32,250
You can basically express all
kinds of computation in those

502
00:20:32,250 --> 00:20:33,540
simple primitives.

503
00:20:33,540 --> 00:20:35,340
And things got a lot simpler.

504
00:20:35,340 --> 00:20:37,500
And you know people objected
at one point even.

505
00:20:37,500 --> 00:20:39,630
You know what about a
finite-state machine?

506
00:20:39,630 --> 00:20:41,990
Don't you need goto statements
for a finite-state machine,

507
00:20:41,990 --> 00:20:43,980
going from one state
to another?

508
00:20:43,980 --> 00:20:46,850
And now everyone writes an FSM
with a really simple idiom.

509
00:20:46,850 --> 00:20:50,090
You usually have a while loop
around a case statement.

510
00:20:50,090 --> 00:20:51,830
Right, you have a
dispatch loop.

511
00:20:51,830 --> 00:20:53,780
So and now whenever you see
that pattern you can

512
00:20:53,780 --> 00:20:55,490
recognize, oh there's a
finite-state machine.

513
00:20:55,490 --> 00:20:57,200
It's not just a set of gotos.

514
00:20:57,200 --> 00:20:58,840
It's a finite-state machine.
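The dispatch-loop idiom mentioned here can be sketched in Python as a while loop around a per-state branch. The word-counting task, the state names, and the function count_words are made up for illustration; they are not from the lecture.

```python
def count_words(text):
    """Two-state machine: are we inside a word or between words?"""
    state = "BETWEEN"
    words = 0
    i = 0
    while i < len(text):            # the dispatch loop
        ch = text[i]
        if state == "BETWEEN":      # one branch per state, like a case label
            if not ch.isspace():
                state = "IN_WORD"   # transition: a new word starts here
                words += 1
        elif state == "IN_WORD":
            if ch.isspace():
                state = "BETWEEN"   # transition: the word has ended
        i += 1
    return words
```

Every transition is an assignment to state, so the shape of the machine can be read off the branches without following any jumps.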

515
00:20:58,840 --> 00:21:00,370
So we think there are
similar idioms in

516
00:21:00,370 --> 00:21:01,140
the streaming domain.

517
00:21:01,140 --> 00:21:02,820
And that's kind of the direction
we're pushing from a

518
00:21:02,820 --> 00:21:04,600
design standpoint.

519
00:21:04,600 --> 00:21:06,510
So what are our structures
that we have?

520
00:21:06,510 --> 00:21:09,310
Well here are our structured
streams. At the

521
00:21:09,310 --> 00:21:10,310
base we have a filter.

522
00:21:10,310 --> 00:21:13,100
That's just the programmable
unit like I showed you.

523
00:21:13,100 --> 00:21:16,430
We have a pipeline, which just
connects one stream to another

524
00:21:16,430 --> 00:21:17,400
in a sequence.

525
00:21:17,400 --> 00:21:19,620
So this gives you pipeline
parallelism.

526
00:21:19,620 --> 00:21:22,270
There's a splitjoin where you
have explicit parallelism in

527
00:21:22,270 --> 00:21:23,020
the stream.

528
00:21:23,020 --> 00:21:25,470
So I'll talk about what these
splitters and joiners can do.

529
00:21:25,470 --> 00:21:28,730
It's basically a predefined
pattern of scattering data to

530
00:21:28,730 --> 00:21:31,530
some child streams, and then
gathering that data back into

531
00:21:31,530 --> 00:21:32,770
a single stream.

532
00:21:32,770 --> 00:21:35,500
So the whole construct still
remains single input and

533
00:21:35,500 --> 00:21:37,020
single output.

534
00:21:37,020 --> 00:21:40,720
Likewise a feedback loop is
just a simple way to put a

535
00:21:40,720 --> 00:21:42,340
loop in your stream.

536
00:21:42,340 --> 00:21:44,160
And of course these
are hierarchical.

537
00:21:44,160 --> 00:21:47,160
So all of these green boxes
can be any of the three

538
00:21:47,160 --> 00:21:47,910
constructs.

539
00:21:47,910 --> 00:21:50,270
So that's how you can have these
hierarchical graphs.

540
00:21:50,270 --> 00:21:52,740
And again, since everything is
single-input single-output,

541
00:21:52,740 --> 00:21:53,920
you can really mix and match.

542
00:21:53,920 --> 00:21:56,040
You know choose your favorite
components, and they'll always

543
00:21:56,040 --> 00:21:56,760
fit together.

544
00:21:56,760 --> 00:22:00,390
You don't need to stitch
multiple connections.

545
00:22:00,390 --> 00:22:03,010
So let's dive inside one
of these filters now.

546
00:22:03,010 --> 00:22:05,620
And I gave you a feel for how
they look before, but here's a

547
00:22:05,620 --> 00:22:07,080
little more detail.

548
00:22:07,080 --> 00:22:09,130
So how do we program
the filter?

549
00:22:09,130 --> 00:22:11,600
Well a filter just transforms
one stream

550
00:22:11,600 --> 00:22:12,950
into another stream.

551
00:22:12,950 --> 00:22:15,260
And here it's transforming a
stream of floating-point

552
00:22:15,260 --> 00:22:17,780
numbers into another
floating-point number stream.

553
00:22:17,780 --> 00:22:20,800
It can take some parameters
at the top.

554
00:22:20,800 --> 00:22:23,310
These are actually fixed at compile
time in our model,

555
00:22:23,310 --> 00:22:25,860
which allows the compiler to
really specialize the filter's

556
00:22:25,860 --> 00:22:28,720
code depending on the context
in which it's being used.

557
00:22:28,720 --> 00:22:32,210
So for example here, we're
inputting N and a frequency.

558
00:22:32,210 --> 00:22:34,740
And then we have two stages
of execution.

559
00:22:34,740 --> 00:22:38,220
At initialization time-- this
runs once at the beginning of

560
00:22:38,220 --> 00:22:40,940
the whole program-- we can
calculate some weights for

561
00:22:40,940 --> 00:22:42,720
example, from the frequency.

562
00:22:42,720 --> 00:22:45,550
And we can store those weights
as a local variable.

563
00:22:45,550 --> 00:22:47,860
So you can think of this kind
of like a Java class.

564
00:22:47,860 --> 00:22:49,260
You can have some member
variables.

565
00:22:49,260 --> 00:22:52,350
You can retain state from one
execution to the next.

566
00:22:52,350 --> 00:22:53,790
The work function
is the closest

567
00:22:53,790 --> 00:22:55,350
thing to the main function.

568
00:22:55,350 --> 00:22:57,860
This is called repeatedly
in the steady state.

569
00:22:57,860 --> 00:23:00,660
And here are the IO rates
of the work function.

570
00:23:00,660 --> 00:23:03,660
This filter actually peeks at
some data items. That means

571
00:23:03,660 --> 00:23:07,010
that it inspects more items on
the input channel than it

572
00:23:07,010 --> 00:23:09,490
actually consumes on
every iteration.

573
00:23:09,490 --> 00:23:12,920
So we'll look at N input items,
and we'll push one new

574
00:23:12,920 --> 00:23:17,040
item onto the output and pop one
item from the input tape

575
00:23:17,040 --> 00:23:19,200
every time we execute.

576
00:23:19,200 --> 00:23:21,800
So here we have a sliding
window computation.

577
00:23:21,800 --> 00:23:24,330
It means the next time through,
we'll just slide this

578
00:23:24,330 --> 00:23:27,180
window up by one and inspect
the next N items

579
00:23:27,180 --> 00:23:28,830
on the input tape.
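The peek N, pop 1, push 1 pattern described here can be simulated in Python as follows. It is a sketch under assumptions: the function name sliding_window_filter, the list used as the input tape, and the weighted-sum body are illustrative choices, not the filter shown on the slide.

```python
def sliding_window_filter(tape, weights):
    """Fire while N items are available: peek N, push 1, pop 1."""
    n = len(weights)
    out = []
    pos = 0  # index of the next item that would be popped
    while len(tape) - pos >= n:
        # peek(i) inspects the item i places past the read position
        # without consuming it
        total = sum(tape[pos + i] * weights[i] for i in range(n))
        out.append(total)  # push 1 item onto the output
        pos += 1           # pop 1 item: the window slides up by one
    return out
```

With five input items and a window of three, the filter can fire three times, so sliding_window_filter([1, 2, 3, 4, 5], [1, 1, 1]) yields three outputs.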

580
00:23:28,830 --> 00:23:31,170
And inside the work function
you can have pretty much

581
00:23:31,170 --> 00:23:32,590
general purpose code.

582
00:23:32,590 --> 00:23:34,520
Right now we just disallow pointers
and a few other

583
00:23:34,520 --> 00:23:35,980
things to keep things simple.

584
00:23:35,980 --> 00:23:38,050
But the idea is this is general
purpose imperative

585
00:23:38,050 --> 00:23:40,380
code inside the work function.

586
00:23:40,380 --> 00:23:44,160
Now what's nice about this
representation of a filter is

587
00:23:44,160 --> 00:23:46,020
for one thing, this
peek function.

588
00:23:46,020 --> 00:23:48,750
So what we really have is a
nice representation of the

589
00:23:48,750 --> 00:23:51,100
data pattern, that you're
reading the data

590
00:23:51,100 --> 00:23:52,470
on the input channel.

591
00:23:52,470 --> 00:23:55,290
And if you look at this for
example in a language like C,

592
00:23:55,290 --> 00:23:56,730
it's a lot messier.

593
00:23:56,730 --> 00:23:58,990
So usually when you're doing
buffer management, you have to

594
00:23:58,990 --> 00:24:00,610
do some modulo operations.

595
00:24:00,610 --> 00:24:03,290
You have to keep a circular
buffer of your live data.

596
00:24:03,290 --> 00:24:06,540
And increment, you know, a head or
tail pointer and wrap around

597
00:24:06,540 --> 00:24:08,910
the side with some kind
of modulo operation.
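For contrast, this is roughly what hand-written buffer management with modulo arithmetic looks like, written in Python rather than C for brevity. The class name CircularWindow and its methods are hypothetical, and the sketch assumes the buffer has already been filled with n items before peek is used.

```python
class CircularWindow:
    """Fixed-size window over a stream, managed by hand with a head index."""

    def __init__(self, n):
        self.n = n
        self.buf = [0] * n
        self.head = 0  # slot holding the oldest item once the buffer is full

    def insert(self, value):
        self.buf[self.head] = value            # overwrite the oldest item
        self.head = (self.head + 1) % self.n   # wrap around with a modulo

    def peek(self, i):
        # i = 0 is the oldest live item; the modulo hides which slot is read
        return self.buf[(self.head + i) % self.n]
```

The access pattern is buried inside the index arithmetic, which is the part a compiler would have trouble seeing through.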

598
00:24:08,910 --> 00:24:11,580
And for a compiler, this
is a real nightmare.

599
00:24:11,580 --> 00:24:13,340
Because modulo operations
are kind of the

600
00:24:13,340 --> 00:24:14,680
worst thing to analyze.

601
00:24:14,680 --> 00:24:16,760
You can't see what it's actually
trying to read.

602
00:24:16,760 --> 00:24:19,060
And if you want to map this
buffer to a network or to a

603
00:24:19,060 --> 00:24:22,460
combined communication with some
other actor or filter in

604
00:24:22,460 --> 00:24:24,690
the graph, it's pretty
much impossible.

605
00:24:24,690 --> 00:24:27,540
So here we're exposing that
all to the compiler.

606
00:24:27,540 --> 00:24:29,610
And you'll see how that
can make a difference.

607
00:24:29,610 --> 00:24:31,450
And also it's just a lot
easier to program.

608
00:24:31,450 --> 00:24:33,040
I mean, I don't like looking
at the code.

609
00:24:33,040 --> 00:24:36,260
So I'm going to go to
the next slide.

610
00:24:36,260 --> 00:24:38,930
OK, so how do we piece
things together?

611
00:24:38,930 --> 00:24:41,560
Let's just build some higher
level components.

612
00:24:41,560 --> 00:24:44,380
So here's a pipeline
of two components.

613
00:24:44,380 --> 00:24:45,870
And we already saw a pipeline.

614
00:24:45,870 --> 00:24:48,920
You can just add one component
after another.

615
00:24:48,920 --> 00:24:53,680
And add just basically has the
effect of making a queue, and

616
00:24:53,680 --> 00:24:56,570
just queueing up all of the
components that you added, and

617
00:24:56,570 --> 00:24:58,530
connecting them one
after another.

618
00:24:58,530 --> 00:25:01,560
So here we have a BandPassFilter
by connecting a

619
00:25:01,560 --> 00:25:04,310
LowPassFilter and feeding its
output into a HighPassFilter.

620
00:25:04,310 --> 00:25:07,650
You end up with a
BandPassFilter.

621
00:25:07,650 --> 00:25:08,850
OK what about a splitjoin?

622
00:25:08,850 --> 00:25:11,290
How do we make those?

623
00:25:11,290 --> 00:25:14,730
So a splitjoin has an add
statement as well.

624
00:25:14,730 --> 00:25:17,260
And here we're adding components
in a loop.

625
00:25:17,260 --> 00:25:20,710
So what this means is now when
we say add, we're actually

626
00:25:20,710 --> 00:25:22,700
adding from left to right.

627
00:25:22,700 --> 00:25:24,920
So instead of going top to down,
we're adding from left

628
00:25:24,920 --> 00:25:26,610
to right across the splitjoin.

629
00:25:26,610 --> 00:25:28,440
And we can actually
do that in a loop.

630
00:25:28,440 --> 00:25:31,880
So here we're inputting a
parameter N. And depending on

631
00:25:31,880 --> 00:25:34,110
that value, we'll add N

632
00:25:34,110 --> 00:25:36,420
BandPassFilters to this splitjoin.

633
00:25:36,420 --> 00:25:37,590
So it's kind of cool, right?

634
00:25:37,590 --> 00:25:40,000
Because you can input a
parameter, and that parameter

635
00:25:40,000 --> 00:25:43,180
actually affects the structure
of your graph.

636
00:25:43,180 --> 00:25:47,450
So this graph is unrolled at
compile time by our compiler,

637
00:25:47,450 --> 00:25:50,110
constructing a big sequence
of computations.

638
00:25:50,110 --> 00:25:52,440
And it can resolve the structure
and communication

639
00:25:52,440 --> 00:25:55,050
pattern in that graph, and then
map it to the underlying

640
00:25:55,050 --> 00:25:56,300
substrate when we compile.

641
00:25:58,820 --> 00:26:00,680
Also to notice here are the
splitter and the joiner.

642
00:26:00,680 --> 00:26:03,420
So we have a predefined set
of splitters and joiners.

643
00:26:03,420 --> 00:26:05,760
I'll go into more
detail later.

644
00:26:05,760 --> 00:26:07,750
But here we're just duplicating
the data to every

645
00:26:07,750 --> 00:26:09,970
one of these parallel
components, and then doing a

646
00:26:09,970 --> 00:26:12,970
round-robin join pattern where
we bring them back together

647
00:26:12,970 --> 00:26:15,030
into a single output stream.

648
00:26:15,030 --> 00:26:17,150
And if you want to do an
equalizer, you basically need

649
00:26:17,150 --> 00:26:18,990
an adder at the bottom
to add the

650
00:26:18,990 --> 00:26:21,310
different components together.

651
00:26:21,310 --> 00:26:23,290
And another thing you can notice
here is that we have

652
00:26:23,290 --> 00:26:24,960
some inlining going on.

653
00:26:24,960 --> 00:26:28,530
So we actually embedded this
splitjoin inside a higher

654
00:26:28,530 --> 00:26:29,980
level pipeline.

655
00:26:29,980 --> 00:26:32,260
So what this does is it prevents
you from having to

656
00:26:32,260 --> 00:26:34,340
name every component
of your stream.

657
00:26:34,340 --> 00:26:37,870
You can have a single stream
definition with lots of nested

658
00:26:37,870 --> 00:26:39,110
components.

659
00:26:39,110 --> 00:26:42,770
And the natural extension is,
you can basically scale-up to

660
00:26:42,770 --> 00:26:45,230
basically the natural size of
a procedure, just like you

661
00:26:45,230 --> 00:26:47,160
would in an imperative
language.

662
00:26:47,160 --> 00:26:50,190
And here is for example, an FM
radio where we have a few

663
00:26:50,190 --> 00:26:51,650
inline components.

664
00:26:51,650 --> 00:26:54,530
And the interesting thing here
is that there's a pretty good

665
00:26:54,530 --> 00:26:57,530
correspondence between the
lines of the text and the

666
00:26:57,530 --> 00:26:59,480
actual structure of the graph.

667
00:26:59,480 --> 00:27:01,110
And that's something that's
hard to find in

668
00:27:01,110 --> 00:27:02,100
an imperative language.

669
00:27:02,100 --> 00:27:05,120
I mean if you want to for
example, stitch nodes together

670
00:27:05,120 --> 00:27:08,210
with edges, it's often very
hard to visualize the

671
00:27:08,210 --> 00:27:11,140
resulting structure of the graph
that you have. But here

672
00:27:11,140 --> 00:27:13,650
if you just go through the
program, you can see that the

673
00:27:13,650 --> 00:27:16,940
AtoD component goes right over
to the AtoD, the demodulator

674
00:27:16,940 --> 00:27:18,650
to the demodulator, and so on.

675
00:27:18,650 --> 00:27:21,640
And even for the parallel
components, you can kind of

676
00:27:21,640 --> 00:27:23,720
piece them together.

677
00:27:23,720 --> 00:27:27,680
So that's kind of how we think
of building programs. Any

678
00:27:27,680 --> 00:27:28,930
questions so far?

679
00:27:31,070 --> 00:27:33,390
OK so this is kind of
how you go about

680
00:27:33,390 --> 00:27:34,920
programming in StreamIt.

681
00:27:34,920 --> 00:27:37,370
But programming is kind of a
chug n' plug activity right?

682
00:27:37,370 --> 00:27:39,140
Nobody wants to be
a code monkey.

683
00:27:39,140 --> 00:27:41,340
The reason we're all here is to
see what's beautiful about

684
00:27:41,340 --> 00:27:42,590
this programming model.

685
00:27:42,590 --> 00:27:42,910
Right?

686
00:27:42,910 --> 00:27:45,410
Don Knuth said this, a famous
computer scientist from

687
00:27:45,410 --> 00:27:48,320
Stanford during his Turing
Award Lecture you know,

688
00:27:48,320 --> 00:27:50,020
"Some programs are
elegant, some are

689
00:27:50,020 --> 00:27:52,460
exquisite, some are sparkling.

690
00:27:52,460 --> 00:27:54,520
My claim is that it is possible
to write grand

691
00:27:54,520 --> 00:27:56,910
programs, noble programs, truly

692
00:27:56,910 --> 00:27:58,640
magnificent ones!" Right.

693
00:27:58,640 --> 00:28:00,210
We want the best programs
possible.

694
00:28:00,210 --> 00:28:01,990
It's not just about
making it work.

695
00:28:01,990 --> 00:28:04,850
We want really beautiful
programs. So what's beautiful

696
00:28:04,850 --> 00:28:06,370
about the streaming domain?

697
00:28:06,370 --> 00:28:08,820
What can you go away with and
say wow, that was a really

698
00:28:08,820 --> 00:28:11,540
beautiful expression
of the computation.

699
00:28:11,540 --> 00:28:14,290
Well for me I think one of the
interesting things here is the

700
00:28:14,290 --> 00:28:15,680
splitjoin construct.

701
00:28:15,680 --> 00:28:18,040
Splitjoins can really
be beautiful.

702
00:28:18,040 --> 00:28:22,000
You know some mornings I just
wake up and I'm like, oh I'm

703
00:28:22,000 --> 00:28:24,250
so glad I live in a world
with splitjoins.

704
00:28:24,250 --> 00:28:25,510
You know?

705
00:28:25,510 --> 00:28:27,820
And now splitjoins will
be part of your world.

706
00:28:27,820 --> 00:28:29,010
You can say this tomorrow.

707
00:28:29,010 --> 00:28:30,380
This is just wonderful.

708
00:28:30,380 --> 00:28:32,840
So OK, what do we have in
the splitjoin constructs?

709
00:28:32,840 --> 00:28:34,900
You can duplicate data.

710
00:28:34,900 --> 00:28:37,650
You can do a round-robin
communication pattern from one

711
00:28:37,650 --> 00:28:40,130
to another, or round-robin
join.

712
00:28:40,130 --> 00:28:42,300
Now the duplicate is
pretty simple.

713
00:28:42,300 --> 00:28:44,090
You just take the input
stream and duplicate

714
00:28:44,090 --> 00:28:45,440
it to all the children.

715
00:28:45,440 --> 00:28:47,240
No problem.

716
00:28:47,240 --> 00:28:48,980
What do you do for
a round-robin?

717
00:28:48,980 --> 00:28:53,260
Well you pass N items from the
input to a given child.

718
00:28:53,260 --> 00:28:56,950
So for example, if N is 1, we'll
just distribute one at a

719
00:28:56,950 --> 00:29:01,170
time to the child streams. And
you get a pattern like this, a

720
00:29:01,170 --> 00:29:03,890
round-robin just going across.

721
00:29:03,890 --> 00:29:06,290
And you can do the same thing
on the joiner side.

722
00:29:06,290 --> 00:29:09,310
Let's say you're joining
with a factor of one.

723
00:29:09,310 --> 00:29:12,490
You're just reading from the
children and putting them into

724
00:29:12,490 --> 00:29:14,530
a single stream.

725
00:29:14,530 --> 00:29:18,190
OK so that's pretty colorful, but
nothing too fancy yet.

726
00:29:18,190 --> 00:29:20,460
Let's consider a different
round-robin factor.

727
00:29:20,460 --> 00:29:24,420
So a round-robin of 2 means that
we peel off 2 items from

728
00:29:24,420 --> 00:29:27,760
the input, and pass those items
to the first output.

729
00:29:27,760 --> 00:29:29,590
OK there actually is going
to be a quiz on this.

730
00:29:29,590 --> 00:29:32,630
So ask questions if this
doesn't make sense.

731
00:29:32,630 --> 00:29:37,290
OK pass the next 2 items, and 2
items round-robin like that.

732
00:29:37,290 --> 00:29:39,010
And you can actually
have nonuniform

733
00:29:39,010 --> 00:29:40,210
weights if you want to.

734
00:29:40,210 --> 00:29:41,740
So on the right let's
say we're doing

735
00:29:41,740 --> 00:29:43,270
round-robin 1, 2, 3.

736
00:29:43,270 --> 00:29:46,620
That means we pass 1 item from
the first stream, 2 items from

737
00:29:46,620 --> 00:29:49,670
the next stream, and then 3
items from the third stream.

738
00:29:49,670 --> 00:29:50,460
OK, pretty simple.

739
00:29:50,460 --> 00:29:53,160
We're just doing 1, 2,
and 3, and so on.
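The weighted round-robin patterns just described can be sketched in Python. The function names roundrobin_split and roundrobin_join are assumptions for illustration, lists stand in for the streams, and all weights are assumed to be positive.

```python
def roundrobin_split(items, weights):
    """Deal items out to the children, weights[k] items at a time to child k."""
    streams = [[] for _ in weights]
    pos = 0
    while pos < len(items):
        for stream, w in zip(streams, weights):
            stream.extend(items[pos:pos + w])  # peel off w items for this child
            pos += w
    return streams


def roundrobin_join(streams, weights):
    """Gather weights[k] items at a time from child k into a single stream."""
    out = []
    cursors = [0] * len(streams)  # read position in each child stream
    while any(c < len(s) for c, s in zip(cursors, streams)):
        for k, w in enumerate(weights):
            out.extend(streams[k][cursors[k]:cursors[k] + w])
            cursors[k] += w
    return out
```

A uniform factor of 2 over two children corresponds to weights [2, 2], and the nonuniform case from the lecture corresponds to weights [1, 2, 3].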

740
00:29:53,160 --> 00:29:54,260
Does that make sense?

741
00:29:54,260 --> 00:29:59,050
I'm going to build on this
so any questions?

742
00:29:59,050 --> 00:30:01,780
OK this was colorful but this
isn't totally beautiful yet.

743
00:30:01,780 --> 00:30:03,670
So what's beautiful
about this?

744
00:30:03,670 --> 00:30:06,440
Well let's see how you might
write a matrix transpose.

745
00:30:06,440 --> 00:30:08,720
OK, something you guys have
probably all written at one

746
00:30:08,720 --> 00:30:11,120
point in your life is
transposing a matrix.

747
00:30:11,120 --> 00:30:14,430
Let's say this matrix has M
rows, and it has N columns

748
00:30:14,430 --> 00:30:15,600
going across.

749
00:30:15,600 --> 00:30:17,830
And we're starting with a
representation in which the

750
00:30:17,830 --> 00:30:20,660
stream is basically, I think
this is row major order.

751
00:30:20,660 --> 00:30:23,430
The first thing that you're
doing is going across the rows

752
00:30:23,430 --> 00:30:25,340
before you're going to the
next row.

753
00:30:25,340 --> 00:30:27,680
I'm sorry you're going across
the columns, and then down to

754
00:30:27,680 --> 00:30:28,770
the next row.

755
00:30:28,770 --> 00:30:30,710
So it's zigzagging like this.

756
00:30:30,710 --> 00:30:33,020
And you want to pass through
a transpose, so you

757
00:30:33,020 --> 00:30:34,420
zigzag the other way.

758
00:30:34,420 --> 00:30:36,870
You do the first column,
up, and then the next

759
00:30:36,870 --> 00:30:38,340
column, and so on.

760
00:30:38,340 --> 00:30:41,110
And this comes up a lot
in a stream program.

761
00:30:41,110 --> 00:30:44,080
So it turns out you can
represent this as a splitjoin.

762
00:30:44,080 --> 00:30:45,570
Oh that's not good.

763
00:30:48,110 --> 00:30:52,420
Just in my moment of
glory as well.

764
00:30:52,420 --> 00:30:53,430
OK, yeah slides.

765
00:30:53,430 --> 00:30:54,680
You can download slides.

766
00:30:57,540 --> 00:30:58,670
Actually I can do this
on the board.

767
00:30:58,670 --> 00:30:59,980
This is the thinking
part anyway.

768
00:30:59,980 --> 00:31:01,910
Yeah, could you just
bring up GMail?

769
00:31:01,910 --> 00:31:03,820
I have a backup on GMail.

770
00:31:03,820 --> 00:31:05,880
Can we focus on the board?

771
00:31:05,880 --> 00:31:09,990
So this is going to be a
little exercise anyway.

772
00:31:09,990 --> 00:31:14,380
OK, here's what we had.

773
00:31:14,380 --> 00:31:20,230
We had M, M rows, N columns.

774
00:31:20,230 --> 00:31:21,780
We started with an interleaving
like this.

775
00:31:27,640 --> 00:31:31,060
Right, and we want to
go into a splitjoin.

776
00:31:31,060 --> 00:31:33,030
And this will be a round-robin
construct.

777
00:31:33,030 --> 00:31:35,090
And what I want you to do is
fill in the round-robin

778
00:31:35,090 --> 00:31:41,020
weight, and also the number of
the streams. And you can have

779
00:31:41,020 --> 00:31:42,270
a round-robin at the bottom.

780
00:31:44,690 --> 00:31:48,240
And when it comes out,
you want the opposite

781
00:31:48,240 --> 00:31:49,490
interleaving.

782
00:31:56,130 --> 00:32:00,990
This is M and N.

783
00:32:00,990 --> 00:32:06,060
OK so there are 3 unknowns here,
what you're doing the

784
00:32:06,060 --> 00:32:09,410
round-robin by-- can you guys
see that over there--

785
00:32:09,410 --> 00:32:12,290
how many parallel streams there
are, and what you're

786
00:32:12,290 --> 00:32:13,630
joining the round-robin by.

787
00:32:17,520 --> 00:32:19,170
Ok so I'm going to give you a
minute to think about this.

788
00:32:19,170 --> 00:32:19,960
Try to think about this.

789
00:32:19,960 --> 00:32:22,170
See if you can figure out what
these constants are.

790
00:32:22,170 --> 00:32:24,260
You just basically want to read
from this data in a row

791
00:32:24,260 --> 00:32:28,770
major pattern, M rows and N
columns, and end up with

792
00:32:28,770 --> 00:32:31,450
something that's column major.

793
00:32:31,450 --> 00:32:35,130
What are the values
for the constant?

794
00:32:35,130 --> 00:32:35,570
Does it make sense?

795
00:32:35,570 --> 00:32:36,920
Ask a question if it
doesn't make sense.

796
00:32:36,920 --> 00:32:38,870
Yeah?

797
00:32:38,870 --> 00:32:40,980
AUDIENCE: So we assume that
values are going to, based on

798
00:32:40,980 --> 00:32:45,070
the line that you drew, across
it, tests like a stream line?

799
00:32:45,070 --> 00:32:45,960
BILL THIES: Right,
right, right.

800
00:32:45,960 --> 00:32:47,860
So those values are coming
down the stream.

801
00:32:47,860 --> 00:32:49,870
You have a 1-dimensional
stream.

802
00:32:49,870 --> 00:32:50,870
It's interleaved like this.

803
00:32:50,870 --> 00:32:52,850
So you'll be reading
them like this.

804
00:32:52,850 --> 00:32:55,620
And then you output a
1-dimensional stream that is

805
00:32:55,620 --> 00:32:57,170
threading the columns.

806
00:32:57,170 --> 00:32:57,530
Does that make sense?

807
00:32:57,530 --> 00:33:00,950
Somebody ask another question.

808
00:33:00,950 --> 00:33:01,840
Yeah

809
00:33:01,840 --> 00:33:08,470
AUDIENCE: So the actual matrix
transpose codes, it's my

810
00:33:08,470 --> 00:33:10,690
understanding that nobody actually
does it sequentially

811
00:33:10,690 --> 00:33:13,300
like that because of
locality issues.

812
00:33:13,300 --> 00:33:15,690
Instead it's broken
up into blocks.

813
00:33:15,690 --> 00:33:18,900
BILL THIES: So there are
ways to optimize this.

814
00:33:18,900 --> 00:33:21,150
AUDIENCE: And after you've sort
of serialized it, can you

815
00:33:21,150 --> 00:33:25,000
then capture..

816
00:33:25,000 --> 00:33:26,510
PROFESSOR: Guys,
[UNINTELLIGIBLE PHRASE]

817
00:33:26,510 --> 00:33:28,260
has a blocking segment.

818
00:33:28,260 --> 00:33:30,540
So you can hierarchically
do that.

819
00:33:30,540 --> 00:33:34,203
So, normally what happens is you
do the blocks and inside

820
00:33:34,203 --> 00:33:35,580
the blocks, you can
do it again.

821
00:33:35,580 --> 00:33:36,830
You can do it at two
levels, basically.

822
00:33:45,950 --> 00:33:46,390
BILL THIES: Any hypotheses?

823
00:33:46,390 --> 00:33:46,690
Anyone?

824
00:33:46,690 --> 00:33:55,430
AUDIENCE: Is it N for first
one, M for the second one?

825
00:33:55,430 --> 00:33:57,170
BILL THIES: OK, what
do we have

826
00:33:57,170 --> 00:34:00,110
AUDIENCE: N for the first one,
and M for the second?

827
00:34:00,110 --> 00:34:02,490
BILL THIES: N,M and

828
00:34:02,490 --> 00:34:05,105
AUDIENCE: Same for the M?

829
00:34:05,105 --> 00:34:07,990
BILL THIES: Same, M?

830
00:34:07,990 --> 00:34:09,460
AUDIENCE: Yeah.

831
00:34:09,460 --> 00:34:17,560
BILL THIES: OK OK, this
is a hypothesis.

832
00:34:17,560 --> 00:34:19,940
Other hypotheses?

833
00:34:19,940 --> 00:34:21,920
AUDIENCE: N, M, 1.

834
00:34:26,360 --> 00:34:27,610
BILL THIES: OK, anyone else?

835
00:34:30,510 --> 00:34:33,000
Any amendments?

836
00:34:33,000 --> 00:34:34,270
AUDIENCE: How about 1, 1, 1?

837
00:34:34,270 --> 00:34:35,520
BILL THIES: 1, 1, 1?

838
00:34:38,750 --> 00:34:39,890
OK lottery is closing.

839
00:34:39,890 --> 00:34:40,630
Yep?

840
00:34:40,630 --> 00:34:41,945
AUDIENCE: 1, M, M

841
00:34:41,945 --> 00:34:44,030
BILL THIES: 1, M, M?

842
00:34:44,030 --> 00:34:53,260
1, M, M, OK and last call?

843
00:34:53,260 --> 00:34:57,890
OK I believe two of the ones
submitted are correct.

844
00:34:57,890 --> 00:35:01,190
Is this and, yeah?

845
00:35:01,190 --> 00:35:05,480
OK I think this, N, M, 1, works
and 1, N, M, works.

846
00:35:05,480 --> 00:35:07,830
So let me explain the 1, N,
M, this is how I like

847
00:35:07,830 --> 00:35:10,000
to think about it.

848
00:35:10,000 --> 00:35:11,510
One way to think about this
is we just want to

849
00:35:11,510 --> 00:35:13,370
move the whole matrix.

850
00:35:13,370 --> 00:35:15,090
You doing OK, yeah?

851
00:35:15,090 --> 00:35:17,220
OK we just want to
move the whole

852
00:35:17,220 --> 00:35:19,600
matrix into this splitjoin.

853
00:35:19,600 --> 00:35:23,290
So the way we can do that
is have N columns of the

854
00:35:23,290 --> 00:35:25,660
splitjoin since we have N
columns of the matrix.

855
00:35:25,660 --> 00:35:28,540
And what we'll do is we'll just
do a round-robin one at a

856
00:35:28,540 --> 00:35:31,400
time, from the columns of
the matrix into the

857
00:35:31,400 --> 00:35:33,260
columns of the splitjoin.

858
00:35:33,260 --> 00:35:35,530
So we'll take the first element,
send it to the left,

859
00:35:35,530 --> 00:35:38,070
next element, next column,
and so on.

860
00:35:38,070 --> 00:35:40,380
So we get the whole matrix here.
Now I want to read it

861
00:35:40,380 --> 00:35:41,540
out column-wise.

862
00:35:41,540 --> 00:35:44,650
So we'll do a joiner of M. We'll
read M from the left

863
00:35:44,650 --> 00:35:48,260
stream that'll read all M items
from the columns, send

864
00:35:48,260 --> 00:35:50,300
it out, and then M items
in the next column, and

865
00:35:50,300 --> 00:35:50,730
then send it out.

866
00:35:50,730 --> 00:35:52,490
Does that make sense?

867
00:35:55,960 --> 00:35:58,370
How many people understood
that?

868
00:35:58,370 --> 00:35:58,760
All right,

869
00:35:58,760 --> 00:36:02,920
So if you think about it, you
can also do it in N, M, 1.

870
00:36:02,920 --> 00:36:07,310
That's basically, yeah,
it's very similar.

871
00:36:14,890 --> 00:36:17,420
OK there we were.

872
00:36:17,420 --> 00:36:18,990
And yes.

873
00:36:18,990 --> 00:36:21,570
We can do 1, N, M. We basically
read the matrix

874
00:36:21,570 --> 00:36:25,420
down, and then pull it
down into column.

875
00:36:25,420 --> 00:36:29,260
And it's very easy to write
this as a transpose.

876
00:36:29,260 --> 00:36:31,660
So we just have a transpose
filter in

877
00:36:31,660 --> 00:36:32,920
which we're doing nothing.

878
00:36:32,920 --> 00:36:36,630
No computation in the actual
rows or the actual contents of

879
00:36:36,630 --> 00:36:37,790
the splitjoin.

880
00:36:37,790 --> 00:36:41,140
And we just split the data by
1, have N different identity

881
00:36:41,140 --> 00:36:44,390
filters, and then join it back
together by M. Any questions

882
00:36:44,390 --> 00:36:45,700
about this?

883
00:36:45,700 --> 00:36:49,080
An interesting way to
write a transpose.
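
The transpose described above can be simulated outside StreamIt. The following is a minimal Python sketch, not StreamIt code: the helper names `round_robin_split`, `round_robin_join`, `transpose_stream`, and `transpose_stream_alt` are hypothetical, and the matrix is assumed to arrive as a flat row-major list with M rows and N columns. It models the 1, N, M answer (split by 1, N identity branches, join by M) and the N, M, 1 alternative.

```python
def round_robin_split(items, weights):
    """Deal items to len(weights) branches, weights[i] items at a time, cyclically."""
    branches = [[] for _ in weights]
    pos = 0
    while pos < len(items):
        for branch, w in zip(branches, weights):
            branch.extend(items[pos:pos + w])
            pos += w
    return branches


def round_robin_join(branches, weights):
    """Collect weights[i] items at a time from branch i, cyclically."""
    cursors = [0] * len(branches)
    total = sum(len(b) for b in branches)
    out = []
    while len(out) < total:
        for i, w in enumerate(weights):
            out.extend(branches[i][cursors[i]:cursors[i] + w])
            cursors[i] += w
    return out


def transpose_stream(flat, m, n):
    """1, N, M: split by 1 into N branches (identity filters), join by M."""
    branches = round_robin_split(flat, [1] * n)
    return round_robin_join(branches, [m] * n)


def transpose_stream_alt(flat, m, n):
    """N, M, 1: split by N into M branches (identity filters), join by 1."""
    branches = round_robin_split(flat, [n] * m)
    return round_robin_join(branches, [1] * m)


# A 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] in row-major order.
print(transpose_stream([1, 2, 3, 4, 5, 6], 2, 3))      # [1, 4, 2, 5, 3, 6]
print(transpose_stream_alt([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
```

In both variants the branches do no work; the permutation comes entirely from the splitter and joiner weights.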

884
00:36:49,080 --> 00:36:52,260
OK so there's one more
opportunity to shine here.

885
00:36:52,260 --> 00:36:54,660
And that's a little more
interesting permutation called

886
00:36:54,660 --> 00:36:55,590
a bit-reversed ordering.

887
00:36:55,590 --> 00:36:59,120
And so this comes up in an FFT
and other algorithms.

888
00:36:59,120 --> 00:37:01,460
The permutation here, is that
we're taking the data

889
00:37:01,460 --> 00:37:03,030
at the index n.

890
00:37:03,030 --> 00:37:05,690
And let's say n has binary
digits b sub 0, b sub

891
00:37:05,690 --> 00:37:07,640
1, up to b sub k.

892
00:37:07,640 --> 00:37:11,390
And we want to rearrange that
data, so this data goes to a

893
00:37:11,390 --> 00:37:15,630
different index with the
reversed bits of its index.

894
00:37:15,630 --> 00:37:19,320
So if it was at index n before,
it ends up at b sub k

895
00:37:19,320 --> 00:37:21,250
down to b sub 1, b sub 0.

896
00:37:21,250 --> 00:37:22,630
So for example, let's
just look at

897
00:37:22,630 --> 00:37:24,330
3-digit binary numbers.

898
00:37:24,330 --> 00:37:28,850
If we have 0, 0, 0, this
is 1 input item.

899
00:37:28,850 --> 00:37:32,720
We're reversing those digits,
you still get 0, 0, 0.

900
00:37:32,720 --> 00:37:35,590
Item at index 1 will
be 0, 0, 1.

901
00:37:35,590 --> 00:37:38,330
We want to reorder
that to index 4.

902
00:37:38,330 --> 00:37:44,270
OK, 1, 0, 0. 0, 1, 0 stays the
same. 0, 1, 1 goes to 1,

903
00:37:44,270 --> 00:37:46,000
1, 0, shifts over.

904
00:37:46,000 --> 00:37:49,040
And from there on,
it's symmetric.
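
As a concrete check of the index mapping just described, here is a short Python sketch. The function name `bit_reverse` and the fixed width of 3 bits are assumptions chosen to match the 3-digit example; this is an illustration, not code from the lecture slides.

```python
def bit_reverse(index, bits):
    """Return index with its lowest `bits` binary digits reversed."""
    result = 0
    for _ in range(bits):
        # Shift the lowest remaining bit of index onto the end of result.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Destination index for each source index 0..7, using 3 binary digits.
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Because reversing the bits twice gives back the original index, the same table also tells you which source index lands at each destination.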

905
00:37:49,040 --> 00:37:49,410
AUDIENCE:
[UNINTELLIGIBLE PHRASE]

906
00:37:49,410 --> 00:37:51,540
BILL THIES: Sorry?

907
00:37:51,540 --> 00:37:51,710
AUDIENCE:
[UNINTELLIGIBLE PHRASE].

908
00:37:51,710 --> 00:37:52,190
BILL THIES: OK.

909
00:37:52,190 --> 00:37:54,480
So here I'm writing
the indices.

910
00:37:54,480 --> 00:37:56,790
So I'm not writing the data.

911
00:37:56,790 --> 00:38:00,940
So index 0, 1, 2, 3 up through
index 8, or rather index 7.

912
00:38:00,940 --> 00:38:04,340
OK and on the bottom you have
indices 0 through 7.

913
00:38:04,340 --> 00:38:07,780
So the data will actually be
moved, reordered like that.

914
00:38:07,780 --> 00:38:08,360
Does that make sense?

915
00:38:08,360 --> 00:38:11,320
It's a reordering of data.

916
00:38:11,320 --> 00:38:12,660
Does this transformation
make sense?

917
00:38:12,660 --> 00:38:13,980
Other questions?

918
00:38:13,980 --> 00:38:17,940
OK, it turns out you can write
this as a splitjoin.

919
00:38:17,940 --> 00:38:21,380
And you just need 3 different
weights for the round-robins.

920
00:38:21,380 --> 00:38:26,010
OK round-robin with 1 weight
on the top here, and two

921
00:38:26,010 --> 00:38:28,480
different round-robin weights
on the bottom.

922
00:38:28,480 --> 00:38:32,460
And here I'm assuming you have 3
binary digits in your index.

923
00:38:32,460 --> 00:38:35,290
So you're reordering
in groups of 8.

924
00:38:35,290 --> 00:38:37,510
OK, so let me give you a second
to think about this.

925
00:38:37,510 --> 00:38:40,670
What are the values
for these weights?

926
00:38:40,670 --> 00:38:41,990
I'll give you a hint
in just a second,

927
00:38:41,990 --> 00:38:44,890
or ask another question.

928
00:38:44,890 --> 00:38:46,096
Yes?

929
00:38:46,096 --> 00:38:47,346
AUDIENCE:
[UNINTELLIGIBLE PHRASE].

930
00:38:51,170 --> 00:38:52,860
BILL THIES: So what we're doing
here, is we're exposing

931
00:38:52,860 --> 00:38:54,240
the communication pattern.

932
00:38:54,240 --> 00:38:54,970
That's the thing.

933
00:38:54,970 --> 00:38:57,800
If you write this in an
imperative way, you end up

934
00:38:57,800 --> 00:39:01,440
basically having your, you're
conflating basically the data

935
00:39:01,440 --> 00:39:03,930
dependencies with the
reordering pattern.

936
00:39:03,930 --> 00:39:06,150
So what I'm trying to convey
here, is how you can use the

937
00:39:06,150 --> 00:39:09,790
streaming model to show how
you're sending data around.

938
00:39:09,790 --> 00:39:11,410
Because when you're on an
architecture like Cell,

939
00:39:11,410 --> 00:39:13,610
everything is about
the data motion.

940
00:39:13,610 --> 00:39:15,460
You're taking data from one
place and you're trying to

941
00:39:15,460 --> 00:39:19,080
efficiently get it to the
producers or the consumers.

942
00:39:19,080 --> 00:39:21,400
And you really need to-- the
compiler needs to understand

943
00:39:21,400 --> 00:39:22,530
the data motion.

944
00:39:22,530 --> 00:39:24,700
And also, it's just another
way of writing it, which I

945
00:39:24,700 --> 00:39:27,100
think it's actually easier to
understand once you see it.

946
00:39:27,100 --> 00:39:29,990
It's a way to think about the
actual reordering from a

947
00:39:29,990 --> 00:39:33,060
theoretical standpoint.

948
00:39:33,060 --> 00:39:34,310
Any wagers on this?

949
00:39:39,550 --> 00:39:41,790
So what we want to do, if you think
about the bit-reverse

950
00:39:41,790 --> 00:39:45,630
ordering, what we want to do is
distribute the data by the

951
00:39:45,630 --> 00:39:48,300
low-order bits.

952
00:39:48,300 --> 00:39:52,080
And then gather the data
by the high-order bits.

953
00:39:52,080 --> 00:39:54,450
So you want a fine-grained
parity when you're shuffling.

954
00:39:54,450 --> 00:39:55,870
You can also do it the
other way around.

955
00:39:55,870 --> 00:39:57,340
It's totally symmetrical.

956
00:39:57,340 --> 00:39:59,230
But one way to think about it
is, you want a fine-grained

957
00:39:59,230 --> 00:40:02,090
parity when you're distributing
data, and then a

958
00:40:02,090 --> 00:40:03,540
coarse-grained when you're
coming together.

959
00:40:06,800 --> 00:40:08,050
Anyone see it?

960
00:40:14,120 --> 00:40:15,740
Give you ten more seconds.

961
00:40:21,530 --> 00:40:22,680
Come on?

962
00:40:22,680 --> 00:40:25,860
Yeah it is a little
bit tricky.

963
00:40:25,860 --> 00:40:29,030
OK well let me explain
how it works.

964
00:40:29,030 --> 00:40:31,370
So 1, 2, and 4.

965
00:40:31,370 --> 00:40:35,900
OK so what these round-robin 1
splitters do, is these are

966
00:40:35,900 --> 00:40:38,510
basically the fine-grained
parities.

967
00:40:38,510 --> 00:40:41,740
So OK, the first round-robin,
that will send all the even

968
00:40:41,740 --> 00:40:45,450
values to the left, and all the
odd values to the right.

969
00:40:45,450 --> 00:40:47,060
Right, that's the lowest
order bit.

970
00:40:47,060 --> 00:40:48,370
Because it's doing
every other one,

971
00:40:48,370 --> 00:40:50,540
shuffling it left and right.

972
00:40:50,540 --> 00:40:53,630
So this round-robin is seeing
only the even values.

973
00:40:53,630 --> 00:40:55,170
Now it's going to split
them up based on who's

974
00:40:55,170 --> 00:40:56,370
divisible by 4.

975
00:40:56,370 --> 00:40:58,410
Now we'll go to the left
or go to the right.

976
00:40:58,410 --> 00:41:01,610
This is basically shuffling in
the order of the bits, from

977
00:41:01,610 --> 00:41:03,420
low-order bits to
high-order bits.

978
00:41:03,420 --> 00:41:07,250
So these will be ordered in
terms of their low-order bits.

979
00:41:07,250 --> 00:41:10,050
And now we just want to read
them out from left to right.

980
00:41:10,050 --> 00:41:13,010
Just take the order that we made
with those round-robins,

981
00:41:13,010 --> 00:41:15,360
and read them out from
left to right.

982
00:41:15,360 --> 00:41:17,480
And since they have 8 values,
you can do that just by

983
00:41:17,480 --> 00:41:18,410
chunking them up.

984
00:41:18,410 --> 00:41:22,440
We'll read 2, 2, and then we'll
read these two, 2 and 2,

985
00:41:22,440 --> 00:41:23,320
and now put all 8.

986
00:41:23,320 --> 00:41:25,010
Does that make sense?

987
00:41:28,790 --> 00:41:30,100
OK, so yes, it's a bit clever.

988
00:41:30,100 --> 00:41:31,910
So I think it's a nice way
of thinking about what a

989
00:41:31,910 --> 00:41:33,540
bit-reversed ordering means.

990
00:41:33,540 --> 00:41:35,870
And you can write this in
a very general way.

991
00:41:35,870 --> 00:41:37,600
You just have a recursive
bit-reversed

992
00:41:37,600 --> 00:41:39,930
filter for N values.

993
00:41:39,930 --> 00:41:41,950
And base case, you
only have 2.

994
00:41:41,950 --> 00:41:44,200
So there's no reordering
to do when

995
00:41:44,200 --> 00:41:45,530
you think about it.

996
00:41:45,530 --> 00:41:47,550
So you're not doing
any computation.

997
00:41:47,550 --> 00:41:50,790
Otherwise you have a round-robin
split in half, and

998
00:41:50,790 --> 00:41:51,990
then have a coarse-grain
joiner.

999
00:41:51,990 --> 00:41:56,060
So, you get a structure
like this.

1000
00:41:56,060 --> 00:41:58,700
If you're, as you're building
up, just distributing and then

1001
00:41:58,700 --> 00:42:02,330
bringing back together in
a coarse-grained way.
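
The recursive structure described here (round-robin split by 1 into two halves, recurse on each half, then join in chunks of half the size) can be modeled in a few lines of Python. This is a sketch under assumptions: `bit_reverse_stream` is a hypothetical name, the input length is assumed to be a power of two, and list slicing stands in for the splitter and joiner.

```python
def bit_reverse_stream(items):
    """Reorder items into bit-reversed order by recursive split and join."""
    n = len(items)
    if n <= 2:
        # Base case: nothing to reorder, acts like an identity filter.
        return list(items)
    left = items[0::2]    # roundrobin(1) splitter: even positions go left
    right = items[1::2]   # odd positions go right
    # Coarse-grained joiner: all of the left half, then all of the right half.
    return bit_reverse_stream(left) + bit_reverse_stream(right)


print(bit_reverse_stream(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Each level of recursion peels off one low-order bit when distributing, so reading the leaves left to right orders the data by its reversed index.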

1002
00:42:02,330 --> 00:42:03,580
OK.

1003
00:42:05,290 --> 00:42:07,720
Let's see how do I
don't do this?

1004
00:42:07,720 --> 00:42:09,990
OK so one thing to notice,
there's one more

1005
00:42:09,990 --> 00:42:10,730
example of a splitjoin.

1006
00:42:10,730 --> 00:42:11,170
Question?

1007
00:42:11,170 --> 00:42:13,140
AUDIENCE: [UNINTELLIGIBLE].

1008
00:42:13,140 --> 00:42:16,990
BILL THIES: OK so in general, at
the base of this hierarchy,

1009
00:42:16,990 --> 00:42:20,260
we could've added some other
filter to do some computation.

1010
00:42:20,260 --> 00:42:22,980
Identity just means we're
doing no computation.

1011
00:42:22,980 --> 00:42:25,230
It's a predefined filter
that just does nothing.

1012
00:42:25,230 --> 00:42:26,460
PROFESSOR: On complex data.

1013
00:42:26,460 --> 00:42:27,610
BILL THIES: On complex data.

1014
00:42:27,610 --> 00:42:29,780
Sorry this is a templated
filter.

1015
00:42:29,780 --> 00:42:32,100
So we're reordering
complex values.

1016
00:42:32,100 --> 00:42:35,490
And we're passing the
input to the output.

1017
00:42:35,490 --> 00:42:35,740
Amir?

1018
00:42:35,740 --> 00:42:40,050
AUDIENCE: [UNINTELLIGIBLE]

1019
00:42:40,050 --> 00:42:41,760
BILL THIES: In general the
language does not have support

1020
00:42:41,760 --> 00:42:42,480
for templates.

1021
00:42:42,480 --> 00:42:44,570
We only do it for these
base classes.

1022
00:42:44,570 --> 00:42:46,570
That's more of an implementation
detail.

1023
00:42:46,570 --> 00:42:46,860
Yeah.

1024
00:42:46,860 --> 00:42:48,010
AUDIENCE: [UNINTELLIGIBLE].

1025
00:42:48,010 --> 00:42:49,420
BILL THIES: Right now
there isn't, but

1026
00:42:49,420 --> 00:42:50,840
nothing fundamental there.

1027
00:42:50,840 --> 00:42:51,060
Yeah?

1028
00:42:51,060 --> 00:42:53,410
Other questions?

1029
00:42:53,410 --> 00:42:54,000
Yeah?

1030
00:42:54,000 --> 00:42:56,864
AUDIENCE: How did you know that
there are two filters

1031
00:42:56,864 --> 00:42:59,510
after that?

1032
00:42:59,510 --> 00:43:01,830
BILL THIES: Two filters,
sorry here?

1033
00:43:01,830 --> 00:43:06,030
AUDIENCE: [UNINTELLIGIBLE].

1034
00:43:06,030 --> 00:43:06,570
BILL THIES: OK.

1035
00:43:06,570 --> 00:43:07,160
OK.

1036
00:43:07,160 --> 00:43:10,180
So we have two add statements
between the

1037
00:43:10,180 --> 00:43:11,740
split and the join.

1038
00:43:11,740 --> 00:43:13,650
So that branches to two
parallel streams.

1039
00:43:13,650 --> 00:43:16,240
Is that your question?

1040
00:43:20,610 --> 00:43:20,673
AUDIENCE: Yeah.

1041
00:43:20,673 --> 00:43:20,990
How do you determine that
there's not like three?

1042
00:43:20,990 --> 00:43:24,590
BILL THIES: So the compiler
will analyze

1043
00:43:24,590 --> 00:43:25,610
this at compile time.

1044
00:43:25,610 --> 00:43:28,360
And it'll know these values of
N and propagate them down at

1045
00:43:28,360 --> 00:43:29,950
compile time.

1046
00:43:29,950 --> 00:43:32,660
So it'll basically symbolically
evaluate this

1047
00:43:32,660 --> 00:43:35,010
code, see there are two
branches, and you can unroll

1048
00:43:35,010 --> 00:43:36,180
this communication pattern.

1049
00:43:36,180 --> 00:43:41,800
AUDIENCE: I think another way of
thinking about it is, each

1050
00:43:41,800 --> 00:43:43,262
add statement essentially
adds another

1051
00:43:43,262 --> 00:43:44,730
branch in your splitjoin.

1052
00:43:44,730 --> 00:43:49,240
PROFESSOR: That is
another box.

1053
00:43:49,240 --> 00:43:51,880
BILL THIES: It's about to
get clear actually.

1054
00:43:51,880 --> 00:43:53,130
Other questions?

1055
00:43:56,090 --> 00:43:56,580
AUDIENCE: [UNINTELLIGIBLE]

1056
00:43:56,580 --> 00:43:58,690
BILL THIES: That's one way
to think about it, yeah.

1057
00:43:58,690 --> 00:44:00,590
Wait say again, counting?

1058
00:44:00,590 --> 00:44:03,060
AUDIENCE: A radix sort.

1059
00:44:03,060 --> 00:44:05,630
BILL THIES: A radix
sort, right.

1060
00:44:05,630 --> 00:44:07,890
OK, well is it sorting?

1061
00:44:07,890 --> 00:44:09,460
It's not really sorting.

1062
00:44:09,460 --> 00:44:09,620
AUDIENCE: Well it's not
really sorting.

1063
00:44:09,620 --> 00:44:10,730
BILL THIES: It's
a permutation.

1064
00:44:10,730 --> 00:44:13,730
AUDIENCE: Could you
do a radix sort?

1065
00:44:13,730 --> 00:44:14,940
BILL THIES: You could
do a radix sort.

1066
00:44:14,940 --> 00:44:18,050
So actually, what I want to show
next is how you can morph

1067
00:44:18,050 --> 00:44:21,890
this program into a merge sort
by changing only a few lines.

1068
00:44:21,890 --> 00:44:22,640
OK look carefully.

1069
00:44:22,640 --> 00:44:24,710
Don't blink.

1070
00:44:24,710 --> 00:44:26,500
OK there's merge sort.

1071
00:44:26,500 --> 00:44:27,710
So very similar pattern.

1072
00:44:27,710 --> 00:44:29,440
This is one of those idioms.
It's a recursive idiom with

1073
00:44:29,440 --> 00:44:30,860
splitjoins.

1074
00:44:30,860 --> 00:44:33,130
But now in the base case,
we have a sort.

1075
00:44:33,130 --> 00:44:35,840
So we would basically
branch down.

1076
00:44:35,840 --> 00:44:37,760
What we ended up with was
a sort in the base case.

1077
00:44:37,760 --> 00:44:39,800
We're just sorting
a few values.

1078
00:44:39,800 --> 00:44:42,080
And we call [? merge sort ?]
twice again, and

1079
00:44:42,080 --> 00:44:43,410
then we do a merge.

1080
00:44:43,410 --> 00:44:46,150
So instead of identity at the
base case here, we now have a

1081
00:44:46,150 --> 00:44:47,680
basic sorting routine.

1082
00:44:47,680 --> 00:44:50,260
And we merge those results
from both sides.

1083
00:44:50,260 --> 00:44:52,850
And the only thing I changed in
terms of the communication

1084
00:44:52,850 --> 00:44:56,100
rate, is to be more efficient
we just distributed data in

1085
00:44:56,100 --> 00:44:59,120
chunks instead of doing a
fine-grained splitting.

1086
00:44:59,120 --> 00:45:01,550
Actually you do it however
you want in a merge sort.

1087
00:45:01,550 --> 00:45:02,360
But this is chunked up.

1088
00:45:02,360 --> 00:45:03,960
And let's just zoom in here.

1089
00:45:03,960 --> 00:45:06,420
This is how a merge sort
looks in StreamIt.

1090
00:45:06,420 --> 00:45:10,480
So we split the data two ways,
both directions, come

1091
00:45:10,480 --> 00:45:14,210
together, do a sort on both
sides, and then you merge.
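
The same recursive shape can be sketched in Python to show how little changes between the bit-reversal and the merge sort. This is an illustrative model, not the StreamIt program on the slide: `merge_sort_stream` and the `base` parameter are hypothetical names, `sorted` stands in for the base-case sort filter, and `heapq.merge` stands in for the merge filter that follows the joiner.

```python
import heapq


def merge_sort_stream(items, base=2):
    """Sort items with a chunked split, recursive sort, and a merge stage."""
    n = len(items)
    if n <= base:
        # Base case: a simple sort on a few values instead of an identity.
        return sorted(items)
    half = n // 2
    # Chunked splitter: first half goes left, second half goes right.
    left = merge_sort_stream(items[:half], base)
    right = merge_sort_stream(items[half:], base)
    # Merge stage combines the two sorted branches into one sorted stream.
    return list(heapq.merge(left, right))


print(merge_sort_stream([5, 2, 7, 1, 8, 3, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Compared with the bit-reversal sketch, only the base case, the splitter granularity, and the stage after the join differ.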

1092
00:45:14,210 --> 00:45:16,840
And so by having the, you know
you can interleave pipelines

1093
00:45:16,840 --> 00:45:18,030
and splitjoins like this.

1094
00:45:18,030 --> 00:45:19,720
So you have these hierarchical
structures that are coming

1095
00:45:19,720 --> 00:45:21,350
back together.

1096
00:45:21,350 --> 00:45:24,420
Does this make sense?

1097
00:45:24,420 --> 00:45:25,890
OK.

1098
00:45:25,890 --> 00:45:29,830
I'm going to hold off on
messaging actually.

1099
00:45:29,830 --> 00:45:32,830
Let me see, what do
I want to cover?

1100
00:45:32,830 --> 00:45:34,330
OK let me actually
skip to the end.

1101
00:45:37,330 --> 00:45:38,580
Oh, I can show you this.

1102
00:45:40,870 --> 00:45:42,380
Yeah, I'm going to cut
short a little bit.

1103
00:45:42,380 --> 00:45:44,860
So here's how other programs
look written in StreamIt.

1104
00:45:44,860 --> 00:45:47,110
OK, you can have
a Bitonic sort.

1105
00:45:47,110 --> 00:45:48,840
OK so you see a lot of these
regular structures.

1106
00:45:48,840 --> 00:45:50,340
And the compiler can unroll
this and then

1107
00:45:50,340 --> 00:45:52,320
match it to the substrate.

1108
00:45:52,320 --> 00:45:53,380
This is how an FFT looks.

1109
00:45:53,380 --> 00:45:56,740
It's quite an elegant
implementation of an FFT.

1110
00:45:56,740 --> 00:45:59,430
It'd be good to go into
in more detail.

1111
00:45:59,430 --> 00:46:01,560
You can do things like block
matrix multiply.

1112
00:46:01,560 --> 00:46:04,420
You don't always have to have
column or row-wise ordering.

1113
00:46:04,420 --> 00:46:07,420
It's natural to split
things up like this.

1114
00:46:07,420 --> 00:46:09,400
We have a lot of DSP algorithms,
the filter bank,

1115
00:46:09,400 --> 00:46:12,790
FM radio with equalizer,
radar array front end.

1116
00:46:12,790 --> 00:46:14,040
Here's an MP3 decoder.

1117
00:46:16,190 --> 00:46:18,160
And let's see, I'm going to
skip this section and just

1118
00:46:18,160 --> 00:46:19,410
give you a taste for
the end here.

1119
00:46:23,880 --> 00:46:27,190
I'm skipping a hundred slides.

1120
00:46:27,190 --> 00:46:29,840
Yeah.

1121
00:46:29,840 --> 00:46:31,100
OK so let me give you a feel.

1122
00:46:31,100 --> 00:46:32,990
Our biggest program written
in StreamIt so far, is the

1123
00:46:32,990 --> 00:46:35,680
complete MPEG-2 encoder
and decoder.

1124
00:46:35,680 --> 00:46:36,940
So here is the MPEG-2 decoder.

1125
00:46:36,940 --> 00:46:38,720
And I think you've seen
block diagrams of this

1126
00:46:38,720 --> 00:46:40,420
already in the class.

1127
00:46:40,420 --> 00:46:41,800
And so it's a pretty
natural expression.

1128
00:46:41,800 --> 00:46:43,590
You can really get a feel for
the high-level structure of

1129
00:46:43,590 --> 00:46:45,250
the algorithm mapping down.

1130
00:46:45,250 --> 00:46:47,430
And for example, here on the
top we're doing the spatial

1131
00:46:47,430 --> 00:46:49,980
decoding looking inside
each frame.

1132
00:46:49,980 --> 00:46:52,170
Whereas at the bottom we're
doing the temporal decoding

1133
00:46:52,170 --> 00:46:55,580
between two frames, the
motion compensation.

1134
00:46:55,580 --> 00:46:58,970
And one thing that I didn't have
a chance to mention, is

1135
00:46:58,970 --> 00:47:01,540
that we have a concept of
teleport messaging.

1136
00:47:01,540 --> 00:47:05,070
What this means is, I showed you
how the steady-state flow of

1137
00:47:05,070 --> 00:47:08,270
data goes between these
actors in the stream.

1138
00:47:08,270 --> 00:47:10,820
But sometimes you want to
control the stream as well.

1139
00:47:10,820 --> 00:47:13,280
For example, this is
a variable length

1140
00:47:13,280 --> 00:47:14,050
decoder at the top.

1141
00:47:14,050 --> 00:47:16,010
It's parsing the input data.

1142
00:47:16,010 --> 00:47:18,390
It might want to change how the
processing is happening

1143
00:47:18,390 --> 00:47:19,260
downstream.

1144
00:47:19,260 --> 00:47:22,180
For example, say that you have--
you know in this case

1145
00:47:22,180 --> 00:47:24,430
you have different picture
types coming in.

1146
00:47:24,430 --> 00:47:26,320
And you want to tell other
components to change their

1147
00:47:26,320 --> 00:47:29,370
processing based on a
non-local effect.

1148
00:47:29,370 --> 00:47:32,340
And that's hard to do if you
just want static data rates.

1149
00:47:32,340 --> 00:47:35,180
But what we have is this
notion of limited

1150
00:47:35,180 --> 00:47:37,660
dynamism, where you're basically
poking into somebody

1151
00:47:37,660 --> 00:47:38,730
else's stream.

1152
00:47:38,730 --> 00:47:40,600
And we let you do that
very precisely.

1153
00:47:40,600 --> 00:47:42,440
I don't have time to go into
the details, but you can

1154
00:47:42,440 --> 00:47:45,500
basically synchronize the
arrival of these messages with

1155
00:47:45,500 --> 00:47:48,220
the data that's also flowing
through the stream. And so in

1156
00:47:48,220 --> 00:47:50,340
this case, we're sending through
the picture type.

1157
00:47:50,340 --> 00:47:52,320
And it really simplifies
the program code.

1158
00:47:52,320 --> 00:47:54,430
I didn't have time for details,
but why don't we put

1159
00:47:54,430 --> 00:47:57,530
in the slides anyway, if
you're interested.

1160
00:47:57,530 --> 00:48:00,350
And if you do a similar
communication pattern in C,

1161
00:48:00,350 --> 00:48:01,760
it's a little bit
of a nightmare.

1162
00:48:01,760 --> 00:48:05,020
You have all these different,
basically memory spaces,

1163
00:48:05,020 --> 00:48:06,320
different files.

1164
00:48:06,320 --> 00:48:08,570
And the control information
is basically going left

1165
00:48:08,570 --> 00:48:09,980
and right all over.

1166
00:48:09,980 --> 00:48:12,650
So this really helps both the
compiler and the programmer as

1167
00:48:12,650 --> 00:48:15,080
well in StreamIt.

1168
00:48:15,080 --> 00:48:16,220
So it's all implemented.

1169
00:48:16,220 --> 00:48:18,720
It's about 2,000 lines
of code in StreamIt.

1170
00:48:18,720 --> 00:48:22,100
Which is about 2/3 of the size
of the C code, taking into

1171
00:48:22,100 --> 00:48:24,020
account similar functionality
there.

1172
00:48:24,020 --> 00:48:25,120
And it's a pretty big program.

1173
00:48:25,120 --> 00:48:27,780
You can write 48 static streams.
And then we expand

1174
00:48:27,780 --> 00:48:30,400
that to more than 600
instantiated filters.

1175
00:48:30,400 --> 00:48:31,970
So this gives you a lot of
flexibility when you're trying

1176
00:48:31,970 --> 00:48:32,750
to get parallelism.

1177
00:48:32,750 --> 00:48:33,260
Question?

1178
00:48:33,260 --> 00:48:35,920
AUDIENCE: When a compiler
downloads all bytes?

1179
00:48:35,920 --> 00:48:39,780
BILL THIES: Oh the object
code, you mean.

1180
00:48:39,780 --> 00:48:41,920
OK, so right now our current
implementation, we duplicate a

1181
00:48:41,920 --> 00:48:42,990
lot of code.

1182
00:48:42,990 --> 00:48:45,600
So it ends up being bigger
than it needs to be.

1183
00:48:45,600 --> 00:48:46,820
There's no reason for
us to do that.

1184
00:48:46,820 --> 00:48:48,630
That's kind of a-- we have
a research compiler

1185
00:48:48,630 --> 00:48:49,640
that makes that easy.

1186
00:48:49,640 --> 00:48:52,240
AUDIENCE: So object-wise,
it's not data.

1187
00:48:52,240 --> 00:48:54,350
BILL THIES: Object-wise we
still need to do that

1188
00:48:54,350 --> 00:48:55,120
comparison.

1189
00:48:55,120 --> 00:48:56,830
Yeah that's a good question.

1190
00:48:56,830 --> 00:48:58,660
Yeah.

1191
00:48:58,660 --> 00:49:00,940
Other questions?

1192
00:49:00,940 --> 00:49:03,070
OK so let me cut to the end.

1193
00:49:03,070 --> 00:49:05,360
OK, so we have the StreamIt
language.

1194
00:49:05,360 --> 00:49:05,940
And we think it really

1195
00:49:05,940 --> 00:49:07,570
preserves the program structure.

1196
00:49:07,570 --> 00:49:09,620
It's a new way of thinking about
how you orchestrate the

1197
00:49:09,620 --> 00:49:12,100
data reordering with the
splitjoins, showing you who is

1198
00:49:12,100 --> 00:49:14,470
communicating to who, and how
you can stitch together

1199
00:49:14,470 --> 00:49:16,980
different pieces in your
program development.

1200
00:49:16,980 --> 00:49:20,400
And again, really our goal is to
get this scalable multicore

1201
00:49:20,400 --> 00:49:21,170
performance.

1202
00:49:21,170 --> 00:49:22,840
But you can't get people
on board just on

1203
00:49:22,840 --> 00:49:23,840
a performance stat.

1204
00:49:23,840 --> 00:49:25,540
You need to show them a new
programming model that

1205
00:49:25,540 --> 00:49:27,250
actually makes their
lives easier.

1206
00:49:27,250 --> 00:49:28,870
So that's what we're
working on.

1207
00:49:28,870 --> 00:49:30,060
And thanks for listening.

1208
00:49:30,060 --> 00:49:35,520
[APPLAUSE]

1209
00:49:35,520 --> 00:49:37,450
BILL THIES: Any last
questions?

1210
00:49:37,450 --> 00:49:37,940
Yes?

1211
00:49:37,940 --> 00:49:42,770
AUDIENCE: So in the
MPEG decoder, you have a lot

1212
00:49:42,770 --> 00:49:47,110
of computations size that were
not sequential streams. Like

1213
00:49:47,110 --> 00:49:52,300
for example, the output of the
discrete cosine transform is

1214
00:49:52,300 --> 00:49:57,750
not a stream of pixels; you are
going to have coefficients and

1215
00:49:57,750 --> 00:49:58,080
things like that.

1216
00:49:58,080 --> 00:49:59,500
Which are a logical
sort of chunk.

1217
00:49:59,500 --> 00:50:04,780
BILL THIES: Yes so depending
on the granularity of the

1218
00:50:04,780 --> 00:50:07,730
competition, you don't need to
pass individual values over

1219
00:50:07,730 --> 00:50:08,610
the stream.

1220
00:50:08,610 --> 00:50:10,540
For example, you can have a
stream that inputs the whole

1221
00:50:10,540 --> 00:50:12,980
array at a time.

1222
00:50:12,980 --> 00:50:16,120
And so we basically advocate
that if you have something

1223
00:50:16,120 --> 00:50:18,520
that's coarse-grained, you should
be passing an array or

1224
00:50:18,520 --> 00:50:21,660
a macroblock, in the case of
MPEG, or a set of coefficients

1225
00:50:21,660 --> 00:50:22,650
in a structure.

1226
00:50:22,650 --> 00:50:25,060
So when you have coarse-grain
parallelism, you write your

1227
00:50:25,060 --> 00:50:26,630
program in a coarse-grained
way.

1228
00:50:26,630 --> 00:50:28,600
The fine-grained things
I showed for the bit

1229
00:50:28,600 --> 00:50:30,560
interleaving and so on, is
more for the fine-grained

1230
00:50:30,560 --> 00:50:33,260
programs.
AUDIENCE: Can you do both?

1231
00:50:33,260 --> 00:50:38,305
In the sense that can you stream
over an array, so it's

1232
00:50:38,305 --> 00:50:39,780
a stream of streams, so to speak.

1233
00:50:39,780 --> 00:50:41,270
BILL THIES: So there's an
interesting multidimensional

1234
00:50:41,270 --> 00:50:42,760
problem there.

1235
00:50:42,760 --> 00:50:45,700
Right now we've taken a
1-dimensional approach.

1236
00:50:45,700 --> 00:50:48,360
So far it's basically the
programmer has to set an

1237
00:50:48,360 --> 00:50:51,500
iteration order, and end up with
a 1-dimensional stream

1238
00:50:51,500 --> 00:50:53,200
coming into and out
of every filter.

1239
00:50:53,200 --> 00:50:54,780
We're working on
extending that.

1240
00:50:54,780 --> 00:50:57,020
Yeah, but when you have
basically streams of

1241
00:50:57,020 --> 00:51:00,210
2-dimensional data, you'd like the
freedom to either iterate

1242
00:51:00,210 --> 00:51:02,340
basically in time or
in space, depending

1243
00:51:02,340 --> 00:51:03,350
on what you're doing.

1244
00:51:03,350 --> 00:51:05,180
And so I think that's more
of a research problem.

1245
00:51:05,180 --> 00:51:08,240
So far we've just been doing a
1-dimensional representation.

1246
00:51:08,240 --> 00:51:11,590
Yeah good point.

1247
00:51:11,590 --> 00:51:13,650
Other questions?

1248
00:51:13,650 --> 00:51:14,370
Yeah?

1249
00:51:14,370 --> 00:51:18,290
AUDIENCE: Why did you decide on
this synchronous dataflow

1250
00:51:18,290 --> 00:51:19,620
model as opposed to something
more general?

1251
00:51:19,620 --> 00:51:22,180
BILL THIES: So our philosophy
has been that you want to

1252
00:51:22,180 --> 00:51:27,180
start with the most kind of
basic block of a stream

1253
00:51:27,180 --> 00:51:29,850
program, and optimize
it really well.

1254
00:51:29,850 --> 00:51:31,340
And then you can stitch
those together into

1255
00:51:31,340 --> 00:51:32,510
higher level blocks.

1256
00:51:32,510 --> 00:51:34,770
So we think of synchronous
dataflow as being kind of the

1257
00:51:34,770 --> 00:51:36,750
basic block of streaming.

1258
00:51:36,750 --> 00:51:38,790
You know what's coming in, you
know what's coming out.

1259
00:51:38,790 --> 00:51:41,390
And even if you have a more
general model, there'll be

1260
00:51:41,390 --> 00:51:44,750
pieces that fit under the
synchronous dataflow model.

1261
00:51:44,750 --> 00:51:46,490
And so we saw a lot
of optimization

1262
00:51:46,490 --> 00:51:47,720
opportunities in there.

1263
00:51:47,720 --> 00:51:50,100
And really knowing those IO
rates can let you do a lot of

1264
00:51:50,100 --> 00:51:52,180
things that you can't do
in a general model.

1265
00:51:52,180 --> 00:51:55,010
So I wanted to get the simple
case right first. And actually

1266
00:51:55,010 --> 00:51:58,010
kind of our focus now is on
expanding, and how do you look

1267
00:51:58,010 --> 00:51:59,960
at the heterogeneous system, and
how do you optimize a more

1268
00:51:59,960 --> 00:52:01,980
dynamic system.

1269
00:52:01,980 --> 00:52:03,230
Yep.

1270
00:52:05,270 --> 00:52:06,520
Other questions?

1271
00:52:10,520 --> 00:52:12,420
OK.

1272
00:52:12,420 --> 00:52:13,180
Yes.

1273
00:52:13,180 --> 00:52:15,140
You can check out
our web page.

1274
00:52:15,140 --> 00:52:16,550
Yeah if you Google for
StreamIt, I'm sure

1275
00:52:16,550 --> 00:52:17,770
you can find it.

1276
00:52:17,770 --> 00:52:19,930
Yeah, we have a public
release.

1277
00:52:19,930 --> 00:52:22,380
Yes, send us any problems.
It's actually a good

1278
00:52:22,380 --> 00:52:25,050
test. We want to make sure
it works for everyone.

1279
00:52:25,050 --> 00:52:26,830
But I mean, we've had, you know,
hundreds of downloads.

1280
00:52:26,830 --> 00:52:28,930
There are a lot of people
using StreamIt.

1281
00:52:28,930 --> 00:52:31,440
It shouldn't break if
you download it.

1282
00:52:31,440 --> 00:52:32,690
Yeah.

1283
00:52:34,360 --> 00:52:35,500
OK good.

1284
00:52:35,500 --> 00:52:36,750
Thanks.