1
00:00:01,099 --> 00:00:06,540
We've been designing our processing pipelines
to have all the stages operate in lock step,

2
00:00:06,540 --> 00:00:12,150
choosing the clock period to accommodate the
worst-case processing time over all the stages.

3
00:00:12,150 --> 00:00:16,830
This is what we'd call a synchronous, globally
timed system.

4
00:00:16,830 --> 00:00:21,180
But what if there are data dependencies in
the processing time, i.e., if for some data

5
00:00:21,180 --> 00:00:25,980
inputs a particular processing stage might
be able to produce its output in a shorter

6
00:00:25,980 --> 00:00:27,170
time?

7
00:00:27,170 --> 00:00:32,479
Can we design a system that could take advantage
of that opportunity to increase throughput?

8
00:00:32,479 --> 00:00:37,159
One alternative is to continue to use a single
system clock, but for each stage to signal

9
00:00:37,159 --> 00:00:42,950
when it's ready for a new input and when it
has a new output ready for the next stage.

10
00:00:42,950 --> 00:00:48,089
It's fun to design a simple 2-signal handshake
protocol to reliably transfer data from one

11
00:00:48,089 --> 00:00:50,219
stage to the next.

12
00:00:50,219 --> 00:00:55,309
The upstream stage produces a signal called
HERE-IS-X to indicate that is has new data

13
00:00:55,309 --> 00:00:57,180
for the downstream stage.

14
00:00:57,180 --> 00:01:01,559
And the downstream stage produces a signal
called GOT-X to indicate when it is willing

15
00:01:01,559 --> 00:01:02,789
to consume data.

16
00:01:02,789 --> 00:01:08,280
It's a synchronous system so the signal values
are only examined on the rising edge of the

17
00:01:08,280 --> 00:01:10,539
clock.

18
00:01:10,539 --> 00:01:16,220
The handshake protocol works as follows:
the upstream stage asserts HERE-IS-X if it

19
00:01:16,220 --> 00:01:21,420
will have a new output value available at
the next rising edge of the clock.

20
00:01:21,420 --> 00:01:26,420
The downstream stage asserts GOT-X if it will
grab the next output at the rising edge of

21
00:01:26,420 --> 00:01:27,840
the clock.

22
00:01:27,840 --> 00:01:32,840
Both stages look at the signals on the rising
edge of the clock to decide what to do next.

23
00:01:32,840 --> 00:01:38,970
If both stages see that HERE-IS-X and GOT-X
are asserted at the same clock edge, the handshake

24
00:01:38,970 --> 00:01:44,030
is complete and the data transfer happens
at that clock edge.

25
00:01:44,030 --> 00:01:48,289
Either stage can delay a transfer if they
are still working on producing the next output

26
00:01:48,289 --> 00:01:50,658
or consuming the previous input.

27
00:01:50,658 --> 00:01:56,549
It's possible, although considerably more
difficult, to build a clock-free asynchronous

28
00:01:56,549 --> 00:02:00,969
self-timed system that uses a similar handshake
protocol.

29
00:02:00,969 --> 00:02:03,729
The handshake involves four phases.

30
00:02:03,729 --> 00:02:09,940
In phase 1, when the upstream stage has a
new output and GOT-X is deasserted, it asserts

31
00:02:09,940 --> 00:02:15,820
its HERE-IS-X signal and then waits to see
the downstream stage's reply on the GOT-X

32
00:02:15,820 --> 00:02:17,840
signal.

33
00:02:17,840 --> 00:02:24,610
In phase 2, the downstream stage, seeing that
HERE-IS-X is asserted, asserts GOT-X when

34
00:02:24,610 --> 00:02:26,840
it has consumed the available input.

35
00:02:26,840 --> 00:02:33,610
In phase 3, the downstream stage waits to
see the HERE-IS-X go low, indicating that

36
00:02:33,610 --> 00:02:38,590
the upstream stage has successfully received
the GOT-X signal.

37
00:02:38,590 --> 00:02:44,860
In phase 4, once HERE-IS-X is deasserted,
the downstream stage deasserts GOT-X and the

38
00:02:44,860 --> 00:02:48,670
transfer handshake is ready to begin again.

39
00:02:48,670 --> 00:02:53,230
Note that the upstream stage waits until it
sees the GOT-X deasserted before starting

40
00:02:53,230 --> 00:02:54,810
the next handshake.

41
00:02:54,810 --> 00:02:59,440
The timing of the system is based on the transitions
of the handshake signals, which can happen

42
00:02:59,440 --> 00:03:04,250
at any time the conditions required by the
protocol are satisfied.

43
00:03:04,250 --> 00:03:07,300
No need for a global clock here!

44
00:03:07,300 --> 00:03:11,330
It's fun to think about how this self-timed
protocol might work when there are multiple

45
00:03:11,330 --> 00:03:14,840
downstream modules, each with their own internal
timing.

46
00:03:14,840 --> 00:03:20,370
In this example, A's output is consumed by
both the B and C stages.

47
00:03:20,370 --> 00:03:25,440
We need a special circuit, shown as a yellow
box in the diagram, to combine the GOT-X signals

48
00:03:25,440 --> 00:03:31,100
from the B and C stages and produce a summary
signal for the A stage.

49
00:03:31,100 --> 00:03:34,610
Let's take a quick look at the timing diagram
shown here.

50
00:03:34,610 --> 00:03:40,550
After A has asserted HERE-IS-X, the circuit
in the yellow box waits until both the B and

51
00:03:40,550 --> 00:03:46,640
the C stage have asserted their GOT-X signals
before asserting GOT-X to the A stage.

52
00:03:46,640 --> 00:03:52,260
At this point the A stage deasserts HERE-IS-X,
then the yellow box waits until both the B

53
00:03:52,260 --> 00:03:59,370
and C stages have deasserted their GOT-X signals,
before deasserting GOT-X to the A stage.

54
00:03:59,370 --> 00:04:01,300
Let's watch the system in action!

55
00:04:01,300 --> 00:04:05,960
When a signal is asserted we'll show it in
red, otherwise it's shown in black.

56
00:04:05,960 --> 00:04:11,120
A new value for the A stage arrives on A's
data input and the module supplying the value

57
00:04:11,120 --> 00:04:16,589
then asserts its HERE-IS-X signal to let A
know it has a new input.

58
00:04:16,589 --> 00:04:22,540
At some point later, A signals GOT-X back
upstream to indicate that it has consumed

59
00:04:22,540 --> 00:04:29,200
the value, then the upstream stage deasserts
HERE-IS-X, followed by A deasserting its GOT-X

60
00:04:29,200 --> 00:04:30,200
signal.

61
00:04:30,200 --> 00:04:34,120
This completes the transfer of the data to
the A stage.

62
00:04:34,120 --> 00:04:39,781
When A is ready to send a new output to the
B and C stages, it checks that its GOT-X input

63
00:04:39,781 --> 00:04:44,909
is deasserted (which it is),
so it asserts the new output value and signals

64
00:04:44,909 --> 00:04:49,900
HERE-IS-X to the yellow box which forwards
the signal to the downstream stages.

65
00:04:49,900 --> 00:04:55,490
B is ready to consume the new input and so
asserts its GOT-X output.

66
00:04:55,490 --> 00:05:01,470
Note that C is still waiting for its second
input and has yet to assert its GOT-X output.

67
00:05:01,470 --> 00:05:07,740
After B finishes its computation, it supplies
a new value to C and asserts its HERE-IS-X

68
00:05:07,740 --> 00:05:12,370
output to let C know its second input is ready.

69
00:05:12,370 --> 00:05:18,720
Now C is happy and signals both upstream stages
that it has consumed its two inputs.

70
00:05:18,720 --> 00:05:25,150
Now that both GOT-X inputs are asserted, the
yellow box asserts A's GOT-X input to let

71
00:05:25,150 --> 00:05:28,620
it know that the data has been transferred.

72
00:05:28,620 --> 00:05:34,730
Meanwhile B completes its part of the handshake,
and C completes its transaction with B and

73
00:05:34,730 --> 00:05:40,530
A deasserts HERE-IS-X to indicate that it
has seen its GOT-X input.

74
00:05:40,530 --> 00:05:46,780
When the B and C stages see their HERE-IS-X
inputs go low, they their finish their handshakes

75
00:05:46,780 --> 00:05:51,159
by deasserting their GOT-X outputs,
and when they're both low, the yellow box

76
00:05:51,159 --> 00:05:56,490
lets A know the handshake is complete by deserting
A's GOT-X input.

77
00:05:56,490 --> 00:05:57,490
Whew!

78
00:05:57,490 --> 00:06:02,100
The system has returned to the initial state
where A is now ready to accept some future

79
00:06:02,100 --> 00:06:04,500
input value.

80
00:06:04,500 --> 00:06:08,730
This an elegant design based entirely on transition
signaling.

81
00:06:08,730 --> 00:06:14,000
Each module is in complete control of when
it consumes inputs and produces outputs, and

82
00:06:14,000 --> 00:06:19,020
so the system can process data at the fastest
possible speed, rather than waiting for the

83
00:06:19,020 --> 00:06:20,710
worst-case processing delay.