1
00:00:00,070 --> 00:00:02,430
The following content is
provided under a Creative

2
00:00:02,430 --> 00:00:03,820
Commons license.

3
00:00:03,820 --> 00:00:06,060
Your support will help
MIT OpenCourseWare

4
00:00:06,060 --> 00:00:10,140
continue to offer high quality
educational resources for free.

5
00:00:10,140 --> 00:00:12,690
To make a donation or to
view additional materials

6
00:00:12,690 --> 00:00:16,600
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:16,600 --> 00:00:17,255
at ocw.mit.edu.

8
00:00:25,835 --> 00:00:26,960
PROFESSOR: All right, guys.

9
00:00:26,960 --> 00:00:28,800
So let's get started.

10
00:00:28,800 --> 00:00:31,190
Welcome back from what I
hope was an exciting holiday

11
00:00:31,190 --> 00:00:32,560
for everyone.

12
00:00:32,560 --> 00:00:35,360
So today we're going to talk
about user authentication.

13
00:00:35,360 --> 00:00:37,890
So the basic challenge that
we want to address today

14
00:00:37,890 --> 00:00:42,420
is how can human users prove
their identity to a program?

15
00:00:42,420 --> 00:00:45,680
In particular, the paper that
was assigned for today's class

16
00:00:45,680 --> 00:00:47,635
addresses an
existential question

17
00:00:47,635 --> 00:00:48,930
in the security community.

18
00:00:48,930 --> 00:00:53,240
Is there anything better than
passwords for authentication?

19
00:00:53,240 --> 00:00:57,430
So at a high level it seems like
passwords are a terrible idea.

20
00:00:57,430 --> 00:01:00,010
So they have very low entropy,
it's very easy for attackers

21
00:01:00,010 --> 00:01:01,380
to guess them.

22
00:01:01,380 --> 00:01:03,130
Also the security
questions that we

23
00:01:03,130 --> 00:01:05,481
use to recover
from lost passwords

24
00:01:05,481 --> 00:01:07,480
often have even lower
entropy than the passwords

25
00:01:07,480 --> 00:01:10,330
themselves, which also
seems like a problem.

26
00:01:10,330 --> 00:01:15,180
And even worse, users typically
will use the same password

27
00:01:15,180 --> 00:01:16,987
across a lot of different sites.

28
00:01:16,987 --> 00:01:19,195
So that means that the
vulnerability in one password,

29
00:01:19,195 --> 00:01:22,820
if it's easy to guess, could
expose a user's activity

30
00:01:22,820 --> 00:01:24,400
across a wide range of sites.

31
00:01:24,400 --> 00:01:27,030
So as the paper for
today's class states,

32
00:01:27,030 --> 00:01:28,930
I love this quote, "the
continued domination

33
00:01:28,930 --> 00:01:31,620
of passwords over
all other methods

34
00:01:31,620 --> 00:01:34,850
of end-user authentication
is a major embarrassment

35
00:01:34,850 --> 00:01:36,110
for security researchers."

36
00:01:36,110 --> 00:01:37,920
All right, so the community
is just seething out there,

37
00:01:37,920 --> 00:01:39,460
they want some
better alternative.

38
00:01:39,460 --> 00:01:41,380
But it's not clear
if there actually

39
00:01:41,380 --> 00:01:45,910
is an authentication scheme
that actually totally dominates

40
00:01:45,910 --> 00:01:48,630
passwords, that's more usable,
that's more deployable,

41
00:01:48,630 --> 00:01:49,830
that's more secure.

42
00:01:49,830 --> 00:01:52,210
So in today's lecture, we'll
basically do three things.

43
00:01:52,210 --> 00:01:53,710
So first of all,
we're going to look

44
00:01:53,710 --> 00:01:55,970
and we're going to see how
current passwords can work.

45
00:01:55,970 --> 00:01:58,660
Then we're going to talk
about the desirable properties

46
00:01:58,660 --> 00:02:01,630
at a high level for any
authentication scheme.

47
00:02:01,630 --> 00:02:05,112
And then we're finally going to
look at what the paper gives us

48
00:02:05,112 --> 00:02:07,320
in terms of metrics for
evaluating authentication

49
00:02:07,320 --> 00:02:08,740
schemes, and we're
going to see how

50
00:02:08,740 --> 00:02:10,156
some of these other
authentication

51
00:02:10,156 --> 00:02:12,230
schemes actually
compare to passwords.

52
00:02:12,230 --> 00:02:14,860
So in [INAUDIBLE]
what is a password?

53
00:02:14,860 --> 00:02:26,250
So a password is a
secret that is shared

54
00:02:26,250 --> 00:02:30,400
between a user and a server.

55
00:02:34,540 --> 00:02:37,800
So the naive implementation
of a password scheme

56
00:02:37,800 --> 00:02:41,160
is to basically
just have a table

57
00:02:41,160 --> 00:02:44,780
on the server side that
essentially just maps

58
00:02:44,780 --> 00:02:50,258
user names to passwords.

59
00:02:50,258 --> 00:02:52,008
That's the simplest
way for you to imagine

60
00:02:52,008 --> 00:02:54,980
implementing one of the
authentication schemes-- user

61
00:02:54,980 --> 00:02:58,280
passes in their user name
and the password, server

62
00:02:58,280 --> 00:02:59,826
network does a look
up in this table,

63
00:02:59,826 --> 00:03:01,700
compares the password
the client supplied to

64
00:03:01,700 --> 00:03:02,360
what's in here.

65
00:03:02,360 --> 00:03:04,320
If everything's good,
the user's authenticated.

66
00:03:04,320 --> 00:03:06,176
So clearly the
problem with this is

67
00:03:06,176 --> 00:03:09,212
that if the attacker
compromises the server,

68
00:03:09,212 --> 00:03:10,670
then he can just
look at this table

69
00:03:10,670 --> 00:03:13,959
and then get all the users'
passwords in the clear.

70
00:03:13,959 --> 00:03:15,270
So that's clearly a bad thing.

71
00:03:15,270 --> 00:03:19,170
So perhaps an
improved solution is

72
00:03:19,170 --> 00:03:23,280
to have the server store
a table that looks like this.

73
00:03:23,280 --> 00:03:25,320
So once again, it'd
map the user name

74
00:03:25,320 --> 00:03:31,055
but now it actually maps
to a hash of the password.

75
00:03:34,360 --> 00:03:37,000
So user client's gonna
supply their clear text

76
00:03:37,000 --> 00:03:39,590
password to the
server, the server

77
00:03:39,590 --> 00:03:41,480
will then take that
clear text password,

78
00:03:41,480 --> 00:03:43,870
hash it, do a lookup in the
table, and once again see

79
00:03:43,870 --> 00:03:46,620
if the user is who he or
she says that they are.
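
A minimal Python sketch of the hashed table just described. It assumes SHA-256 as the hash (the lecture has not named one at this point) and a hypothetical in-memory `password_table`; the user names and passwords are invented for illustration.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the hex SHA-256 digest of the password (assumed hash choice)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical server-side table: user name -> hash of the password.
password_table = {
    "alice": hash_password("correct horse"),
    "bob": hash_password("todd"),
}


def authenticate(username: str, password: str) -> bool:
    """Hash the supplied clear-text password and compare it to the stored hash."""
    stored = password_table.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(stored, hash_password(password))
```

The server never stores the clear-text password, so a leaked table exposes only digests.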

80
00:03:46,620 --> 00:03:49,490
So the advantage
of this scheme is

81
00:03:49,490 --> 00:03:52,080
that by design
these hash functions

82
00:03:52,080 --> 00:03:54,040
are difficult to invert.

83
00:03:54,040 --> 00:03:57,304
So if this table is
lost, it's leaked somehow

84
00:03:57,304 --> 00:03:58,928
or the attacker
compromised the server,

85
00:03:58,928 --> 00:04:00,969
and the attacker could
look at these things here,

86
00:04:00,969 --> 00:04:03,180
but it's difficult
for the attackers

87
00:04:03,180 --> 00:04:05,695
to say, OK, this sort of
string of random alpha

88
00:04:05,695 --> 00:04:07,460
numeric characters here.

89
00:04:07,460 --> 00:04:10,592
Here's a pre-image that
was used as the input

90
00:04:10,592 --> 00:04:13,660
of the hash function
[INAUDIBLE] that value there.

91
00:04:13,660 --> 00:04:16,089
So that at least
is the nice thing

92
00:04:16,089 --> 00:04:18,720
about these hashes in theory.

93
00:04:18,720 --> 00:04:21,370
Now in practice,
attackers don't actually

94
00:04:21,370 --> 00:04:23,540
have to launch
brute force attacks

95
00:04:23,540 --> 00:04:28,150
to figure out what the preimages
for these hash values are.

96
00:04:28,150 --> 00:04:30,770
So attackers can actually
take advantage of the fact

97
00:04:30,770 --> 00:04:36,595
that passwords in practice
have a skewed distribution.

98
00:04:40,200 --> 00:04:43,150
And by skewed
distributions, I mean

99
00:04:43,150 --> 00:04:45,850
that-- let's say that we
knew that all passwords were

100
00:04:45,850 --> 00:04:47,150
20 characters long.

101
00:04:47,150 --> 00:04:50,460
It's not like users actually
pick passwords that

102
00:04:50,460 --> 00:04:54,080
sort of exist in all
places in that space of 20

103
00:04:54,080 --> 00:04:55,340
possible characters.

104
00:04:55,340 --> 00:05:00,580
In practice, people pick
passwords like 123 or todd

105
00:05:00,580 --> 00:05:02,002
or things like this.

106
00:05:02,002 --> 00:05:03,960
So in fact there's been
these empirical studies

107
00:05:03,960 --> 00:05:08,180
of how passwords work
and a lot of times

108
00:05:08,180 --> 00:05:18,764
these studies find things
like the top 5,000 passwords

109
00:05:18,764 --> 00:05:21,710
cover about 20% of users.

110
00:05:25,032 --> 00:05:26,490
So what that means,
in other words,

111
00:05:26,490 --> 00:05:29,970
is that the attacker has
a database of those 5,000

112
00:05:29,970 --> 00:05:30,840
passwords.

113
00:05:30,840 --> 00:05:32,830
The attacker can
just hash those,

114
00:05:32,830 --> 00:05:37,050
and then when the attacker looks
at this stolen password table,

115
00:05:37,050 --> 00:05:39,640
can just see if one
of those things that

116
00:05:39,640 --> 00:05:44,408
come from this 5,000 large
list match over here.

117
00:05:44,408 --> 00:05:46,344
And so empirically
speaking, the attacker

118
00:05:46,344 --> 00:05:49,260
would be able to recover about
20% of passwords that way.
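
The attack just described can be sketched roughly as follows. The four-entry `common_passwords` list is a hypothetical stand-in for a real top-5,000 list, SHA-256 is an assumed hash, and the stolen table is made up.

```python
import hashlib


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the attacker's list of the most common passwords.
common_passwords = ["123456", "password", "todd", "qwerty"]

# Hypothetical stolen table: user name -> hash of the password.
stolen_table = {
    "alice": sha256_hex("todd"),
    "bob": sha256_hex("x9!Lq#t2-unusual"),
    "carol": sha256_hex("123456"),
}


def crack(stolen: dict, candidates: list) -> dict:
    """Hash every candidate once, then look each stolen hash up in that map."""
    precomputed = {sha256_hex(p): p for p in candidates}
    return {user: precomputed[h] for user, h in stolen.items() if h in precomputed}


recovered = crack(stolen_table, common_passwords)
```

Only the users who picked a password from the common list are recovered; the hash itself is never inverted.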

119
00:05:49,260 --> 00:05:55,050
And so, folks at Yahoo
found that passwords

120
00:05:55,050 --> 00:06:02,832
have roughly 10 to 20 bits
of entropy, 10 to 20 bits

121
00:06:02,832 --> 00:06:04,760
of randomness in them.

122
00:06:04,760 --> 00:06:08,360
And that's actually
not that big.

123
00:06:08,360 --> 00:06:10,435
So, for example, if you
think about what might

124
00:06:10,435 --> 00:06:11,560
this hash function here be?

125
00:06:11,560 --> 00:06:14,620
So maybe it's something like
SHA, something like this.

126
00:06:14,620 --> 00:06:17,880
So modern machines
actually calculate millions

127
00:06:17,880 --> 00:06:20,260
of these hashes every second.

128
00:06:20,260 --> 00:06:22,660
So the fact that hash
functions by design

129
00:06:22,660 --> 00:06:25,050
are supposed to be
easy to calculate

130
00:06:25,050 --> 00:06:26,450
so they'd be fast to calculate.

131
00:06:26,450 --> 00:06:27,950
Combined with this
fact that there'd

132
00:06:27,950 --> 00:06:29,700
be skewed password
distributions,

133
00:06:29,700 --> 00:06:32,500
means that in principle, this
scheme here is not as secure

134
00:06:32,500 --> 00:06:34,510
as it might seem.

135
00:06:34,510 --> 00:06:36,800
So one thing you
can imagine to try

136
00:06:36,800 --> 00:06:40,660
to make life more
difficult on the attacker

137
00:06:40,660 --> 00:06:46,860
is you could imagine that you
use an expensive key derivation

138
00:06:46,860 --> 00:06:47,360
function.

139
00:06:53,290 --> 00:06:55,280
And so by key
derivation function,

140
00:06:55,280 --> 00:06:58,867
I just mean this thing up here.

141
00:06:58,867 --> 00:07:01,200
This thing that's taking the
passwords as input and then

142
00:07:01,200 --> 00:07:03,505
generate something that's
stored on the server.

143
00:07:03,505 --> 00:07:05,213
So what's nice about
these key derivation

144
00:07:05,213 --> 00:07:09,915
functions is that they actually
have tunable cost.

145
00:07:09,915 --> 00:07:11,930
So you can basically
turn this knob

146
00:07:11,930 --> 00:07:14,516
and make that function
run slower or faster

147
00:07:14,516 --> 00:07:15,640
depending on what you want.

148
00:07:15,640 --> 00:07:17,525
And so the idea here
is that, let's say

149
00:07:17,525 --> 00:07:19,650
that you're going to use
a key derivation function.

150
00:07:19,650 --> 00:07:28,020
So some examples of these are
like PBKDF2, or maybe BCrypt

151
00:07:28,020 --> 00:07:30,901
so you can look these up using
the miracle of the internet

152
00:07:30,901 --> 00:07:32,400
if you care to know
more about them.

153
00:07:32,400 --> 00:07:34,330
But the basic idea
is let's imagine

154
00:07:34,330 --> 00:07:36,040
that one of these key
derivation functions

155
00:07:36,040 --> 00:07:40,820
took a second to calculate, as
opposed to a few milliseconds.

156
00:07:40,820 --> 00:07:42,490
That actually makes
the attacker's job

157
00:07:42,490 --> 00:07:45,760
much more difficult. Because
when the attacker is trying

158
00:07:45,760 --> 00:07:49,090
to, let's say, generate
values for these 5,000 topmost

159
00:07:49,090 --> 00:07:51,720
passwords, it's going to
take the attacker much longer

160
00:07:51,720 --> 00:07:52,760
to do that.
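
One way to picture the tunable cost is Python's standard `hashlib.pbkdf2_hmac`, where the iteration count is the knob. This is only a sketch: the iteration counts are arbitrary, and because PBKDF2 requires a salt argument, a fixed placeholder salt is assumed here purely so the call is valid.

```python
import hashlib
import time

# Assumption: placeholder value only, so that the PBKDF2 call is well-formed.
PLACEHOLDER_SALT = b"fixed-demo-salt"


def derive(password: str, iterations: int) -> bytes:
    """Derive a 32-byte value; more iterations means proportionally more work."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), PLACEHOLDER_SALT, iterations
    )


def seconds_to_derive(password: str, iterations: int) -> float:
    """Measure how long a single derivation takes on this machine."""
    start = time.perf_counter()
    derive(password, iterations)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Turning the knob: the same guess costs the attacker more per attempt.
    for iterations in (1_000, 100_000):
        print(iterations, seconds_to_derive("todd", iterations))
```

The legitimate server pays this cost once per login, while an attacker pays it once per guess.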

161
00:07:52,760 --> 00:07:55,770
So does that all make
sense how these things work?

162
00:07:55,770 --> 00:07:56,940
Pretty straight forward.

163
00:07:56,940 --> 00:07:59,260
So internally these key
derivation functions

164
00:07:59,260 --> 00:08:02,675
often operate by repeatedly
calling a hash multiple,

165
00:08:02,675 --> 00:08:03,500
multiple times.

166
00:08:03,500 --> 00:08:05,960
So that's all pretty
straightforward.

167
00:08:05,960 --> 00:08:08,712
So you might say, well,
does this solve the problem?

168
00:08:08,712 --> 00:08:10,753
So can we just use these
expensive key derivation

169
00:08:10,753 --> 00:08:12,590
functions and be done with it?

170
00:08:12,590 --> 00:08:14,920
So since this is a security
class, the answer is no.

171
00:08:14,920 --> 00:08:17,820
So one problem is that the
adversary can build something

172
00:08:17,820 --> 00:08:23,470
called rainbow tables.

173
00:08:23,470 --> 00:08:29,990
And so a rainbow table
is basically just a map

174
00:08:29,990 --> 00:08:35,490
of a password to its hash output.

175
00:08:39,039 --> 00:08:43,532
And so the insight here is that
even if the system is using

176
00:08:43,532 --> 00:08:45,665
one of these expensive
key derivation functions,

177
00:08:45,665 --> 00:08:49,840
the attacker can calculate
one of these tables once.

178
00:08:49,840 --> 00:08:52,396
It might be a little bit painful
because each key derivation

179
00:08:52,396 --> 00:08:53,950
function invocation is slow.

180
00:08:53,950 --> 00:08:56,780
But the attacker can build
this table once and then use

181
00:08:56,780 --> 00:09:00,030
that to crack all subsequent
systems the attacker can

182
00:09:00,030 --> 00:09:04,120
break into that use that
same key derivation function.

183
00:09:04,120 --> 00:09:05,980
So that's how
rainbow tables work.

184
00:09:05,980 --> 00:09:07,827
And once again, to
maximize the cost benefit

185
00:09:07,827 --> 00:09:09,660
of building this rainbow
table, the attacker

186
00:09:09,660 --> 00:09:12,700
could take advantage of the
skewed password distributions

187
00:09:12,700 --> 00:09:13,450
that you can see up here.

188
00:09:13,450 --> 00:09:15,040
So the attacker might
only build a rainbow table

189
00:09:15,040 --> 00:09:17,245
for some small set of
all possible passwords.

190
00:09:17,245 --> 00:09:19,910
AUDIENCE: So salting makes
this much more difficult.

191
00:09:19,910 --> 00:09:21,410
PROFESSOR: Yeah,
yeah, that's right.

192
00:09:21,410 --> 00:09:24,250
So we're going to get to salting
I believe in a couple seconds.

193
00:09:24,250 --> 00:09:24,890
That's right.

194
00:09:24,890 --> 00:09:27,290
So at a high level, if
you don't use salting,

195
00:09:27,290 --> 00:09:29,620
rainbow tables actually
allow the attacker

196
00:09:29,620 --> 00:09:32,030
to spend some effort offline,
calculate this table,

197
00:09:32,030 --> 00:09:34,430
and then sort of
amortize the cost

198
00:09:34,430 --> 00:09:36,119
of calculating that
table over breaking

199
00:09:36,119 --> 00:09:37,535
many different
password databases.

200
00:09:41,455 --> 00:09:44,510
So the next thing that we can
think about to improve things

201
00:09:44,510 --> 00:09:45,255
is salting.

202
00:09:45,255 --> 00:09:46,630
I swear that guy
was not a plant,

203
00:09:46,630 --> 00:09:49,180
I will give you your
$20 after class.

204
00:09:49,180 --> 00:09:50,990
So how does salting work?

205
00:09:50,990 --> 00:09:52,448
So the basic thing
is you just want

206
00:09:52,448 --> 00:09:54,950
to input some additional
randomness into the way

207
00:09:54,950 --> 00:09:56,750
that the password's generated.

208
00:09:56,750 --> 00:10:02,450
So basically, you want to
take this hash function

209
00:10:02,450 --> 00:10:05,172
and you want to put some
salt in there-- which

210
00:10:05,172 --> 00:10:08,657
I'll explain in a second--
and then the password.

211
00:10:08,657 --> 00:10:10,865
And this is the thing that
you store on the server side

212
00:10:10,865 --> 00:10:11,656
in the [INAUDIBLE].

213
00:10:11,656 --> 00:10:12,660
So what is this salt?

214
00:10:12,660 --> 00:10:16,880
And you just think of it as just
a string, a long string that's

215
00:10:16,880 --> 00:10:20,370
provided as sort of a first
part to this hash function.

216
00:10:20,370 --> 00:10:23,640
So why is it better
to use this scheme?

217
00:10:23,640 --> 00:10:25,440
And note that the
salt is actually

218
00:10:25,440 --> 00:10:28,500
stored in clear
text on the server side.

219
00:10:28,500 --> 00:10:30,879
So you might be thinking OK,
well if that salt is stored

220
00:10:30,879 --> 00:10:32,640
in clear text
on the server side,

221
00:10:32,640 --> 00:10:36,030
it seems like an attacker can both
steal the table that maps

222
00:10:36,030 --> 00:10:38,330
user names to passwords
and the attacker can also

223
00:10:38,330 --> 00:10:41,109
steal the salt. So
why is that useful?

224
00:10:41,109 --> 00:10:43,650
AUDIENCE: Because if you picked
the top most common password,

225
00:10:43,650 --> 00:10:46,107
you can't just use it
once and find a new user.

226
00:10:46,107 --> 00:10:47,440
PROFESSOR: That's exactly right.

227
00:10:47,440 --> 00:10:49,180
So basically what
this does is this

228
00:10:49,180 --> 00:10:52,580
prevents the attacker from
building a single rainbow table

229
00:10:52,580 --> 00:10:56,050
and then using that rainbow
table against all instances

230
00:10:56,050 --> 00:10:57,930
of that hash function.

231
00:10:57,930 --> 00:10:59,970
And so you can
basically think of this

232
00:10:59,970 --> 00:11:02,776
as sort of uniquifying
passwords even if they

233
00:11:02,776 --> 00:11:04,810
are the same, basically.
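
A rough sketch of the salted scheme, assuming SHA-256 and a 16-byte random salt from `os.urandom`; the function names are hypothetical. The salt is stored in the clear next to the digest, as described above.

```python
import hashlib
import hmac
import os


def new_record(password: str) -> tuple:
    """Create a (salt, digest) pair to store for one user."""
    salt = os.urandom(16)  # fresh random salt per user and per password change
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute hash(salt || password) and compare it to the stored digest."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, digest)


# Two users who pick the same password end up with different stored digests,
# so one precomputed table no longer matches both entries.
salt_a, digest_a = new_record("todd")
salt_b, digest_b = new_record("todd")
```

An attacker who steals the table still learns each salt, but has to redo the hashing work separately for every salt value.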

234
00:11:04,810 --> 00:11:07,166
So this is what a lot of
systems do in practice, they

235
00:11:07,166 --> 00:11:09,370
use this notion of salt here.

236
00:11:09,370 --> 00:11:10,840
And so the best
practices for this

237
00:11:10,840 --> 00:11:12,360
so you want to
choose a salt that's

238
00:11:12,360 --> 00:11:14,776
long, because you're going to
essentially think of the salt

239
00:11:14,776 --> 00:11:18,240
as adding more bits to
this pseudo-password right.

240
00:11:18,240 --> 00:11:19,490
So more bits is always better.

241
00:11:19,490 --> 00:11:21,031
And the other thing
you want to do

242
00:11:21,031 --> 00:11:23,390
is that whenever the user
changes his or her password,

243
00:11:23,390 --> 00:11:25,480
you typically want to
change that salt too.

244
00:11:25,480 --> 00:11:29,165
So one reason for that is
let's say that users are lazy

245
00:11:29,165 --> 00:11:31,750
and they want to pick the
same password multiple times.

246
00:11:31,750 --> 00:11:34,678
Changing the salt will
ensure that the thing that's

247
00:11:34,678 --> 00:11:37,303
stored in the password database
will actually be different even

248
00:11:37,303 --> 00:11:38,440
if that password's the same.

249
00:11:38,440 --> 00:11:40,106
I think there was a
question somewhere.

250
00:11:40,106 --> 00:11:41,550
AUDIENCE: Why's it called salt?

251
00:11:41,550 --> 00:11:43,750
PROFESSOR: I'm actually
not sure why it's called

252
00:11:43,750 --> 00:11:45,060
salt, that's a good question.

253
00:11:45,060 --> 00:11:46,680
I'm sure there's some
answer to this though.

254
00:11:46,680 --> 00:11:47,450
It's like why are
cookies called cookies?

255
00:11:47,450 --> 00:11:49,836
The internet will know
but I actually don't know.

256
00:11:49,836 --> 00:11:52,800
AUDIENCE: Add some
[INAUDIBLE] to the hash number

257
00:11:52,800 --> 00:11:55,270
hash [INAUDIBLE].

258
00:11:55,270 --> 00:11:56,382
PROFESSOR: There we go.

259
00:11:56,382 --> 00:11:58,090
I'm glad that we're
getting this on film,

260
00:11:58,090 --> 00:11:59,255
'cause I feel this is
how we're going

261
00:11:59,255 --> 00:12:00,338
to get our Turing awards.

262
00:12:00,338 --> 00:12:01,530
That's right.

263
00:12:01,530 --> 00:12:03,790
I'm sure there's some
answer on the internet,

264
00:12:03,790 --> 00:12:05,370
so I'll look that up later.

265
00:12:05,370 --> 00:12:08,280
But does that all
basically make sense?

266
00:12:08,280 --> 00:12:12,720
OK so these approaches are
fairly straightforward.

267
00:12:12,720 --> 00:12:16,980
So what I've assumed so far
is that somehow the client

268
00:12:16,980 --> 00:12:20,466
is transmitting the
password to the server.

269
00:12:20,466 --> 00:12:23,090
But I haven't actually specified
how that transmission's actually

270
00:12:23,090 --> 00:12:23,923
going to take place.

271
00:12:27,270 --> 00:12:35,880
So how do we transmit
these passwords?

272
00:12:35,880 --> 00:12:39,500
So the first idea you
might have would be,

273
00:12:39,500 --> 00:12:43,960
well, we'll just
send the password

274
00:12:43,960 --> 00:12:46,730
in the clear over the network.

275
00:12:46,730 --> 00:12:49,344
This is clearly
cartoonishly bad,

276
00:12:49,344 --> 00:12:51,510
because then there could
be a network attacker who's

277
00:12:51,510 --> 00:12:54,007
basically snooping
and seeing the traffic

278
00:12:54,007 --> 00:12:54,840
that you're sending.

279
00:12:54,840 --> 00:12:56,798
And then they can
just take that password

280
00:12:56,798 --> 00:12:59,249
right off the wire and
then impersonate you.

281
00:12:59,249 --> 00:13:00,790
So we always start
with the straw man

282
00:13:00,790 --> 00:13:02,970
before I show you the other
straw men, which of course are

283
00:13:02,970 --> 00:13:03,840
also fatally flawed.

284
00:13:03,840 --> 00:13:05,815
So first thing you
think about is sending

285
00:13:05,815 --> 00:13:07,285
a password in the clear.

286
00:13:07,285 --> 00:13:08,785
Another thing you
might think, which

287
00:13:08,785 --> 00:13:10,860
would be a little
bit better perhaps,

288
00:13:10,860 --> 00:13:18,200
is perhaps we send the password
over an encrypted connection.

289
00:13:23,345 --> 00:13:27,464
And so we use some type
of cryptography here.

290
00:13:27,464 --> 00:13:29,630
Maybe there's some secret
key or something like that

291
00:13:29,630 --> 00:13:31,540
and that's what we
use to transform

292
00:13:31,540 --> 00:13:34,240
the password before we send
it over the connection.

293
00:13:34,240 --> 00:13:35,942
So at a high level,
encryption always

294
00:13:35,942 --> 00:13:37,400
seems to make things
better, right?

295
00:13:37,400 --> 00:13:38,200
Trademark.

296
00:13:38,200 --> 00:13:41,179
But the problem is that
unless you think carefully

297
00:13:41,179 --> 00:13:43,595
about how you're using things
like encryption and hashing,

298
00:13:43,595 --> 00:13:45,473
you may not be getting
the security benefits

299
00:13:45,473 --> 00:13:46,530
that you think you're getting.

300
00:13:46,530 --> 00:13:48,120
Because, for example,
what if there's

301
00:13:48,120 --> 00:13:50,450
someone who's sitting
between you-- the client--

302
00:13:50,450 --> 00:13:53,426
and the server, this proverbial
man in the middle attacker,

303
00:13:53,426 --> 00:13:55,050
who's actually snooping
on your traffic

304
00:13:55,050 --> 00:13:57,580
and pretending to be the server.

305
00:13:57,580 --> 00:14:00,370
If you send encrypted
data, you haven't actually

306
00:14:00,370 --> 00:14:02,600
authenticated the
other end, then

307
00:14:02,600 --> 00:14:06,150
you could still be opening
up yourself to problems.

308
00:14:06,150 --> 00:14:07,960
Because if the client
just, let's say,

309
00:14:07,960 --> 00:14:10,410
picked some random key,
sends it to some entity

310
00:14:10,410 --> 00:14:12,970
on the other side who may
or may not be the server.

311
00:14:12,970 --> 00:14:15,906
If it is not the
server, [INAUDIBLE].

312
00:14:15,906 --> 00:14:19,490
You are sending something to
some person, who will then be

313
00:14:19,490 --> 00:14:21,390
able to get all your secrets.

314
00:14:21,390 --> 00:14:23,740
And so similarly,
people might think well

315
00:14:23,740 --> 00:14:25,810
what if I don't send
the raw password

316
00:14:25,810 --> 00:14:27,615
but I send a hash
of the password.

317
00:14:27,615 --> 00:14:29,240
That actually doesn't
give you anything

318
00:14:29,240 --> 00:14:30,260
in and of itself either.

319
00:14:30,260 --> 00:14:32,720
Because whether you send
the password or the hash

320
00:14:32,720 --> 00:14:34,780
of a password-- I mean,
a hash of the password

321
00:14:34,780 --> 00:14:37,800
has the same sort of semantic
power as the original password

322
00:14:37,800 --> 00:14:38,794
itself.

323
00:14:38,794 --> 00:14:40,585
If you haven't
authenticated the other side

324
00:14:40,585 --> 00:14:43,110
if you haven't authenticated
the server or things like this.

325
00:14:43,110 --> 00:14:44,740
So the basic point
with this discussion

326
00:14:44,740 --> 00:14:49,440
here is just to stress the fact
that just adding encryption

327
00:14:49,440 --> 00:14:51,730
or just adding hashing
doesn't necessarily

328
00:14:51,730 --> 00:14:53,690
give you any additional powers.

329
00:14:53,690 --> 00:14:56,160
If the client can't authenticate
who he or she is sending

330
00:14:56,160 --> 00:14:59,620
the password to then the client
could be mistakenly divulging

331
00:14:59,620 --> 00:15:03,430
that password to someone they
don't intend to divulge it to.

332
00:15:03,430 --> 00:15:07,620
So perhaps a better
idea than these two

333
00:15:07,620 --> 00:15:12,155
is to use what they call a
challenge response protocol.

334
00:15:17,200 --> 00:15:20,070
And here's an example of a
very simple challenge response

335
00:15:20,070 --> 00:15:21,090
protocol.

336
00:15:21,090 --> 00:15:26,140
So let's say we've
got the client here,

337
00:15:26,140 --> 00:15:30,700
and then you've got
the server over here.

338
00:15:30,700 --> 00:15:36,340
So the client says,
hi, I'm Alice.

339
00:15:39,450 --> 00:15:45,470
And then the server responds
with some challenge C,

340
00:15:45,470 --> 00:15:48,900
some quantity that the
server got to pick.

341
00:15:48,900 --> 00:15:54,670
And then the client
is going to respond

342
00:15:54,670 --> 00:15:58,950
with the hash of that
server sent challenge,

343
00:15:58,950 --> 00:16:02,898
and then you can concatenate
that with the password.

344
00:16:06,350 --> 00:16:09,490
So at this point, the server
can take this quantity.

345
00:16:09,490 --> 00:16:11,830
The server knows the
challenge that it sent.

346
00:16:11,830 --> 00:16:13,950
And presumably the server
knows the password,

347
00:16:13,950 --> 00:16:16,530
so the server can
[INAUDIBLE] this quantity

348
00:16:16,530 --> 00:16:19,780
and see if it actually
matches what the user sent.
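
The exchange on the board can be sketched in a few lines of Python. This assumes SHA-256 for the hash and a 16-byte random challenge; the function names are hypothetical, and the server is assumed to know Alice's password, as stated above.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: pick a fresh random challenge C."""
    return secrets.token_bytes(16)


def client_response(challenge: bytes, password: str) -> bytes:
    """Client side: compute hash(C || password)."""
    return hashlib.sha256(challenge + password.encode("utf-8")).digest()


def server_check(challenge: bytes, response: bytes, stored_password: str) -> bool:
    """Server side: recompute the same quantity and compare."""
    expected = client_response(challenge, stored_password)
    return hmac.compare_digest(expected, response)


# One round of the protocol, with an invented password.
c = make_challenge()
response = client_response(c, "alice-password")
ok = server_check(c, response, "alice-password")
```

The password itself never crosses the wire; only the challenge and the digest do.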

349
00:16:19,780 --> 00:16:21,720
So what's nice
about this protocol

350
00:16:21,720 --> 00:16:24,950
is that if we ignore man in the
middle attacks for a second,

351
00:16:24,950 --> 00:16:28,985
the server is now confident
that the user's actually Alice,

352
00:16:28,985 --> 00:16:31,331
because only Alice would
know this password here.

353
00:16:31,331 --> 00:16:33,830
And what's nice about this is
that if the server is actually

354
00:16:33,830 --> 00:16:36,120
the attacker-- so
in other words,

355
00:16:36,120 --> 00:16:39,442
if Alice sent this thing
to someone who's not

356
00:16:39,442 --> 00:16:41,400
the person who she's
trying to authenticate to,

357
00:16:41,400 --> 00:16:43,957
then the attacker still
doesn't know the password.

358
00:16:43,957 --> 00:16:45,990
Because the attacker
got to choose C,

359
00:16:45,990 --> 00:16:48,126
but the attacker doesn't
know what this is.

360
00:16:48,126 --> 00:16:49,500
And so basically
for the attacker

361
00:16:49,500 --> 00:16:50,969
to figure out what
the password is,

362
00:16:50,969 --> 00:16:52,760
the attacker has to be
able to, once again,

363
00:16:52,760 --> 00:16:54,324
invert these hash functions.

364
00:16:54,324 --> 00:16:55,282
Do you have a question?

365
00:16:55,282 --> 00:16:57,282
AUDIENCE: I'm just curious,
how can you not make

366
00:16:57,282 --> 00:17:01,178
a client do the hashing?

367
00:17:01,178 --> 00:17:01,678
[INAUDIBLE]

368
00:17:10,329 --> 00:17:13,300
PROFESSOR: So let's see,
so your proposed scheme

369
00:17:13,300 --> 00:17:20,370
is that the client side is
going to call this thing?

370
00:17:20,370 --> 00:17:22,495
AUDIENCE: Yeah, so instead
of sending the password,

371
00:17:22,495 --> 00:17:26,478
and having the server hash
the password and check it,

372
00:17:26,478 --> 00:17:28,482
the client would just
send the hashed password.

373
00:17:28,482 --> 00:17:30,815
PROFESSOR: The client would
just send the hashed password.

374
00:17:36,430 --> 00:17:37,980
So there's a couple reasons.

375
00:17:37,980 --> 00:17:40,642
So one reason, as
we'll discuss later,

376
00:17:40,642 --> 00:17:42,350
is that there's going
to be things called

377
00:17:42,350 --> 00:17:43,772
anti-hammering defenses right.

378
00:17:43,772 --> 00:17:45,230
Anti-hammering
defenses are designed

379
00:17:45,230 --> 00:17:48,544
to prevent a bad client
from continually asking,

380
00:17:48,544 --> 00:17:50,335
is this the password,
is this the password,

381
00:17:50,335 --> 00:17:51,330
is this the password?

382
00:17:51,330 --> 00:17:53,121
So then as a result,
it's easier for things

383
00:17:53,121 --> 00:17:55,150
to be on the server side
than on the client side.

384
00:17:55,150 --> 00:17:57,340
But suffice it to
say, you can, in fact,

385
00:17:57,340 --> 00:17:59,882
do the hash on the client side.

386
00:17:59,882 --> 00:18:01,590
Using JavaScript or
something like this.

387
00:18:01,590 --> 00:18:03,185
But the basic idea
is that somehow you

388
00:18:03,185 --> 00:18:06,770
have to have the computational
expense be very, very large,

389
00:18:06,770 --> 00:18:10,620
because that's going to prevent
the attacker from just guessing

390
00:18:10,620 --> 00:18:13,617
what the password is quickly.

391
00:18:13,617 --> 00:18:14,700
Is there another question?

392
00:18:14,700 --> 00:18:16,878
AUDIENCE: Well I just
wanted to point out

393
00:18:16,878 --> 00:18:18,822
that if the client
does the hashing,

394
00:18:18,822 --> 00:18:23,196
then it's [INAUDIBLE] because
your password is the hash.

395
00:18:23,196 --> 00:18:25,140
PROFESSOR: So that's true.

396
00:18:25,140 --> 00:18:26,920
AUDIENCE: So if
somebody gets the table

397
00:18:26,920 --> 00:18:28,900
from the server
[INAUDIBLE] using

398
00:18:28,900 --> 00:18:31,251
it to hash they can log in.

399
00:18:31,251 --> 00:18:32,250
PROFESSOR: That's right.

400
00:18:32,250 --> 00:18:34,041
Yeah, it gets a little
bit subtle sometimes

401
00:18:34,041 --> 00:18:37,160
depending on who can
pick, for example,

402
00:18:37,160 --> 00:18:38,487
these challenge values.

403
00:18:38,487 --> 00:18:40,820
Because if client and servers
can pick challenge values,

404
00:18:40,820 --> 00:18:43,130
so that makes it more or
less difficult for the client

405
00:18:43,130 --> 00:18:44,280
to launch those
types of attacks.

406
00:18:44,280 --> 00:18:46,405
So for example, like one
problem with this protocol

407
00:18:46,405 --> 00:18:49,700
here is that
basically the client

408
00:18:49,700 --> 00:18:54,000
doesn't get to inject
any randomness into this.

409
00:18:54,000 --> 00:18:55,500
So you can imagine
that you can make

410
00:18:55,500 --> 00:18:59,440
this protocol more difficult
for the server to invert.

411
00:18:59,440 --> 00:19:01,976
If the client actually got
to choose some challenge that

412
00:19:01,976 --> 00:19:04,476
was put in here, so you got the
server side challenge versus

413
00:19:04,476 --> 00:19:05,720
the client side challenge.

414
00:19:05,720 --> 00:19:06,886
But you're right about that.

415
00:19:09,110 --> 00:19:11,670
Any other questions?

416
00:19:11,670 --> 00:19:13,790
OK.

417
00:19:13,790 --> 00:19:17,240
So yeah, so this segues into the
discussion we were just having.

418
00:19:19,890 --> 00:19:22,960
So even though to
break this, the server

419
00:19:22,960 --> 00:19:25,860
would have to invert
this hash, the attacker

420
00:19:25,860 --> 00:19:29,132
could still try to do one of
these brute force attacks.

421
00:19:29,132 --> 00:19:30,840
So one way that we
can prevent the server

422
00:19:30,840 --> 00:19:32,160
from doing these
brute force attacks

423
00:19:32,160 --> 00:19:33,876
is to choose one of these
expensive hash functions

424
00:19:33,876 --> 00:19:35,060
like we were discussing before.

425
00:19:35,060 --> 00:19:36,559
Another thing, as
we just discussed,

426
00:19:36,559 --> 00:19:39,640
is that you could actually
allow the client to,

427
00:19:39,640 --> 00:19:44,070
for example, choose its
own client chosen challenge

428
00:19:44,070 --> 00:19:44,850
over here.

429
00:19:44,850 --> 00:19:46,225
And so that
essentially would act

430
00:19:46,225 --> 00:19:48,960
as like a client chosen salt.
So that would essentially

431
00:19:48,960 --> 00:19:50,950
make it more difficult
for the hacker

432
00:19:50,950 --> 00:19:52,760
to do things like build
up a rainbow table.

433
00:19:52,760 --> 00:19:56,590
Because note that if the
server is the attacker here,

434
00:19:56,590 --> 00:19:59,830
the server always can pick the
same challenge value again,

435
00:19:59,830 --> 00:20:02,190
again, and again, allowing it
to build the rainbow table.

436
00:20:02,190 --> 00:20:04,300
But if when the
client responded back,

437
00:20:04,300 --> 00:20:06,870
the client also
included some salt,

438
00:20:06,870 --> 00:20:09,086
some client chosen
challenge that it included,

439
00:20:09,086 --> 00:20:10,460
then that'll
prevent the attacker

440
00:20:10,460 --> 00:20:12,900
from building one of
the rainbow tables.

441
00:20:12,900 --> 00:20:15,361
So does that all make sense?
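
[EDITOR'S NOTE: A minimal sketch of the client-chosen challenge idea discussed above, not the exact protocol on the board. It assumes SHA-256 as the hash and a response of the form H(server challenge || client challenge || password). The function names `respond` and `verify` are hypothetical. Because the client mixes in its own fresh randomness, a malicious server that repeats the same challenge cannot reuse one precomputed table across logins.]

```python
import hashlib
import hmac
import secrets


def respond(password: str, server_challenge: bytes) -> tuple[bytes, bytes]:
    """Client side: pick a fresh client challenge and hash everything together."""
    client_challenge = secrets.token_bytes(16)  # acts like a client-chosen salt
    digest = hashlib.sha256(
        server_challenge + client_challenge + password.encode()
    ).digest()
    return client_challenge, digest


def verify(password: str, server_challenge: bytes,
           client_challenge: bytes, digest: bytes) -> bool:
    """Server side: recompute the hash from the stored password and compare."""
    expected = hashlib.sha256(
        server_challenge + client_challenge + password.encode()
    ).digest()
    return hmac.compare_digest(expected, digest)
```

[In a real deployment the hash would be one of the deliberately expensive functions mentioned earlier; plain SHA-256 is used here only to keep the sketch short.]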

442
00:20:15,361 --> 00:20:15,860
OK.

443
00:20:19,580 --> 00:20:23,300
So yeah, one thing
that I mentioned

444
00:20:23,300 --> 00:20:26,920
that might be useful
to do is implementing

445
00:20:26,920 --> 00:20:28,222
these anti-hammering defenses.

446
00:20:33,770 --> 00:20:40,560
And so anti-hammering defenses
are basically designed to rate

447
00:20:40,560 --> 00:20:50,800
limit the number
of password guesses

448
00:20:50,800 --> 00:20:53,630
that a bad client can issue.

449
00:20:59,900 --> 00:21:03,210
Because the idea here is that
if you've got some client who's

450
00:21:03,210 --> 00:21:05,320
trying to launch one
of these brute force

451
00:21:05,320 --> 00:21:06,754
guesses against
the password, you

452
00:21:06,754 --> 00:21:08,670
don't want that client
to be able to sit there

453
00:21:08,670 --> 00:21:10,795
in a tight loop and just
say, is this the password,

454
00:21:10,795 --> 00:21:12,910
is this the password,
is this the password?

455
00:21:12,910 --> 00:21:14,830
So one way we can
do anti-hammering

456
00:21:14,830 --> 00:21:16,556
is just to do that rate limiting.

457
00:21:16,556 --> 00:21:18,170
So the server will
say, I will only

458
00:21:18,170 --> 00:21:21,150
accept let's say three
password guesses per second

459
00:21:21,150 --> 00:21:22,650
from any particular client.

460
00:21:22,650 --> 00:21:28,710
You could also imagine
implementing timeouts here.

461
00:21:28,710 --> 00:21:31,550
So maybe the client can issue
a bunch of password requests

462
00:21:31,550 --> 00:21:33,970
in a row, but then after, let's
say, 10 of them are wrong,

463
00:21:33,970 --> 00:21:35,594
the server says, OK
you got to hold on,

464
00:21:35,594 --> 00:21:39,340
I will not accept any more
requests from you for,

465
00:21:39,340 --> 00:21:42,770
let's say, 10 seconds,
something like that.

466
00:21:42,770 --> 00:21:44,610
And so both of these
things are designed

467
00:21:44,610 --> 00:21:46,220
for preventing
brute force attacks.
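
[EDITOR'S NOTE: A minimal sketch of the timeout style of anti-hammering described above, using the lecture's example numbers (10 wrong guesses, then a 10 second lockout). The class name `AntiHammer`, its methods, and the idea of keying state by a client identifier are assumptions for illustration. The caller passes in the current time so the logic is easy to test.]

```python
class AntiHammer:
    """Lock a client out for a while after too many wrong password guesses."""

    def __init__(self, max_failures: int = 10, lockout_seconds: float = 10.0):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.failures = {}      # client id -> wrong guesses since last reset
        self.locked_until = {}  # client id -> time when guesses are accepted again

    def allowed(self, client: str, now: float) -> bool:
        """Return True if the server should accept a guess from this client."""
        return now >= self.locked_until.get(client, 0.0)

    def record_failure(self, client: str, now: float) -> None:
        """Count a wrong guess; start a lockout once the limit is reached."""
        count = self.failures.get(client, 0) + 1
        if count >= self.max_failures:
            self.locked_until[client] = now + self.lockout_seconds
            count = 0
        self.failures[client] = count

    def record_success(self, client: str) -> None:
        """A correct login clears the failure counter."""
        self.failures.pop(client, None)
```

[A server would call `allowed` before checking a password and `record_failure` or `record_success` afterwards. What counts as a "client" is left open here, and that choice gets hard once attackers are distributed.]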

468
00:21:46,220 --> 00:21:48,912
And so, for example,
like some smart cards have

469
00:21:48,912 --> 00:21:50,860
these types of
defenses, some TPMs

470
00:21:50,860 --> 00:21:53,150
have these kinds of
defenses to basically stop

471
00:21:53,150 --> 00:21:56,000
against this brute force attack.

472
00:21:56,000 --> 00:21:58,250
So why is it important
for you to use

473
00:21:58,250 --> 00:21:59,880
these anti-hammering defenses?

474
00:21:59,880 --> 00:22:01,370
Well one reason
why it's important

475
00:22:01,370 --> 00:22:03,570
is as we discussed
these passwords have

476
00:22:03,570 --> 00:22:05,640
so little entropy.

477
00:22:05,640 --> 00:22:08,110
So because passwords typically
have so little entropy,

478
00:22:08,110 --> 00:22:10,337
it's really important
to prevent the attacker

479
00:22:10,337 --> 00:22:12,670
from just trying to cycle
through that low entropy space

480
00:22:12,670 --> 00:22:13,940
very, very quickly.

481
00:22:13,940 --> 00:22:15,940
So as you may be aware,
a lot of websites

482
00:22:15,940 --> 00:22:21,042
have these format constraints
that they push upon you

483
00:22:21,042 --> 00:22:22,630
for your passwords.

484
00:22:22,630 --> 00:22:24,437
They'll say things
like your password must

485
00:22:24,437 --> 00:22:31,036
have a punctuation, it must
have a mixture of numbers

486
00:22:31,036 --> 00:22:33,410
and letters, you must have
uppercase and lowercase stuff,

487
00:22:33,410 --> 00:22:34,546
and so on and so forth.

488
00:22:34,546 --> 00:22:36,920
And so what those constraints
are trying to get you to do

489
00:22:36,920 --> 00:22:38,760
is they're trying
to get you to expand

490
00:22:38,760 --> 00:22:40,660
the entropy of the password.

491
00:22:40,660 --> 00:22:43,490
But what's problematic though
is that it's not really

492
00:22:43,490 --> 00:22:46,210
these format constraints
that we should be caring about.

493
00:22:46,210 --> 00:22:48,980
It's the actual entropy
of the password itself.

494
00:22:48,980 --> 00:22:51,680
So it turns out even if people
were given these constraints--

495
00:22:51,680 --> 00:22:52,960
like you have to use
punctuation characters,

496
00:22:52,960 --> 00:22:55,275
and stuff like that-- the
entropy of the resulting password

497
00:22:55,275 --> 00:22:56,844
is often quite low.

498
00:22:56,844 --> 00:22:58,885
So for example, people
will often put punctuation

499
00:22:58,885 --> 00:22:59,885
at the beginning or end.

500
00:22:59,885 --> 00:23:02,218
Because they don't want to
be troubled to remember like,

501
00:23:02,218 --> 00:23:04,900
do I have like a dollar sign
in the middle or something?

502
00:23:04,900 --> 00:23:08,720
And so as it turns out, these
format requirements oftentimes

503
00:23:08,720 --> 00:23:11,850
don't make dictionary
attacks much harder

504
00:23:11,850 --> 00:23:14,070
for a sophisticated adversary.

505
00:23:14,070 --> 00:23:18,240
And the reason is because,
basically, the dictionary

506
00:23:18,240 --> 00:23:20,540
attacker can leverage
these observations

507
00:23:20,540 --> 00:23:22,720
about how people
pick passwords even

508
00:23:22,720 --> 00:23:24,360
in the presence of constraints.

509
00:23:24,360 --> 00:23:26,910
So for example, if the attacker
knows that people typically

510
00:23:26,910 --> 00:23:28,630
put punctuation at the
beginning or the end,

511
00:23:28,630 --> 00:23:30,720
just incorporate that into
your dictionary attack.
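
[EDITOR'S NOTE: A minimal sketch of how a dictionary attacker might fold the "punctuation at the beginning or end" habit into candidate generation. The function name `candidates`, the symbol set, and the capitalization variant are assumptions for illustration, not taken from any particular cracking tool.]

```python
from typing import Iterable, Iterator


def candidates(words: Iterable[str], symbols: str = "!$#@") -> Iterator[str]:
    """Yield each dictionary word plus variants that satisfy common format rules."""
    for word in words:
        for variant in (word, word.capitalize()):
            yield variant
            for symbol in symbols:
                yield symbol + variant  # punctuation at the beginning
                yield variant + symbol  # punctuation at the end
```

[Each base word expands into only a couple dozen guesses, which is why format rules followed in this predictable way add so little real entropy.]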

512
00:23:30,720 --> 00:23:32,595
And so there's actually a really
interesting website

513
00:23:32,595 --> 00:23:35,995
you can go to that's
called Telepathwords.

514
00:23:40,130 --> 00:23:41,770
And so what's neat
about this site

515
00:23:41,770 --> 00:23:44,390
is that it has a
little text box.

516
00:23:44,390 --> 00:23:46,745
So you can type a character
into that text box--

517
00:23:46,745 --> 00:23:48,870
you're pretending that
you're entering a password--

518
00:23:48,870 --> 00:23:51,070
and Telepathwords
will try to guess

519
00:23:51,070 --> 00:23:52,960
what your next character is.

520
00:23:52,960 --> 00:23:54,595
So as you type
additional characters,

521
00:23:54,595 --> 00:23:56,800
it'll have a little drop
down box which says,

522
00:23:56,800 --> 00:23:59,091
were you going to put this,
were you going to put this?

523
00:23:59,091 --> 00:24:02,380
It will give you a
little blurb that says,

524
00:24:02,380 --> 00:24:04,035
here's what I think
that you were going

525
00:24:04,035 --> 00:24:05,650
to enter this next password.

526
00:24:05,650 --> 00:24:07,290
So how does Telepathwords work?

527
00:24:07,290 --> 00:24:09,350
So it basically has
a bunch of databases.

528
00:24:09,350 --> 00:24:11,705
It has a database
of common passwords.

529
00:24:15,030 --> 00:24:21,930
It also has a list
of popular phrases

530
00:24:21,930 --> 00:24:25,504
that it's taken from websites.

531
00:24:25,504 --> 00:24:28,040
And it also has this
set of heuristics

532
00:24:28,040 --> 00:24:36,570
which describe common user
biases in picking passwords.

533
00:24:36,570 --> 00:24:38,210
So for example,
one funny bias is

534
00:24:38,210 --> 00:24:39,796
that people will
often-- when they

535
00:24:39,796 --> 00:24:41,170
are faced with
these constraints

536
00:24:41,170 --> 00:24:43,503
to say you must use punctuation,
stuff like that-- a lot

537
00:24:43,503 --> 00:24:47,460
of times when they're picking
characters for the password,

538
00:24:47,460 --> 00:24:50,994
they will use keys that
are adjacent to each other.

539
00:24:50,994 --> 00:24:52,660
So in other words,
there'll be a very small

540
00:24:52,660 --> 00:24:54,690
edit distance in physical
space with respect

541
00:24:54,690 --> 00:24:56,920
to edit distance in
the actual password.

542
00:24:56,920 --> 00:24:59,510
So what Telepathwords does
is it has the database here,

543
00:24:59,510 --> 00:25:01,720
so when you type in things
it's running these models.

544
00:25:01,720 --> 00:25:02,670
And it's saying,
statistically speaking,

545
00:25:02,670 --> 00:25:05,424
here's the most likely thing
that you're going to type next.

546
00:25:05,424 --> 00:25:07,652
So it's almost like auto
complete for passwords.

547
00:25:07,652 --> 00:25:09,235
And so what's funny
is that this shows

548
00:25:09,235 --> 00:25:11,151
once again that if you
have these constraints,

549
00:25:11,151 --> 00:25:14,150
they actually don't protect
you that much if there are some

550
00:25:14,150 --> 00:25:17,500
of these underlying a priori
distributions of things

551
00:25:17,500 --> 00:25:19,870
that the attacker
can leverage.

552
00:25:19,870 --> 00:25:21,766
I think there was a question?

553
00:25:21,766 --> 00:25:25,970
AUDIENCE: Yeah so it seems
like if an attacker is

554
00:25:25,970 --> 00:25:28,162
too sophisticated
that they could

555
00:25:28,162 --> 00:25:31,571
try guessing like a bunch
of IP addresses and things

556
00:25:31,571 --> 00:25:34,980
which only would prevent
hammering [INAUDIBLE].

557
00:25:42,684 --> 00:25:44,100
PROFESSOR: Yeah,
it's very tricky.

558
00:25:44,100 --> 00:25:45,100
Now that's a good point.

559
00:25:45,100 --> 00:25:47,659
So anti-hammering
basically sounds well

560
00:25:47,659 --> 00:25:50,500
what's the scope of the attack
that you're trying to prevent?

561
00:25:50,500 --> 00:25:54,055
So if you're concerned
about distributed attackers

562
00:25:54,055 --> 00:25:57,250
in a networked system, it does
become very, very subtle.

563
00:25:57,250 --> 00:26:00,202
And suffice it to say that
the notion of anti-hammering

564
00:26:00,202 --> 00:26:02,410
or [INAUDIBLE] systems, and
also the notion of things

565
00:26:02,410 --> 00:26:05,080
like click fraud, for example.

566
00:26:05,080 --> 00:26:06,700
So in other words,
how does someone

567
00:26:06,700 --> 00:26:08,590
who's running an
advertising campaign online

568
00:26:08,590 --> 00:26:10,665
determine if someone's
actually putting the link

569
00:26:10,665 --> 00:26:13,070
and actually paying someone
for those clicks, versus

570
00:26:13,070 --> 00:26:15,560
this is just a spammer who's
got some box just sitting

571
00:26:15,560 --> 00:26:17,200
there clicking on stuff.

572
00:26:17,200 --> 00:26:19,241
So suffice it to say
there's a lot of distributed

573
00:26:19,241 --> 00:26:21,690
heuristics that try to
solve those problems.

574
00:26:21,690 --> 00:26:23,980
And in many cases, it's
not a science, it's an art.

575
00:26:23,980 --> 00:26:26,480
But your [INAUDIBLE] correct
and in the distributed setting,

576
00:26:26,480 --> 00:26:30,980
things get much more
difficult. All right,

577
00:26:30,980 --> 00:26:32,930
so does this all make sense?

578
00:26:32,930 --> 00:26:35,330
AUDIENCE: What about the
cryptographic anti-hammering

579
00:26:35,330 --> 00:26:36,770
defenses?

580
00:26:36,770 --> 00:26:40,800
Most of the time you end up
sending a hash on the line

581
00:26:40,800 --> 00:26:44,855
[INAUDIBLE] that when
you get out of it

582
00:26:44,855 --> 00:26:46,595
is exactly what
you would get out

583
00:26:46,595 --> 00:26:48,178
the password of the
hashable password?

584
00:26:50,571 --> 00:26:52,490
I know there are
protocols like SRP

585
00:26:52,490 --> 00:26:56,160
or there are some zero
knowledge protocols.

586
00:26:56,160 --> 00:26:57,062
PROFESSOR: Yeah, so--

587
00:26:57,062 --> 00:26:58,520
AUDIENCE: That you
use in practice?

588
00:26:58,520 --> 00:26:59,311
PROFESSOR: They do.

589
00:27:01,820 --> 00:27:03,980
Those protocols
provide some stronger

590
00:27:03,980 --> 00:27:05,160
cryptographic guarantees.

591
00:27:05,160 --> 00:27:06,500
A lot of times they
are not backwards

592
00:27:06,500 --> 00:27:08,900
compatible with current systems,
which is why in practice you

593
00:27:08,900 --> 00:27:09,470
don't see them used a lot.

594
00:27:09,470 --> 00:27:10,928
But yeah, there
are some protocols,

595
00:27:10,928 --> 00:27:14,900
for example, that
allow the server to not

596
00:27:14,900 --> 00:27:17,840
have any notion of
the password at all.

597
00:27:17,840 --> 00:27:20,220
So there's some zero knowledge
type thing or whatever.

598
00:27:20,220 --> 00:27:21,719
So those things do
work in practice.

599
00:27:21,719 --> 00:27:24,505
But one of the things that this
paper says is very interesting

600
00:27:24,505 --> 00:27:26,880
is that they basically go
through all these authentication

601
00:27:26,880 --> 00:27:29,190
schemes and they say,
OK, here's passwords.

602
00:27:29,190 --> 00:27:30,190
Yeah, they kind of suck.

603
00:27:30,190 --> 00:27:31,360
Here's some other
things that are actually

604
00:27:31,360 --> 00:27:32,770
much stronger on the
security axis,

605
00:27:32,770 --> 00:27:35,500
but then they all fail on
deployability or usability

606
00:27:35,500 --> 00:27:36,560
and things like that.

607
00:27:36,560 --> 00:27:39,970
And so that's one of the
interesting and slightly sad

608
00:27:39,970 --> 00:27:41,890
outcomes of this
paper that maybe

609
00:27:41,890 --> 00:27:44,185
even though we have all
these much stronger security

610
00:27:44,185 --> 00:27:46,680
for the protocols,
we can't deploy them

611
00:27:46,680 --> 00:27:50,164
for some usability reasons
or some [INAUDIBLE] reason.

612
00:27:54,440 --> 00:27:56,277
So that's just a fun
site to go to right.

613
00:27:56,277 --> 00:27:58,360
So they claim that they
don't store your passwords

614
00:27:58,360 --> 00:28:00,660
so you take them at their
word if you want to.

615
00:28:00,660 --> 00:28:03,520
But it is very interesting to
just sit down and think like,

616
00:28:03,520 --> 00:28:04,870
what password would I generate?

617
00:28:04,870 --> 00:28:07,340
And then type into this,
and see how accurate

618
00:28:07,340 --> 00:28:09,685
it is in guessing what
the next thing will be.

619
00:28:09,685 --> 00:28:12,090
It even covers things
like the popular heuristic

620
00:28:12,090 --> 00:28:15,760
like take a popular phrase
that has multiple words,

621
00:28:15,760 --> 00:28:18,180
and then only take the
first letter of each word.

622
00:28:18,180 --> 00:28:19,650
So this thing is
very, very good.

623
00:28:19,650 --> 00:28:21,100
Very, very scary too.

624
00:28:21,100 --> 00:28:23,402
OK so that's Telepathwords.

625
00:28:23,402 --> 00:28:25,110
And so one thing that
is also interesting

626
00:28:25,110 --> 00:28:30,070
when you think about is
in your password scheme,

627
00:28:30,070 --> 00:28:33,760
is it vulnerable to
offline guessing.

628
00:28:37,290 --> 00:28:43,740
So this was a problem
that Kerberos V4 had.

629
00:28:43,740 --> 00:28:51,550
And then also V5 without
this thing they call preauth.

630
00:28:51,550 --> 00:28:55,090
So the basic idea is that in
these versions of Kerberos,

631
00:28:55,090 --> 00:28:58,530
anyone could ask the KDC for
a ticket that would be encrypted

632
00:28:58,530 --> 00:29:00,610
with the user's password.

633
00:29:00,610 --> 00:29:04,149
So basically, the KDC did
not authenticate requests

634
00:29:04,149 --> 00:29:05,440
that were coming from a client.

635
00:29:05,440 --> 00:29:07,500
Now the thing that
the KDC would return

636
00:29:07,500 --> 00:29:12,180
was, in fact-- there
are some set of bits

637
00:29:12,180 --> 00:29:13,980
here that the KDC would return.

638
00:29:13,980 --> 00:29:16,275
I'm sure you don't want to
think about this ugly set

639
00:29:16,275 --> 00:29:17,340
of cryptographic
printers anymore.

640
00:29:17,340 --> 00:29:18,839
But suffice it to
say, the KDC would

641
00:29:18,839 --> 00:29:21,430
return this stuff
that was encrypted

642
00:29:21,430 --> 00:29:24,490
with the key of the client.

643
00:29:24,490 --> 00:29:26,510
That's what will come
back to the client side.

644
00:29:26,510 --> 00:29:30,420
So the problem with this is
that because the server did not

645
00:29:30,420 --> 00:29:34,730
check who it was sending this
encrypted set of things to,

646
00:29:34,730 --> 00:29:38,520
the attacker can basically
get this thing here and then

647
00:29:38,520 --> 00:29:40,900
try to just guess what KC is.

648
00:29:40,900 --> 00:29:43,856
Just guess that KC is some
value, try to decrypt this,

649
00:29:43,856 --> 00:29:44,980
see if it looks reasonable.

650
00:29:44,980 --> 00:29:47,720
If not, try to guess
another KC, decrypt this,

651
00:29:47,720 --> 00:29:48,970
see if it looks reasonable.

652
00:29:48,970 --> 00:29:52,270
And the reason why the attacker
can launch this type of attack,

653
00:29:52,270 --> 00:29:54,950
is that this thing
here, this TGT actually

654
00:29:54,950 --> 00:29:57,370
has a known format.

655
00:29:57,370 --> 00:29:59,420
So it has things in
here like timestamps,

656
00:29:59,420 --> 00:30:02,010
and it has things in here like
various length fields that would have

657
00:30:02,010 --> 00:30:03,870
to be internally consistent.

658
00:30:03,870 --> 00:30:06,970
And so that basically
helps the attacker.

659
00:30:06,970 --> 00:30:10,380
Because if the attacker guesses
the KC, gets this thing here,

660
00:30:10,380 --> 00:30:12,550
a decrypted thing, and
the internal fields

661
00:30:12,550 --> 00:30:14,600
don't check out,
the attacker knows

662
00:30:14,600 --> 00:30:16,453
that it picked the
wrong KC, so they

663
00:30:16,453 --> 00:30:18,480
can go on and pick another KC.
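
[EDITOR'S NOTE: A minimal sketch of the offline guessing loop described above. It is not real Kerberos: `toy_cipher` is a stand-in XOR stream cipher built from SHA-256, the `TGT:` marker plus a numeric timestamp stands in for the ticket's known format, and deriving K_C by hashing the password is an assumption. All names are hypothetical. The point is only that a known plaintext format lets the attacker test each guessed key without contacting the server again.]

```python
import hashlib

MAGIC = b"TGT:"  # stand-in for the ticket's known, checkable structure


def derive_key(password: str) -> bytes:
    """Toy derivation of K_C from the user's password."""
    return hashlib.sha256(password.encode()).digest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream; the same call encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def kdc_reply(password: str, timestamp: int) -> bytes:
    """What an unauthenticated request gets back: structured data under K_C."""
    return toy_cipher(derive_key(password), MAGIC + str(timestamp).encode())


def offline_guess(reply: bytes, dictionary):
    """Try each candidate password; keep the one whose decryption looks valid."""
    for candidate in dictionary:
        plaintext = toy_cipher(derive_key(candidate), reply)
        if plaintext.startswith(MAGIC) and plaintext[len(MAGIC):].isdigit():
            return candidate
    return None
```

[For example, `offline_guess(kdc_reply("hunter2", 1400000000), ["password", "letmein", "hunter2"])` recovers the password, and no anti-hammering defense on the server ever sees the guesses.]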

664
00:30:18,480 --> 00:30:24,570
And so, in Kerberos V5,
basically the client

665
00:30:24,570 --> 00:30:30,330
has to send in this thing
that it sends over to the KDC,

666
00:30:30,330 --> 00:30:36,790
it basically sends a time stamp.

667
00:30:36,790 --> 00:30:40,900
And then this time stamp is
going to be encrypted with KC.

668
00:30:40,900 --> 00:30:43,230
So this is sent to the
server, and the server

669
00:30:43,230 --> 00:30:46,240
looks at this and validates that
before it will send something

670
00:30:46,240 --> 00:30:47,280
back to the client.

671
00:30:47,280 --> 00:30:49,930
So that gets rid of this
problem that any random client

672
00:30:49,930 --> 00:30:53,354
can show up and just
ask for this thing here.

673
00:30:56,840 --> 00:31:00,824
AUDIENCE: So is time stamp
recorded in the message?

674
00:31:00,824 --> 00:31:04,657
So can't the attacker just get
this message and brute-force it?

675
00:31:04,657 --> 00:31:05,740
PROFESSOR: Let's see here.

676
00:31:05,740 --> 00:31:09,670
So can't the attacker
get this message here?

677
00:31:09,670 --> 00:31:11,902
AUDIENCE: Yeah, the
encryption [INAUDIBLE].

678
00:31:11,902 --> 00:31:14,360
PROFESSOR: So you're thinking
where the attacker might just

679
00:31:14,360 --> 00:31:15,500
spoof this, for example?

680
00:31:15,500 --> 00:31:19,227
AUDIENCE: No, I just brute
force it and get KC out.

681
00:31:19,227 --> 00:31:19,810
PROFESSOR: OK.

682
00:31:19,810 --> 00:31:21,185
So in other words,
you're worried

683
00:31:21,185 --> 00:31:22,954
someone could observe this.

684
00:31:22,954 --> 00:31:23,620
AUDIENCE: Right.

685
00:31:23,620 --> 00:31:25,090
PROFESSOR: So I
believe that this

686
00:31:25,090 --> 00:31:29,166
is put inside an encrypted thing
that belongs to the server,

687
00:31:29,166 --> 00:31:30,540
or the key belongs
to the server.

688
00:31:30,540 --> 00:31:32,331
I think to prevent that
attack. [INAUDIBLE]

689
00:31:32,331 --> 00:31:34,390
so don't quote me on that.

690
00:31:34,390 --> 00:31:36,250
But you're correct
it's not, for example.

691
00:31:36,250 --> 00:31:37,625
And if the attacker,
for example,

692
00:31:37,625 --> 00:31:39,890
knew something about
what the current time is,

693
00:31:39,890 --> 00:31:42,400
roughly, that actually
is super useful.

694
00:31:42,400 --> 00:31:44,190
Because then the
attacker can guess,

695
00:31:44,190 --> 00:31:46,815
oh, time stamp should be
roughly between here and here.

696
00:31:46,815 --> 00:31:48,190
And if it sees
it's in the clear,

697
00:31:48,190 --> 00:31:50,357
it can do the exact same
attack that we had up here.

698
00:31:50,357 --> 00:31:52,648
AUDIENCE: It's a little better
because the attacker has

699
00:31:52,648 --> 00:31:54,712
to be in the middle, but
it's still susceptible.

700
00:31:54,712 --> 00:31:55,670
PROFESSOR: That's true.

701
00:31:55,670 --> 00:31:57,150
Well, yeah, that's
right, the attacker

702
00:31:57,150 --> 00:31:58,770
has to be on the
network somewhere so

703
00:31:58,770 --> 00:32:00,370
this [INAUDIBLE] stuff.

704
00:32:00,370 --> 00:32:00,946
That's right.

705
00:32:04,070 --> 00:32:06,350
So that's all, I'm guessing.

706
00:32:06,350 --> 00:32:09,130
So another thing that's
important to think about

707
00:32:09,130 --> 00:32:14,580
is password recovery.

708
00:32:18,510 --> 00:32:20,950
So this is the idea that
you lose your password,

709
00:32:20,950 --> 00:32:23,380
and then somehow you
have to go to the service

710
00:32:23,380 --> 00:32:26,636
and you have to ask
for another password.

711
00:32:26,636 --> 00:32:28,010
But before you
get that password,

712
00:32:28,010 --> 00:32:30,220
you have to prove that
you are you in some way.

713
00:32:30,220 --> 00:32:31,290
So how does that work?

714
00:32:31,290 --> 00:32:32,650
How to do password recovery?

715
00:32:32,650 --> 00:32:35,940
So what's interesting is
that people oftentimes

716
00:32:35,940 --> 00:32:39,190
focus on the entropy
of the password itself.

717
00:32:39,190 --> 00:32:43,430
But the problem is that
if the password recovery

718
00:32:43,430 --> 00:32:45,570
questions or the
password recovery scheme

719
00:32:45,570 --> 00:32:47,420
has little entropy,
that actually

720
00:32:47,420 --> 00:32:50,113
affects the entropy of the
overall authentication scheme.

721
00:32:50,113 --> 00:32:55,240
So in other words, the
strength of the overall scheme

722
00:32:55,240 --> 00:32:58,520
is basically equal
to the minimum

723
00:32:58,520 --> 00:33:07,440
of the password entropy and
the recovery question entropy.
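
[EDITOR'S NOTE: A small worked example of the minimum rule just stated. It assumes entropy is measured in bits as log2 of the number of equally likely choices. The candidate counts and function names are made up for illustration.]

```python
import math


def entropy_bits(equally_likely_choices: int) -> float:
    """Entropy in bits of a uniform choice among the given number of options."""
    return math.log2(equally_likely_choices)


def scheme_strength_bits(password_bits: float, recovery_bits: float) -> float:
    """The attacker goes after whichever path is weaker."""
    return min(password_bits, recovery_bits)


# A password drawn from about a million candidates, guarded by a
# "favorite color" question with about eight plausible answers.
password_bits = entropy_bits(2 ** 20)
recovery_bits = entropy_bits(8)
overall = scheme_strength_bits(password_bits, recovery_bits)
```

[Here the overall strength is the recovery question's 3 bits rather than the password's 20, so strengthening the password alone would change nothing.]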

724
00:33:11,589 --> 00:33:13,960
And so you see this
actually play out

725
00:33:13,960 --> 00:33:16,005
in a lot of real-world scenarios.

726
00:33:16,005 --> 00:33:18,380
There's a lot of famous cases,
like the Sarah Palin case,

727
00:33:18,380 --> 00:33:21,700
where basically someone
was able to recover

728
00:33:21,700 --> 00:33:25,300
her password fraudulently
because her recovery

729
00:33:25,300 --> 00:33:28,029
questions were things that
any random person could find.

730
00:33:28,029 --> 00:33:30,070
By looking at her Wikipedia
article, for example,

731
00:33:30,070 --> 00:33:32,880
find out where she went to high
school and things like that.

732
00:33:32,880 --> 00:33:35,840
And so often times these
password recovery questions

733
00:33:35,840 --> 00:33:36,950
are not very good.

734
00:33:36,950 --> 00:33:39,980
And they're not very good
because of a couple reasons.

735
00:33:39,980 --> 00:33:44,560
So sometimes these things
just have very low entropy.

736
00:33:44,560 --> 00:33:46,990
So if you have a password
recovery question that

737
00:33:46,990 --> 00:33:49,610
is something like, what's
your favorite color,

738
00:33:49,610 --> 00:33:52,190
the most popular answers are
going to be like blue and red.

739
00:33:52,190 --> 00:33:55,300
Nobody's going to say like
off white, fuchsia, magenta.

740
00:33:55,300 --> 00:33:57,150
So some of these
recovery questions

741
00:33:57,150 --> 00:34:01,035
intrinsically are very difficult
to provide a lot of entropy

742
00:34:01,035 --> 00:34:01,770
for.

743
00:34:01,770 --> 00:34:05,140
The other problem is
that sometimes these

744
00:34:05,140 --> 00:34:11,560
recovery questions can be
leaked via social media.

745
00:34:11,560 --> 00:34:14,270
So for example, if one
of the recovery questions

746
00:34:14,270 --> 00:34:16,020
is what's your favorite movie?

747
00:34:16,020 --> 00:34:18,170
So maybe this space there
is a little bit bigger,

748
00:34:18,170 --> 00:34:20,540
but if intrinsically I
can go look at, let's say,

749
00:34:20,540 --> 00:34:22,530
your IMDB profile,
your Facebook profile,

750
00:34:22,530 --> 00:34:24,482
and figure out like,
oh hey, you literally

751
00:34:24,482 --> 00:34:25,940
told me that's your
favorite movie,

752
00:34:25,940 --> 00:34:27,820
this isn't super useful either.

753
00:34:27,820 --> 00:34:29,500
And another problem--
this is actually

754
00:34:29,500 --> 00:34:32,270
sort of the funniest
one-- is that the user

755
00:34:32,270 --> 00:34:38,270
selected recovery questions
are often super weak.

756
00:34:38,270 --> 00:34:42,396
So for example, people
have done a survey

757
00:34:42,396 --> 00:34:44,520
of what some of these
recovery questions look like,

758
00:34:44,520 --> 00:34:46,370
and sometimes users
themselves will

759
00:34:46,370 --> 00:34:51,820
set recovery questions that are
things like what is 2 plus 3?

760
00:34:51,820 --> 00:34:55,000
And so, at the time, the user's
thinking this is a big hassle,

761
00:34:55,000 --> 00:34:56,409
we're going to have to use this.

762
00:34:56,409 --> 00:34:59,680
But trivially most humans
who pass the Turing Test

763
00:34:59,680 --> 00:35:01,848
can answer that
question successfully.

764
00:35:01,848 --> 00:35:04,842
And then therefore get
the user's password back.

765
00:35:04,842 --> 00:35:12,340
AUDIENCE: So [INAUDIBLE] like
using recovery passwords?

766
00:35:12,340 --> 00:35:16,462
It's basically like you enter in
your name and maybe the subject

767
00:35:16,462 --> 00:35:18,891
of some emails that you've
sent, like a small amount

768
00:35:18,891 --> 00:35:19,974
of additional information.

769
00:35:19,974 --> 00:35:21,979
But based on that,
in some cases they

770
00:35:21,979 --> 00:35:26,200
can-- is security of
that kind of stuff then?

771
00:35:26,200 --> 00:35:28,771
PROFESSOR: So I don't know of
any formal study like that.

772
00:35:28,771 --> 00:35:30,396
Those things are
actually a lot better.

773
00:35:30,396 --> 00:35:32,770
I actually know
this, because I was

774
00:35:32,770 --> 00:35:35,000
trying to help a friend
go through this process.

775
00:35:35,000 --> 00:35:38,630
So she basically lost
control of her Gmail account,

776
00:35:38,630 --> 00:35:40,880
and she was trying to prove
that this was her account.

777
00:35:40,880 --> 00:35:43,840
And so yeah, they would ask you
things like roughly speaking,

778
00:35:43,840 --> 00:35:46,100
when did you open this account.

779
00:35:46,100 --> 00:35:48,573
Roughly speaking before you
lost control of this account

780
00:35:48,573 --> 00:35:52,770
to hesball or whatever,
who were some of the people

781
00:35:52,770 --> 00:35:54,205
that you talked to?

782
00:35:54,205 --> 00:35:55,080
And things like that.

783
00:35:55,080 --> 00:35:57,187
And it's actually a
pretty laborious process.

784
00:35:57,187 --> 00:35:59,520
What ends up happening is
that you're generally correct,

785
00:35:59,520 --> 00:36:01,950
it ends up being much more
powerful than this stuff.

786
00:36:01,950 --> 00:36:04,920
And so actually I don't know
of any formal studies of that,

787
00:36:04,920 --> 00:36:06,656
but it does seem
[INAUDIBLE] much stronger

788
00:36:06,656 --> 00:36:07,886
than these types of things.

789
00:36:11,259 --> 00:36:12,550
All right, any other questions?

790
00:36:16,350 --> 00:36:20,810
Now we can get to
the paper for today.

791
00:36:20,810 --> 00:36:24,010
So reading for today,
the authors have basically

792
00:36:24,010 --> 00:36:28,610
proposed a bunch of factors
that can be used to evaluate

793
00:36:28,610 --> 00:36:30,465
these authentication schemes.

794
00:36:30,465 --> 00:36:32,506
And what's really cool
about this paper, I think,

795
00:36:32,506 --> 00:36:35,010
is that it basically tries
to say, look, a lot of us

796
00:36:35,010 --> 00:36:37,460
in the security community
are fighting just

797
00:36:37,460 --> 00:36:38,710
based on aesthetic principles.

798
00:36:38,710 --> 00:36:41,020
Like, we should pick
this because I just

799
00:36:41,020 --> 00:36:43,260
like the way that the curly
braces look in the proof.

800
00:36:43,260 --> 00:36:46,161
We should pick this because
it uses a lot of math mode.

801
00:36:46,161 --> 00:36:48,660
And so what they say is, look,
why don't we try to establish

802
00:36:48,660 --> 00:36:50,050
some type of criteria?

803
00:36:50,050 --> 00:36:52,510
Maybe some of the criteria
are a little bit subjective.

804
00:36:52,510 --> 00:36:54,630
Let's just try to have
this taxonomy of ways

805
00:36:54,630 --> 00:36:56,620
to evaluate the
authentication scheme.

806
00:36:56,620 --> 00:36:59,900
And let's just see how these
various schemes stack up.

807
00:36:59,900 --> 00:37:03,060
And so the authors basically
proposed three high level

808
00:37:03,060 --> 00:37:05,660
metrics for evaluating
these schemes.

809
00:37:05,660 --> 00:37:11,910
And so, the first
metric is usability.

810
00:37:11,910 --> 00:37:13,950
And so, the basic
idea here is how

811
00:37:13,950 --> 00:37:16,520
easy is it for users to interact
with this authentication

812
00:37:16,520 --> 00:37:17,620
scheme.

813
00:37:17,620 --> 00:37:20,000
So they define a couple
interesting properties.

814
00:37:20,000 --> 00:37:23,820
So for example, is
it easy to learn?

815
00:37:26,580 --> 00:37:29,679
This basically just means is
this scheme easy to learn?

816
00:37:29,679 --> 00:37:31,970
So some of these categories
are pretty straightforward.

817
00:37:31,970 --> 00:37:33,830
Some of them actually involve
a little bit of subtlety.

818
00:37:33,830 --> 00:37:35,512
But this one makes
a lot of sense.

819
00:37:35,512 --> 00:37:43,710
And so if we look at passwords,
passwords pass this test.

820
00:37:43,710 --> 00:37:48,460
Because everybody is used to
using passwords, so we'll say

821
00:37:48,460 --> 00:37:49,550
they are easy to learn.

822
00:37:49,550 --> 00:37:54,480
Another category is
infrequent errors.

823
00:37:54,480 --> 00:37:56,480
So that means when
you are trying

824
00:37:56,480 --> 00:37:58,583
to authenticate to
the system, if you

825
00:37:58,583 --> 00:38:01,189
are the actual user
in question, is it

826
00:38:01,189 --> 00:38:03,230
the case that you can
often authenticate yourself

827
00:38:03,230 --> 00:38:04,990
without generating errors?

828
00:38:04,990 --> 00:38:09,050
And so, here the
authors say quasi-yes.

829
00:38:12,970 --> 00:38:15,316
And so the quasi prefix is
one of the more entertaining

830
00:38:15,316 --> 00:38:17,190
aspects of the paper,
because the authors kind of

831
00:38:17,190 --> 00:38:20,010
admit there's this element
of subjectivity to it.

832
00:38:20,010 --> 00:38:24,350
So we can't necessarily say with
crisp precision yes, no, things

833
00:38:24,350 --> 00:38:25,020
like this.

834
00:38:25,020 --> 00:38:26,760
So the reason why
they say quasi-yes

835
00:38:26,760 --> 00:38:30,120
is because, in general, you
can authenticate with a password

836
00:38:30,120 --> 00:38:30,700
successfully.

837
00:38:30,700 --> 00:38:33,109
But we've all been in that
place where it's like 3 AM,

838
00:38:33,109 --> 00:38:34,900
we're trying to log on
to our email server,

839
00:38:34,900 --> 00:38:36,060
our mind's not in
the right place,

840
00:38:36,060 --> 00:38:38,060
and we enter a bunch of
errors a bunch of times.

841
00:38:38,060 --> 00:38:41,030
So they say quasi-yes for this.

842
00:38:41,030 --> 00:38:46,510
Another category is: is
it scalable for users?

843
00:38:50,006 --> 00:38:54,867
And so the basic idea
here is if the user has

844
00:38:54,867 --> 00:38:56,950
a bunch of different
services that he or she wants

845
00:38:56,950 --> 00:39:01,160
to authenticate to, does
this scheme scale well?

846
00:39:01,160 --> 00:39:04,110
Does the user have to
remember some new thing

847
00:39:04,110 --> 00:39:06,290
for each one of the schemes?

848
00:39:06,290 --> 00:39:11,200
And so, for here,
the authors say no.

849
00:39:11,200 --> 00:39:14,480
Because in practice, it's
very difficult for users

850
00:39:14,480 --> 00:39:18,130
to remember a separate
password for every single site

851
00:39:18,130 --> 00:39:18,880
that they go to.

852
00:39:18,880 --> 00:39:21,500
This is one reason actually why
people reuse their passwords

853
00:39:21,500 --> 00:39:23,660
often.

854
00:39:23,660 --> 00:39:27,216
So another usability
property is easy recovery.

855
00:39:30,370 --> 00:39:34,230
So what happens if you
lose your authentication

856
00:39:34,230 --> 00:39:37,160
token-- in this case, your
password-- is it easy to reset?

857
00:39:37,160 --> 00:39:42,060
And in this case, the
answer for passwords is yes.

858
00:39:42,060 --> 00:39:44,670
In fact, they are probably
too easy to reset,

859
00:39:44,670 --> 00:39:46,620
as we just discussed
a couple minutes ago.

860
00:39:46,620 --> 00:39:49,690
So that's a yes.

861
00:39:49,690 --> 00:39:52,210
And so another interesting
one is nothing to carry.

862
00:39:54,730 --> 00:39:58,690
So a lot of the more baroque
authentication protocols

863
00:39:58,690 --> 00:40:01,190
require you run
some smartphone app,

864
00:40:01,190 --> 00:40:03,880
or you have some security
token or smart card or things

865
00:40:03,880 --> 00:40:04,790
like that.

866
00:40:04,790 --> 00:40:07,370
So that's a burden.

867
00:40:07,370 --> 00:40:08,870
Maybe not with a
smartphone so much,

868
00:40:08,870 --> 00:40:11,350
but having to carry around
one of these other gadgets is

869
00:40:11,350 --> 00:40:12,310
probably a pain.

870
00:40:12,310 --> 00:40:17,300
And so this is actually one
nice feature of passwords,

871
00:40:17,300 --> 00:40:20,340
you basically only have to
carry them around in your brain,

872
00:40:20,340 --> 00:40:22,570
which is one thing that you
should have at all moments.

873
00:40:22,570 --> 00:40:25,427
So that's basically what
usability looks like.

874
00:40:25,427 --> 00:40:27,010
It is very interesting
at a high level

875
00:40:27,010 --> 00:40:30,600
that a lot of times
these sort of factors

876
00:40:30,600 --> 00:40:33,705
are given a little bit of
short shrift in the security

877
00:40:33,705 --> 00:40:36,080
community when people are
evaluating these schemes.

878
00:40:36,080 --> 00:40:38,770
They say, oh, this thing uses
like a million bits of entropy,

879
00:40:38,770 --> 00:40:41,090
and can only be broken by
the Death Star or whatever.

880
00:40:41,090 --> 00:40:42,464
But then people
don't necessarily

881
00:40:42,464 --> 00:40:46,040
remember these are actually
very important factors too.

882
00:40:46,040 --> 00:40:52,550
OK so the next
high level category

883
00:40:52,550 --> 00:40:56,210
that the authors use to
evaluate authentication schemes

884
00:40:56,210 --> 00:40:58,350
is deployability.

885
00:40:58,350 --> 00:41:00,652
So the basic idea
here is how easy

886
00:41:00,652 --> 00:41:05,940
is it to incorporate this system
into current web services.

887
00:41:05,940 --> 00:41:07,890
So one thing they
look at, for example,

888
00:41:07,890 --> 00:41:12,753
is is it server compatible?

889
00:41:16,050 --> 00:41:18,350
And this basically means
can I easily integrate

890
00:41:18,350 --> 00:41:22,200
this scheme with today's
servers, which are based

891
00:41:22,200 --> 00:41:24,230
around text based passwords?

892
00:41:24,230 --> 00:41:27,440
And so since success here
is defined with respect

893
00:41:27,440 --> 00:41:30,820
to passwords, passwords succeed.

894
00:41:30,820 --> 00:41:35,700
So another metric is
browser compatibility.

895
00:41:35,700 --> 00:41:37,225
Similar type of thing.

896
00:41:37,225 --> 00:41:41,130
Can I use this scheme with
current off-the-shelf browsers

897
00:41:41,130 --> 00:41:44,390
without having to install a
plug-in or something like that?

898
00:41:44,390 --> 00:41:48,408
Once again, passwords
win by default.

899
00:41:48,408 --> 00:41:50,396
And another interesting
one is accessibility.

900
00:41:54,870 --> 00:41:58,802
So can people who can use
passwords now, but maybe

901
00:41:58,802 --> 00:42:01,010
have some type of physical
disability-- maybe they're

902
00:42:01,010 --> 00:42:03,987
blind, or they can't hear well,
or they can't gesture well,

903
00:42:03,987 --> 00:42:04,820
or things like that.

904
00:42:04,820 --> 00:42:07,050
Can they actually
use this scheme?

905
00:42:07,050 --> 00:42:08,580
This is actually
pretty important.

906
00:42:08,580 --> 00:42:12,462
So once again, the
authors say yes.

907
00:42:12,462 --> 00:42:14,420
It's a little bit weird,
because it's not clear

908
00:42:14,420 --> 00:42:16,880
that all people with all
disabilities can use passwords,

909
00:42:16,880 --> 00:42:20,470
but they say yes here.

910
00:42:20,470 --> 00:42:22,690
So yes, so these are
three interesting things

911
00:42:22,690 --> 00:42:24,890
to think about with
respect to deployability.

912
00:42:24,890 --> 00:42:26,960
And the reason why this
deployability category

913
00:42:26,960 --> 00:42:29,940
is so important is because it's
very difficult to get anyone

914
00:42:29,940 --> 00:42:33,220
to upgrade anything ever.

915
00:42:33,220 --> 00:42:35,800
I mean people don't even
want to reboot their machines

916
00:42:35,800 --> 00:42:38,155
and get a new OS
update installed.

917
00:42:38,155 --> 00:42:40,780
So it's very difficult, if this
scheme requires changes

918
00:42:40,780 --> 00:42:42,749
on the server to get
people on the server

919
00:42:42,749 --> 00:42:44,040
to actually do different stuff.

920
00:42:44,040 --> 00:42:45,340
This goes back to your
question, why don't we

921
00:42:45,340 --> 00:42:46,480
use these better things?

922
00:42:46,480 --> 00:42:47,590
Cause deployability
in many cases

923
00:42:47,590 --> 00:42:49,089
is super, super
important to people.

924
00:42:51,920 --> 00:42:56,450
All right, so then the final
category that we will look at

925
00:42:56,450 --> 00:42:57,125
is security.

926
00:43:00,690 --> 00:43:04,750
Right, so what kinds of attacks
can this scheme prevent?

927
00:43:04,750 --> 00:43:09,305
So a lot of these
security properties

928
00:43:09,305 --> 00:43:12,590
are resilient to foo.

929
00:43:12,590 --> 00:43:15,060
I'll just shorten
that to res.

930
00:43:15,060 --> 00:43:21,750
So is the scheme resilient
to physical observations?

931
00:43:25,090 --> 00:43:27,970
So the idea here is
that an attacker cannot

932
00:43:27,970 --> 00:43:30,730
impersonate the
user after observing

933
00:43:30,730 --> 00:43:33,400
them authenticate a few times.

934
00:43:33,400 --> 00:43:35,540
So imagine that you
had a shoulder surfer.

935
00:43:35,540 --> 00:43:37,280
So you're somewhere
in a computer lab,

936
00:43:37,280 --> 00:43:38,821
someone's looking
over your shoulder,

937
00:43:38,821 --> 00:43:39,980
seeing what you type in.

938
00:43:39,980 --> 00:43:42,400
Someone's videotaping
you, maybe someone's

939
00:43:42,400 --> 00:43:44,802
got a microphone listening
to the acoustic signature

940
00:43:44,802 --> 00:43:46,677
of your keyboard and
trying to extract things

941
00:43:46,677 --> 00:43:49,630
from that, so on and so forth.

942
00:43:49,630 --> 00:43:53,820
So the authors say
that passwords actually

943
00:43:53,820 --> 00:43:55,190
fail this test.

944
00:43:55,190 --> 00:44:00,090
And that's because if someone can
videotape you typing in things,

945
00:44:00,090 --> 00:44:02,640
they can pretty easily figure
out what letters you typed.

946
00:44:02,640 --> 00:44:04,973
Or there's actually these
attacks where you can actually

947
00:44:04,973 --> 00:44:07,810
listen to the acoustic
fingerprint of the keyboard,

948
00:44:07,810 --> 00:44:11,840
and detect what was typed based
on what sounds that you hear.

949
00:44:11,840 --> 00:44:15,910
So passwords are not resistant
to physical observation.

950
00:44:15,910 --> 00:44:25,135
So another property is resistant
to targeted impersonation.

951
00:44:28,580 --> 00:44:30,630
And so the basic
idea here is this:

952
00:44:30,630 --> 00:44:33,570
is it possible for someone
who knows you-- a friend,

953
00:44:33,570 --> 00:44:35,280
an acquaintance, a
spouse, a loved one,

954
00:44:35,280 --> 00:44:38,795
a family member,
whatever-- to impersonate

955
00:44:38,795 --> 00:44:44,290
you using their knowledge of
who you are and what you do.

956
00:44:44,290 --> 00:44:46,667
So could your friend try
to pretend to be you easily

957
00:44:46,667 --> 00:44:47,750
in this particular scheme?

958
00:44:47,750 --> 00:44:53,065
So here the authors
basically have another one

959
00:44:53,065 --> 00:44:53,940
of these quasi-yeses.

960
00:44:56,900 --> 00:44:59,610
And they say quasi-yes
because they're not

961
00:44:59,610 --> 00:45:03,095
aware of any studies which
show that if you know a person,

962
00:45:03,095 --> 00:45:05,570
you're more likely to
guess their password.

963
00:45:05,570 --> 00:45:07,190
So they say quasi-yes for that.

964
00:45:07,190 --> 00:45:10,510
And so, note that resistance
to targeted impersonation

965
00:45:10,510 --> 00:45:12,260
is where most
security backup

966
00:45:12,260 --> 00:45:14,135
questions fail miserably.

967
00:45:14,135 --> 00:45:16,010
Because if someone knows
something about you,

968
00:45:16,010 --> 00:45:19,595
quite easily they can guess
your security questions

969
00:45:19,595 --> 00:45:22,860
in many cases.

970
00:45:22,860 --> 00:45:27,450
So then we have two categories
that involve guessing.

971
00:45:27,450 --> 00:45:30,990
So the first one is resilient
to throttled guessing.

972
00:45:34,930 --> 00:45:42,080
And so what this means is,
if the attacker cannot

973
00:45:42,080 --> 00:45:47,690
issue guesses at line
rate because, for example,

974
00:45:47,690 --> 00:45:51,880
the server uses
anti-hammering mechanisms,

975
00:45:51,880 --> 00:45:56,720
is the scheme safe
against the attacker?

976
00:45:56,720 --> 00:46:01,060
And so here, they say no.

977
00:46:01,060 --> 00:46:02,670
And so the reason
why they say no,

978
00:46:02,670 --> 00:46:05,480
is because in practice
passwords not only

979
00:46:05,480 --> 00:46:09,800
have sort of low inherent entropy
because they're not that long,

980
00:46:09,800 --> 00:46:12,570
but also they have that
skewed distribution.

981
00:46:12,570 --> 00:46:15,860
And so what that means is
that even if the attacker is

982
00:46:15,860 --> 00:46:18,260
throttled in some way,
typically the attacker can still

983
00:46:18,260 --> 00:46:20,040
make good forward
progress and crack

984
00:46:20,040 --> 00:46:22,140
a lot of people's passwords.
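
To make the skewed-distribution point concrete, here is a rough sketch. The password counts below are made up for illustration (they are not data from the paper or the lecture), and `fraction_cracked` is a hypothetical helper. It assumes a throttled attacker who only gets a few guesses per account and spends them on the most popular passwords.

```python
from collections import Counter


def fraction_cracked(password_counts, guesses_per_account):
    """Fraction of accounts whose password is among the attacker's
    top `guesses_per_account` most popular guesses."""
    total_accounts = sum(password_counts.values())
    top_guesses = Counter(password_counts).most_common(guesses_per_account)
    return sum(count for _, count in top_guesses) / total_accounts


# Hypothetical skewed population: 1,000 accounts, a few very popular passwords.
skewed = {"123456": 300, "password": 200, "qwerty": 100, "letmein": 50}
skewed.update({f"unique-{i}": 1 for i in range(350)})

# Hypothetical uniform population: 1,000 accounts, every password distinct.
uniform = {f"pw-{i}": 1 for i in range(1000)}

skewed_rate = fraction_cracked(skewed, 3)    # 600 of 1,000 accounts
uniform_rate = fraction_cracked(uniform, 3)  # 3 of 1,000 accounts

print(f"skewed:  {skewed_rate:.1%} of accounts fall to 3 guesses each")
print(f"uniform: {uniform_rate:.1%} of accounts fall to 3 guesses each")
```

With only three guesses per account, the throttle barely matters for the skewed population, which is the reason given above for the "no" rating.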

985
00:46:22,140 --> 00:46:26,010
So they define another
guessing property

986
00:46:26,010 --> 00:46:29,960
which is resistant to
unthrottled guessing.

987
00:46:34,030 --> 00:46:38,890
And so this is basically
saying, suppose

988
00:46:38,890 --> 00:46:44,110
that the attacker can issue
these authentication forgery

989
00:46:44,110 --> 00:46:47,280
requests as quickly
as he or she wants.

990
00:46:47,280 --> 00:46:49,000
So in other words,
the attacker is only

991
00:46:49,000 --> 00:46:51,220
limited by the speed
of their hardware.

992
00:46:51,220 --> 00:46:54,440
So is the authentication
scheme resilient to that type

993
00:46:54,440 --> 00:46:55,290
of attack?

994
00:46:55,290 --> 00:46:59,560
And here maybe this answer's
also no, for the same reason

995
00:46:59,560 --> 00:47:01,470
that the answer was no up here.

996
00:47:01,470 --> 00:47:04,040
So basically passwords have
a very small entropy space

997
00:47:04,040 --> 00:47:07,040
and they come from a
skewed distribution.

998
00:47:07,040 --> 00:47:10,690
So that's all pretty
straightforward.

999
00:47:10,690 --> 00:47:13,603
One interesting
one is resiliency

1000
00:47:13,603 --> 00:47:16,390
to internal observation.

1001
00:47:21,890 --> 00:47:23,720
So this means that
the attacker can not

1002
00:47:23,720 --> 00:47:27,370
impersonate a user by
intercepting that user's input.

1003
00:47:27,370 --> 00:47:31,770
For example, by installing
a keystroke logger

1004
00:47:31,770 --> 00:47:34,675
on the keyboard that
the user's using,

1005
00:47:34,675 --> 00:47:37,640
and using that logger
to steal keypresses.

1006
00:47:37,640 --> 00:47:39,790
This also means, for
example, that there's

1007
00:47:39,790 --> 00:47:41,450
no way for a network
attacker who's

1008
00:47:41,450 --> 00:47:44,270
observing the things that the
client's sending over the wire

1009
00:47:44,270 --> 00:47:48,670
to use that knowledge
of the network traffic

1010
00:47:48,670 --> 00:47:50,710
to later impersonate the user.

1011
00:47:50,710 --> 00:47:56,610
And so here they say passwords
do not have this property.

1012
00:47:56,610 --> 00:47:59,640
And they essentially say
it's because passwords

1013
00:47:59,640 --> 00:48:02,060
are static tokens.

1014
00:48:02,060 --> 00:48:03,160
They don't change.

1015
00:48:03,160 --> 00:48:06,500
And typically static tokens
are vulnerable to replay.

1016
00:48:06,500 --> 00:48:08,920
So if somehow, for
example, an attacker

1017
00:48:08,920 --> 00:48:11,680
installs a keystroke logger
and gets your password,

1018
00:48:11,680 --> 00:48:14,280
then basically the attacker
can use that password

1019
00:48:14,280 --> 00:48:17,020
until it's either expired or
revoked or something like that.

1020
00:48:17,020 --> 00:48:18,470
If you just replay
it again, it'll

1021
00:48:18,470 --> 00:48:20,960
go into that authenticating
server on the other side.

1022
00:48:20,960 --> 00:48:22,751
So here, passwords
actually fail that test.
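
A minimal sketch of why a static token is replayable. All names here (`SECRET`, `static_login`, `respond`, `challenge_login`) are hypothetical. The challenge-response half is an added contrast, not a scheme discussed at this point in the lecture, and it only helps against an observer on the wire: a keystroke logger that captures the secret itself defeats both variants.

```python
import hashlib
import hmac
import secrets

SECRET = b"correct horse"  # hypothetical shared secret, for illustration only


def static_login(message: bytes) -> bool:
    """Static token: the client sends the same bytes on every login."""
    return hmac.compare_digest(message, SECRET)


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Client side of a simple challenge-response: HMAC over a server nonce."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def challenge_login(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: accept only a response computed over this exact challenge."""
    return hmac.compare_digest(response, respond(secret, challenge))


# Static token: whatever the observer recorded once keeps working.
captured_password = SECRET
static_replay_ok = static_login(captured_password)

# Challenge-response: the observer recorded a response to an old nonce.
old_challenge = secrets.token_bytes(16)
captured_response = respond(SECRET, old_challenge)
new_challenge = secrets.token_bytes(16)  # server picks a fresh nonce per login
stale_replay_ok = challenge_login(SECRET, new_challenge, captured_response)

print("replay of static password accepted:", static_replay_ok)
print("replay of stale response accepted: ", stale_replay_ok)
```

The replayed password is accepted because nothing about it changes between logins; the stale response is rejected because it is tied to a nonce the server no longer uses.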

1023
00:48:25,564 --> 00:48:27,522
Another thing that we
talked about a little bit

1024
00:48:27,522 --> 00:48:29,340
in this class is phishing.

1025
00:48:29,340 --> 00:48:36,538
So resilience to phishing
is another security metric.

1026
00:48:36,538 --> 00:48:40,190
And the basic idea here is that,
even if the attacker can simulate

1027
00:48:40,190 --> 00:48:43,320
a valid service-- for
example, by attacking the DNS

1028
00:48:43,320 --> 00:48:45,870
infrastructure or
something like that--

1029
00:48:45,870 --> 00:48:49,200
the attacker cannot collect
credentials from the user

1030
00:48:49,200 --> 00:48:53,300
that the attacker can then use
to pretend to be the user later

1031
00:48:53,300 --> 00:48:53,925
on.

1032
00:48:53,925 --> 00:48:58,300
And so this is basically
supposed to penalize sites that

1033
00:48:58,300 --> 00:49:03,580
do not strongly tell
the user, hey, I'm

1034
00:49:03,580 --> 00:49:06,850
this particular service, so you
can feel confident to give me

1035
00:49:06,850 --> 00:49:07,950
your credentials.

1036
00:49:07,950 --> 00:49:11,160
And so here passwords fail
just because phishing sites

1037
00:49:11,160 --> 00:49:13,217
are very, very popular.

1038
00:49:13,217 --> 00:49:15,175
So passwords don't really
intrinsically provide

1039
00:49:15,175 --> 00:49:16,341
any protection against that.

1040
00:49:20,620 --> 00:49:23,170
Now the next two
are particularly

1041
00:49:23,170 --> 00:49:28,040
interesting in the context of a
large scale distributed system.

1042
00:49:28,040 --> 00:49:30,390
So no trusted third party.

1043
00:49:33,760 --> 00:49:35,270
This essentially
means that other

1044
00:49:35,270 --> 00:49:38,410
than the client and the
server, there's no one else

1045
00:49:38,410 --> 00:49:44,580
in the system that is involved
in the authentication protocol.

1046
00:49:44,580 --> 00:49:47,719
And so, that means that
there's no third party who,

1047
00:49:47,719 --> 00:49:49,260
if that third party
were compromised,

1048
00:49:49,260 --> 00:49:51,310
the entire integrity of
the security scheme

1049
00:49:51,310 --> 00:49:52,040
might fall apart.

1050
00:49:52,040 --> 00:49:54,343
And so, this is actually
an interesting property

1051
00:49:54,343 --> 00:49:56,780
to look at because a lot
of authentication problems

1052
00:49:56,780 --> 00:49:59,900
would go away if we could just
store all our authentication

1053
00:49:59,900 --> 00:50:01,863
information in one place.

1054
00:50:01,863 --> 00:50:04,050
We just store it in one
place, it's very simple,

1055
00:50:04,050 --> 00:50:05,690
we don't have to remember a
lot of stuff on the client,

1056
00:50:05,690 --> 00:50:07,850
we just say, whatever
service you want to use,

1057
00:50:07,850 --> 00:50:10,110
you always go to
this one third party,

1058
00:50:10,110 --> 00:50:11,980
and that third
party will always be

1059
00:50:11,980 --> 00:50:14,980
able to authenticate
you, and then

1060
00:50:14,980 --> 00:50:17,090
allow you to go on your way.

1061
00:50:17,090 --> 00:50:20,640
Now of course third parties are
problematic from the perspective

1062
00:50:20,640 --> 00:50:22,777
of robustness, right,
because if you

1063
00:50:22,777 --> 00:50:24,360
have one of these
global third parties

1064
00:50:24,360 --> 00:50:27,750
that everybody trusts, if that
third party gets subverted then

1065
00:50:27,750 --> 00:50:29,660
perhaps the integrity
of all the sites

1066
00:50:29,660 --> 00:50:32,400
that use that third party to
authenticate-- all those sites

1067
00:50:32,400 --> 00:50:35,000
are potentially in danger.

1068
00:50:35,000 --> 00:50:39,760
So they say that passwords do
not have a trusted third party

1069
00:50:39,760 --> 00:50:43,142
because each user is forced
to have a separate password

1070
00:50:43,142 --> 00:50:44,054
for each site.
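
As a rough recap, the ratings stated so far for plain text passwords can be collected into one structure. The dictionary layout and the `tally` helper are hypothetical; the ratings themselves are the ones given above, and the remaining properties discussed later are not included.

```python
from collections import Counter

# Ratings for passwords as stated in the lecture up to this point.
PASSWORD_SCORECARD = {
    "usability": {
        "easy to learn": "yes",
        "infrequent errors": "quasi-yes",
        "scalable for users": "no",
        "easy recovery": "yes",
        "nothing to carry": "yes",
    },
    "deployability": {
        "server compatible": "yes",
        "browser compatible": "yes",
        "accessible": "yes",
    },
    "security": {
        "resilient to physical observation": "no",
        "resilient to targeted impersonation": "quasi-yes",
        "resilient to throttled guessing": "no",
        "resilient to unthrottled guessing": "no",
        "resilient to internal observation": "no",
        "resilient to phishing": "no",
        "no trusted third party": "yes",
    },
}


def tally(scorecard):
    """Count how many properties received each rating, across all categories."""
    return Counter(
        rating
        for properties in scorecard.values()
        for rating in properties.values()
    )


summary = tally(PASSWORD_SCORECARD)
for rating in ("yes", "quasi-yes", "no"):
    print(f"{rating}: {summary[rating]}")
```

Laid out this way, the pattern is easy to see: passwords do well on usability and deployability and poorly on most of the security properties.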

1071
00:50:46,790 --> 00:50:48,814
A related property is