1
00:00:00,080 --> 00:00:02,430
The following content is
provided under a Creative

2
00:00:02,430 --> 00:00:03,820
Commons license.

3
00:00:03,820 --> 00:00:06,060
Your support will help
MIT OpenCourseWare

4
00:00:06,060 --> 00:00:10,140
continue to offer high quality
educational resources for free.

5
00:00:10,140 --> 00:00:12,700
To make a donation or to
view additional materials

6
00:00:12,700 --> 00:00:16,600
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:16,600 --> 00:00:17,263
at ocw.mit.edu.

8
00:00:26,360 --> 00:00:30,130
PROFESSOR: So today is our
third and probably final lecture

9
00:00:30,130 --> 00:00:31,870
on approximation algorithms.

10
00:00:31,870 --> 00:00:34,760
We're going to take
a different approach

11
00:00:34,760 --> 00:00:39,340
to proving inapproximability
of optimization problems called

12
00:00:39,340 --> 00:00:40,450
gap problems.

13
00:00:40,450 --> 00:00:43,620
And we'll think about
gap-preserving reductions.

14
00:00:43,620 --> 00:00:47,680
We will prove along the way
an optimal lower bound on MAX-3SAT

15
00:00:47,680 --> 00:00:50,080
and other fun things.

16
00:00:50,080 --> 00:00:52,700
So let's start with
what a gap problem is.

17
00:00:58,340 --> 00:01:00,750
I haven't actually seen a
generic definition of this,

18
00:01:00,750 --> 00:01:03,570
so this is new terminology,
but I think helpful.

19
00:01:06,310 --> 00:01:10,420
A gap problem is a way of
converting an optimization

20
00:01:10,420 --> 00:01:12,680
problem into a decision problem.

21
00:01:12,680 --> 00:01:14,510
Now, we already know
one way to do that.

22
00:01:14,510 --> 00:01:16,590
If you have an NPO
optimization problem,

23
00:01:16,590 --> 00:01:18,690
you convert it into the
obvious decision problem,

24
00:01:18,690 --> 00:01:21,190
which is OPT at most k.

25
00:01:21,190 --> 00:01:25,600
For the minimization
problem, that is NP-complete.

26
00:01:25,600 --> 00:01:28,700
But we want to get something
useful from an approximability

27
00:01:28,700 --> 00:01:29,880
standpoint.

28
00:01:29,880 --> 00:01:35,190
And so the idea is, here's
a different problem.

29
00:01:35,190 --> 00:01:44,100
Instead of just deciding
whether OPT is at most k,

30
00:01:44,100 --> 00:01:49,310
we want to distinguish
between OPT being at most k

31
00:01:49,310 --> 00:01:57,360
versus OPT being at least
k over c for some value c.

32
00:01:57,360 --> 00:02:01,070
And the analogy here is with
c approximation algorithm,

33
00:02:01,070 --> 00:02:04,279
c does not have to be
constant, despite the name.

34
00:02:04,279 --> 00:02:05,320
Could be a function of n.

35
00:02:05,320 --> 00:02:06,600
Maybe it's log n.

36
00:02:06,600 --> 00:02:10,460
Maybe it's n to the
epsilon, whatever.

37
00:02:10,460 --> 00:02:14,306
So for a minimization
problem, normally--

38
00:02:14,306 --> 00:02:15,860
did I get this the right way?

39
00:02:15,860 --> 00:02:17,730
Sorry, that should be c times k.

40
00:02:24,050 --> 00:02:27,830
For minimization and
for maximization,

41
00:02:27,830 --> 00:02:29,905
it's going to be the reverse.

42
00:02:36,880 --> 00:02:42,030
And I think I'm going to use
strict inequality here also.

43
00:02:42,030 --> 00:02:43,590
So a minimization.

44
00:02:43,590 --> 00:02:46,750
There's a gap here
between k and c times k.

45
00:02:46,750 --> 00:02:51,640
We're imagining here
c is bigger than 1.

46
00:02:51,640 --> 00:02:53,956
So distinguishing
between being less than k

47
00:02:53,956 --> 00:02:59,320
and being at least c
times k leaves a hole.

48
00:02:59,320 --> 00:03:03,870
And the point is you
are promised that

49
00:03:03,870 --> 00:03:06,250
your input-- you
have a question?

50
00:03:06,250 --> 00:03:09,954
AUDIENCE: The second one,
should it really be c over k?

51
00:03:13,725 --> 00:03:14,600
PROFESSOR: Sorry, no.

52
00:03:17,540 --> 00:03:20,750
Thank you.

53
00:03:20,750 --> 00:03:25,771
C and k sound the same, so it's
always easy to mix them up.

54
00:03:25,771 --> 00:03:26,270
Cool.

55
00:03:26,270 --> 00:03:30,220
So the idea is that
you're-- so in both cases,

56
00:03:30,220 --> 00:03:33,740
there's a ratio gap here of c.

57
00:03:33,740 --> 00:03:35,580
And the idea is
that you're promised

58
00:03:35,580 --> 00:03:38,724
that your input falls into
one of these two categories.

59
00:03:38,724 --> 00:03:40,140
What does it mean
to distinguish--

60
00:03:40,140 --> 00:03:42,550
I mean, I tell you up
front the input either

61
00:03:42,550 --> 00:03:44,990
has this property
or this property,

62
00:03:44,990 --> 00:03:47,940
and I want you to
decide which one it is.

63
00:03:47,940 --> 00:03:51,610
And we'll call
these yes instances

64
00:03:51,610 --> 00:03:53,010
and these no instances.

65
00:03:55,720 --> 00:03:57,230
Normally with a
decision problem,

66
00:03:57,230 --> 00:04:02,080
the no instance is that OPT is
just one bigger or one smaller.

67
00:04:02,080 --> 00:04:05,190
Now we have a big gap between
the no instances and the yes

68
00:04:05,190 --> 00:04:05,690
instances.

69
00:04:05,690 --> 00:04:07,200
We're told that that's true.

70
00:04:07,200 --> 00:04:08,830
This is called a promise problem.

71
00:04:12,560 --> 00:04:14,954
And effectively
what that means is

72
00:04:14,954 --> 00:04:16,829
if you're trying to come
up with an algorithm

73
00:04:16,829 --> 00:04:18,700
to solve this
decision problem, you

74
00:04:18,700 --> 00:04:21,720
don't care what the
algorithm does if OPT

75
00:04:21,720 --> 00:04:24,317
happens to fall in between.

76
00:04:24,317 --> 00:04:25,900
The algorithm can
do whatever it wants.

77
00:04:25,900 --> 00:04:28,530
It can output digits
of pi in the middle,

78
00:04:28,530 --> 00:04:32,930
as long as when OPT is
at most k or greater than

79
00:04:32,930 --> 00:04:35,060
or equal to k, it outputs yes.

80
00:04:35,060 --> 00:04:38,320
And when OPT is at least a
factor of c away from that,

81
00:04:38,320 --> 00:04:40,720
it outputs no.

82
00:04:40,720 --> 00:04:45,550
So that's an easier problem.

83
00:04:45,550 --> 00:04:56,090
And the cool thing is if the
c gap version of a problem

84
00:04:56,090 --> 00:05:06,990
is NP-hard, then so
is c approximating

85
00:05:06,990 --> 00:05:07,980
the original problem.

86
00:05:21,200 --> 00:05:24,340
So this really is in direct
analogy to c approximation.

87
00:05:24,340 --> 00:05:27,540
And so this lets us think
about an NP hardness

88
00:05:27,540 --> 00:05:31,750
for a decision problem and prove
an inapproximability result.

89
00:05:31,750 --> 00:05:33,950
This is nice because in
the last two lectures,

90
00:05:33,950 --> 00:05:39,040
we were having to keep track
of a lot more in just defining

91
00:05:39,040 --> 00:05:43,190
what inapproximability meant
and APX hardness and so on.

92
00:05:43,190 --> 00:05:45,674
Here it's kind of back
to regular NP hardness.

93
00:05:45,674 --> 00:05:47,090
Now, the techniques
are completely

94
00:05:47,090 --> 00:05:50,800
different in this world than
our older NP hardness proofs.

95
00:05:50,800 --> 00:05:53,000
But still, it's kind of
comforting to be back

96
00:05:53,000 --> 00:05:56,170
in decision land.

97
00:05:56,170 --> 00:05:58,420
Cool.

98
00:05:58,420 --> 00:06:03,510
So because of this implication,
in previous lectures

99
00:06:03,510 --> 00:06:07,510
we were just worried about
proving inapproximability.

100
00:06:07,510 --> 00:06:10,110
But today we're
going to be thinking

101
00:06:10,110 --> 00:06:13,100
about proving that gap
problems are NP-hard,

102
00:06:13,100 --> 00:06:15,710
or some other kind of hardness.

103
00:06:15,710 --> 00:06:17,500
This is a stronger
type of result.

104
00:06:17,500 --> 00:06:19,520
So in general,
inapproximability is

105
00:06:19,520 --> 00:06:22,150
kind of what you care about
from the algorithmic standpoint,

106
00:06:22,150 --> 00:06:24,910
but gap results saying
that hey, your problem

107
00:06:24,910 --> 00:06:28,360
is hard even if you have
this huge gap between the yes

108
00:06:28,360 --> 00:06:30,240
instances and the
no instances, that's

109
00:06:30,240 --> 00:06:32,220
also of independent
interest, more

110
00:06:32,220 --> 00:06:34,370
about the structure
of the problem.

111
00:06:34,370 --> 00:06:35,490
But one implies the other.

112
00:06:35,490 --> 00:06:39,750
So this is the stronger
type of thing to go for.

113
00:06:39,750 --> 00:06:42,420
The practical reason to
care about this stuff

114
00:06:42,420 --> 00:06:47,550
is that this gap idea lets you
get stronger inapproximability

115
00:06:47,550 --> 00:06:48,650
results.

116
00:06:48,650 --> 00:06:52,550
The factor c you get by
thinking about gaps in practice

117
00:06:52,550 --> 00:06:56,070
seems to be larger
than the gaps you

118
00:06:56,070 --> 00:07:00,280
get by L reductions and things.

119
00:07:00,280 --> 00:07:05,110
So let me tell you about one
other type of gap problem.

120
00:07:05,110 --> 00:07:08,565
This is a standard one, a
little bit more precise.

121
00:07:14,200 --> 00:07:15,920
Consider MAX-SAT.

122
00:07:15,920 --> 00:07:17,770
Pick your favorite
version of MAX-SAT,

123
00:07:17,770 --> 00:07:20,100
or MAX-CSP was the
general form where you

124
00:07:20,100 --> 00:07:22,140
could have any type of clause.

125
00:07:22,140 --> 00:07:24,765
Instead of just a c gap, we will
define slightly more precisely

126
00:07:24,765 --> 00:07:39,400
an a, b gap, which is to
distinguish between OPT

127
00:07:39,400 --> 00:07:46,990
is less than a times
the number of clauses,

128
00:07:46,990 --> 00:07:51,645
and OPT is at least b times
the number of clauses.

129
00:07:55,740 --> 00:08:00,300
So whereas here everything
was relative to some input k

130
00:08:00,300 --> 00:08:03,580
that you want to
decide about, with SAT

131
00:08:03,580 --> 00:08:05,640
there's a kind of absolute
notion of what you'd

132
00:08:05,640 --> 00:08:07,950
like to achieve, which is
that you satisfy everything,

133
00:08:07,950 --> 00:08:09,270
all clauses are true.

134
00:08:09,270 --> 00:08:11,440
So typically we'll
think about b being one.

135
00:08:14,280 --> 00:08:17,460
And so you're distinguishing
between a satisfiable instance

136
00:08:17,460 --> 00:08:20,130
where all clauses are
satisfied, and something

137
00:08:20,130 --> 00:08:21,500
that's very unsatisfiable.

138
00:08:21,500 --> 00:08:25,341
There's some kind of usually
constant fraction unsatisfiable

139
00:08:25,341 --> 00:08:25,840
clauses.

140
00:08:29,250 --> 00:08:32,590
We need this level
of precision thinking

141
00:08:32,590 --> 00:08:35,409
about when you're right
up against 100% satisfied

142
00:08:35,409 --> 00:08:39,720
versus 1% satisfied or something
like that, or 1% satisfiable.

143
00:08:42,909 --> 00:08:43,409
Cool.

144
00:08:43,409 --> 00:08:45,825
AUDIENCE: Do you use the same
notation for one [INAUDIBLE]

145
00:08:45,825 --> 00:08:49,050
or only for-- so the one problem
like the maximum number of ones

146
00:08:49,050 --> 00:08:49,550
you can get.

147
00:08:49,550 --> 00:08:52,300
PROFESSOR: I haven't
seen it, but yeah, that's

148
00:08:52,300 --> 00:08:53,700
a good question.

149
00:08:53,700 --> 00:08:56,290
Certainly valid to do
it for-- we will see it

150
00:08:56,290 --> 00:08:57,540
for one other type of problem.

151
00:08:57,540 --> 00:09:00,670
For any problem, if you can
define some absolute notion

152
00:09:00,670 --> 00:09:02,270
of how much you'd
like to get, you

153
00:09:02,270 --> 00:09:05,720
can always measure relative to
that and define this kind of a,

154
00:09:05,720 --> 00:09:08,690
b gap problem.

155
00:09:08,690 --> 00:09:10,660
Cool.

156
00:09:10,660 --> 00:09:11,160
All right.

157
00:09:11,160 --> 00:09:14,740
So how do we get these gaps?

158
00:09:14,740 --> 00:09:17,440
There's two, well maybe
three ways, I guess.

159
00:09:24,870 --> 00:09:28,480
In general, we're going to
use reductions, like always.

160
00:09:28,480 --> 00:09:31,880
And you could start from
no gap and make a gap,

161
00:09:31,880 --> 00:09:33,930
or start from a
gap of additive one

162
00:09:33,930 --> 00:09:35,710
and turn it into a big
multiplicative gap.

163
00:09:35,710 --> 00:09:38,050
That will be
gap-producing reductions.

164
00:09:38,050 --> 00:09:40,890
You could start with some
gap and then make it bigger.

165
00:09:40,890 --> 00:09:43,440
That's gap-amplifying reduction.

166
00:09:43,440 --> 00:09:46,090
Or you could just start with
a gap and try to preserve it.

167
00:09:46,090 --> 00:09:48,010
That would be
gap-preserving reductions.

168
00:09:48,010 --> 00:09:49,950
In general, once
you have some gap,

169
00:09:49,950 --> 00:09:52,780
you try to keep it or make
it bigger to get stronger

170
00:09:52,780 --> 00:09:55,990
hardness for your problem.

171
00:09:55,990 --> 00:09:59,010
So the idea with a
gap-producing reduction

172
00:09:59,010 --> 00:10:02,200
is that you have no assumption
about your starting problem.

173
00:10:02,200 --> 00:10:04,325
In general, in a reduction we're
going from some problem

174
00:10:04,325 --> 00:10:06,430
a to some problem b.

175
00:10:06,430 --> 00:10:11,730
And what we would like is that
the output instance to problem

176
00:10:11,730 --> 00:10:21,250
b, the output of the
reduction has OPT equal to k

177
00:10:21,250 --> 00:10:29,500
or, for a minimization problem,
OPT bigger than c times k.

178
00:10:29,500 --> 00:10:33,420
And for a maximization problem,
OPT less than k over c.

179
00:10:36,430 --> 00:10:38,880
So that's just saying we
have a gap in the output.

180
00:10:38,880 --> 00:10:41,340
We assume nothing about
the input instance.

181
00:10:41,340 --> 00:10:43,300
That would be a
gap-producing reduction.

182
00:10:43,300 --> 00:10:46,720
Now we have seen some of
these before, or at least

183
00:10:46,720 --> 00:10:47,800
mentioned them.

184
00:10:47,800 --> 00:10:49,930
One of them, this is
from lecture three,

185
00:10:49,930 --> 00:10:51,460
I think for Tetris.

186
00:10:51,460 --> 00:10:54,380
We proved NP hardness, which was
this three partition reduction.

187
00:10:54,380 --> 00:10:57,080
And the idea is that if you
could satisfy that and open

188
00:10:57,080 --> 00:11:00,630
this thing, then you could get
a zillion points down here.

189
00:11:00,630 --> 00:11:02,360
In most of the
instances down here,

190
00:11:02,360 --> 00:11:04,680
we squeeze this down to
like an n to the epsilon.

191
00:11:04,680 --> 00:11:06,320
That's still hard.

192
00:11:06,320 --> 00:11:09,510
And so n to the 1 minus epsilon
of the instances down here,

193
00:11:09,510 --> 00:11:13,330
and you're given a ton of
pieces to fill in the space,

194
00:11:13,330 --> 00:11:14,510
get lots of points.

195
00:11:14,510 --> 00:11:18,260
If the answer was no here, then
you won't get those points.

196
00:11:18,260 --> 00:11:22,910
And so OPT is very small, at
most, say, n to the epsilon.

197
00:11:22,910 --> 00:11:24,626
If you can solve
this instance, we

198
00:11:24,626 --> 00:11:29,600
have a yes instance in the
input, then we get n points.

199
00:11:29,600 --> 00:11:32,210
So the gap there is n
to the 1 minus epsilon.

200
00:11:36,683 --> 00:11:40,380
So the Tetris reduction, we
assume nothing about the three

201
00:11:40,380 --> 00:11:41,180
partition instance.

202
00:11:41,180 --> 00:11:42,680
It was just yes or no.

203
00:11:42,680 --> 00:11:47,663
And we produced an instance
that had a gap of n

204
00:11:47,663 --> 00:11:48,830
to the 1 minus epsilon.

205
00:11:48,830 --> 00:11:50,840
We could set epsilon
to any constant

206
00:11:50,840 --> 00:11:54,160
we want bigger than zero.

207
00:11:54,160 --> 00:11:57,600
We also mentioned
another such reduction.

208
00:11:57,600 --> 00:11:59,840
And in general, for a
lot of games and puzzles,

209
00:11:59,840 --> 00:12:00,590
you can do this.

210
00:12:00,590 --> 00:12:02,350
It's sort of on all
or nothing deal.

211
00:12:02,350 --> 00:12:06,720
And gap-producing reduction
is a way to formalize that.

212
00:12:06,720 --> 00:12:09,760
Another problem we talked
about last class I believe

213
00:12:09,760 --> 00:12:12,270
was non-metric TSP.

214
00:12:12,270 --> 00:12:14,580
I just give you
a complete graph.

215
00:12:14,580 --> 00:12:17,750
Every edge has some number on it
that's the length of that edge.

216
00:12:17,750 --> 00:12:20,720
You want to find a TSP tour
of minimum total length.

217
00:12:20,720 --> 00:12:25,640
This is really hard to
approximate because depending

218
00:12:25,640 --> 00:12:31,710
on your model, you can use
let's say edge weights.

219
00:12:31,710 --> 00:12:36,330
And to be really annoying
would be 0, comma 1.

220
00:12:36,330 --> 00:12:39,600
And if I'm given a
Hamiltonicity instance, wherever

221
00:12:39,600 --> 00:12:41,680
there's an edge, I
put a weight of zero.

222
00:12:41,680 --> 00:12:44,250
Wherever there's not an
edge, I put a weight of one.

223
00:12:44,250 --> 00:12:46,810
And then if the input
graph was Hamiltonian,

224
00:12:46,810 --> 00:12:47,710
it's a yes instance.

225
00:12:47,710 --> 00:12:51,140
Then the output thing will
have a tour of length zero.

226
00:12:51,140 --> 00:12:55,250
And if the input
was not Hamiltonian,

227
00:12:55,250 --> 00:12:57,840
then the output
will have weight n.

228
00:12:57,840 --> 00:13:00,030
Ratio between n and
zero is infinity.

229
00:13:00,030 --> 00:13:02,580
So this is an
infinite gap creation

230
00:13:02,580 --> 00:13:04,390
if you allow weights of zero.
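The Hamiltonicity-to-TSP construction described here is simple enough to write down. A sketch with assumed names, using weight 0 for edges of the input graph and 1 for non-edges:

```python
def ham_to_tsp_weights(n, edges):
    """Gap-producing reduction sketch: given a Hamiltonicity instance
    on n vertices, build a complete weighted graph where existing
    edges cost 0 and non-edges cost 1.  A Hamiltonian cycle in the
    input yields a TSP tour of total weight 0; otherwise every tour
    must use at least one non-edge, so OPT >= 1."""
    present = {frozenset(e) for e in edges}
    return {
        frozenset((u, v)): 0 if frozenset((u, v)) in present else 1
        for u in range(n)
        for v in range(u + 1, n)
    }
```

Replacing the non-edge weight 1 with a huge number (on the order of 2^n with binary-encoded weights, or n^c in the polynomially bounded case) gives the same construction when weight 0 is disallowed.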

231
00:13:04,390 --> 00:13:06,840
If you say zero
is cheating, which

232
00:13:06,840 --> 00:13:10,390
we did and some papers
do, you could instead

233
00:13:10,390 --> 00:13:14,460
do one and infinity, where
infinity is the largest

234
00:13:14,460 --> 00:13:15,730
representable number.

235
00:13:15,730 --> 00:13:18,180
So that's going to be
something like 2 to the n

236
00:13:18,180 --> 00:13:22,980
if you allow usual binary
encodings of numbers.

237
00:13:22,980 --> 00:13:25,350
If you don't, the
PB case, then that

238
00:13:25,350 --> 00:13:27,410
would be n to some constant.

239
00:13:27,410 --> 00:13:30,030
But you get a big
gap in any case.

240
00:13:30,030 --> 00:13:36,180
So you get some gap equals huge.

241
00:13:36,180 --> 00:13:39,700
So these are kind of trivial
senses of inapproximability,

242
00:13:39,700 --> 00:13:42,040
but hey, that's
one way to do it.

243
00:13:42,040 --> 00:13:43,540
What we're going
to talk about today

244
00:13:43,540 --> 00:13:48,460
are other known ways to get
gap production that are really

245
00:13:48,460 --> 00:13:50,150
cool and more broadly useful.

246
00:13:50,150 --> 00:13:53,190
This is useful when you have a
sort of all or nothing problem.

247
00:13:53,190 --> 00:13:54,850
A lot of the time,
it's not so clear.

248
00:13:54,850 --> 00:13:57,060
There's a constant
factor approximation.

249
00:13:57,060 --> 00:14:00,319
So some giant gap like this
isn't going to be possible,

250
00:14:00,319 --> 00:14:01,485
but still gaps are possible.

251
00:14:04,420 --> 00:14:13,190
Now, an important part of the
story here is the PCP theorem.

252
00:14:13,190 --> 00:14:16,050
So this is not about drugs.

253
00:14:16,050 --> 00:14:24,840
This is about another
complexity class.

254
00:14:24,840 --> 00:14:27,330
And the complexity
class is normally

255
00:14:27,330 --> 00:14:30,080
written PCP of order
log n, comma order one.

256
00:14:30,080 --> 00:14:33,470
I'm going to simplify this
to just PCP as the class.

257
00:14:33,470 --> 00:14:35,790
The other notions
make sense here,

258
00:14:35,790 --> 00:14:38,660
although the parameters don't
turn out to matter too much.

259
00:14:38,660 --> 00:14:41,250
And it's rather lengthy
to write that every time.

260
00:14:41,250 --> 00:14:43,800
So I'm just going to write PCP.

261
00:14:43,800 --> 00:14:47,330
Let me first tell you what
this class is about briefly,

262
00:14:47,330 --> 00:14:49,940
and then we'll see why
it's directly related

263
00:14:49,940 --> 00:14:52,960
to gap problems,
hence where a lot

264
00:14:52,960 --> 00:14:56,000
of these gap-producing
reductions come from.

265
00:14:56,000 --> 00:14:59,640
So PCP stands for
Probabilistically Checkable

266
00:14:59,640 --> 00:15:00,140
Proof.

267
00:15:11,730 --> 00:15:16,890
The checkable
proof refers to NP.

268
00:15:16,890 --> 00:15:19,060
Every yes instance
has a checkable proof

269
00:15:19,060 --> 00:15:20,930
that the answer is yes.

270
00:15:20,930 --> 00:15:22,480
Probabilistically
checkable means

271
00:15:22,480 --> 00:15:25,650
you can check it even faster
with high probability.

272
00:15:25,650 --> 00:15:29,970
So normally to check a proof,
we take polynomial time in NP.

273
00:15:29,970 --> 00:15:33,360
Here we want to
achieve constant time.

274
00:15:33,360 --> 00:15:34,630
That's the main idea.

275
00:15:34,630 --> 00:15:37,220
That can't be done perfectly,
but you can do it correctly

276
00:15:37,220 --> 00:15:38,750
with high probability.

277
00:15:38,750 --> 00:15:41,720
So in general, a
problem in PCP has

278
00:15:41,720 --> 00:15:44,240
certificates of polynomial
length, just like NP.

279
00:15:54,530 --> 00:16:01,060
And we have an algorithm for
checking certificates, which

280
00:16:01,060 --> 00:16:08,090
is given the certificate,
and it's given order log

281
00:16:08,090 --> 00:16:10,285
n bits of randomness.

282
00:16:17,240 --> 00:16:19,560
That's what this first
parameter refers to,

283
00:16:19,560 --> 00:16:21,975
is how much randomness
the algorithm's given.

284
00:16:21,975 --> 00:16:23,600
So we restrict the
amount of randomness

285
00:16:23,600 --> 00:16:26,680
to a very small amount.

286
00:16:26,680 --> 00:16:29,340
And it should tell you
whether the instance

287
00:16:29,340 --> 00:16:32,210
is a yes instance
or a no instance,

288
00:16:32,210 --> 00:16:34,850
in some sense if you're
given the right certificate.

289
00:16:34,850 --> 00:16:37,880
So in particular,
if the instance

290
00:16:37,880 --> 00:16:41,990
was a yes instance-- so this
is back to decision problems,

291
00:16:41,990 --> 00:16:42,610
just like NP.

292
00:16:42,610 --> 00:16:45,200
There's no optimization here.

293
00:16:45,200 --> 00:16:47,480
But we're going to apply
this to gap problems,

294
00:16:47,480 --> 00:16:50,910
and that will relate
us to optimization.

295
00:16:50,910 --> 00:17:02,710
So let's say there's no error
on yes instances, although you

296
00:17:02,710 --> 00:17:03,450
could relax that.

297
00:17:03,450 --> 00:17:05,619
It won't make a big difference.

298
00:17:05,619 --> 00:17:09,500
So if you have a yes instance,
and you give the right

299
00:17:09,500 --> 00:17:18,099
certificate-- so this is
for some certificate--

300
00:17:18,099 --> 00:17:19,710
the algorithm's
guaranteed to say yes.

301
00:17:19,710 --> 00:17:21,450
So no error there.

302
00:17:21,450 --> 00:17:26,510
Where we add some slack is
if there's a no instance.

303
00:17:26,510 --> 00:17:29,060
Now normally in NP
for a no instance,

304
00:17:29,060 --> 00:17:32,320
there is no correct certificate.

305
00:17:32,320 --> 00:17:34,020
Now, the algorithm
will sometimes

306
00:17:34,020 --> 00:17:37,090
say yes even if we give
it the wrong certificate.

307
00:17:37,090 --> 00:17:38,730
There is no right certificate.

308
00:17:38,730 --> 00:17:43,272
But it will say so with some
at most constant probability.

309
00:17:43,272 --> 00:17:50,140
So let's say the probability
that the algorithm says

310
00:17:50,140 --> 00:17:59,860
no is at least some constant,
presumably less than one.

311
00:17:59,860 --> 00:18:02,230
If it's one, then that's NP.

312
00:18:02,230 --> 00:18:05,240
If it's a half,
that would be fine.

313
00:18:05,240 --> 00:18:08,270
A tenth, a hundredth,
they'll all be the same.

314
00:18:08,270 --> 00:18:10,210
Because once you have
such an algorithm that

315
00:18:10,210 --> 00:18:11,940
achieves some
constant probability,

316
00:18:11,940 --> 00:18:16,335
you could apply it log
1 over epsilon times.

317
00:18:19,380 --> 00:18:25,220
And we reduce the
error to epsilon.

318
00:18:28,194 --> 00:18:30,610
The probability of error goes
to epsilon if we just repeat

319
00:18:30,610 --> 00:18:32,310
this log 1 over epsilon times.
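The repetition count just mentioned can be checked numerically. This sketch assumes each independent run wrongly accepts a no instance with probability at most p_fail, and that the overall checker accepts only if every run accepts:

```python
import math


def repetitions_needed(p_fail, epsilon):
    """Independent runs multiply the error: t runs err with probability
    at most p_fail**t, so t = ceil(log(1/epsilon) / log(1/p_fail))
    = O(log(1/epsilon)) runs suffice to push the error below epsilon,
    for any constant p_fail < 1."""
    return math.ceil(math.log(1 / epsilon) / math.log(1 / p_fail))
```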

320
00:18:32,310 --> 00:18:36,940
So in constant
time-- it didn't say.

321
00:18:36,940 --> 00:18:40,380
The order one here refers to the
running time of the algorithm.

322
00:18:40,380 --> 00:18:47,040
So this is an order
one time algorithm.

323
00:18:47,040 --> 00:18:49,120
So the point is, the
algorithm's super fast

324
00:18:49,120 --> 00:18:52,020
and still in constant
time for constant epsilon.

325
00:18:52,020 --> 00:18:55,130
You can get arbitrarily
small error probability,

326
00:18:55,130 --> 00:18:57,700
say one in 100 or
one in a million,

327
00:18:57,700 --> 00:19:00,640
and it's still pretty good.

328
00:19:00,640 --> 00:19:03,690
And you're checking your
proof super, super fast.

329
00:19:03,690 --> 00:19:04,628
Question.

330
00:19:04,628 --> 00:19:06,970
AUDIENCE: Why is there a
limit on the randomness?

331
00:19:09,352 --> 00:19:10,810
PROFESSOR: This
limit on randomness

332
00:19:10,810 --> 00:19:13,316
is not strictly necessary.

333
00:19:13,316 --> 00:19:14,690
For example, n
bits of randomness

334
00:19:14,690 --> 00:19:15,814
turned out not to help you.

335
00:19:15,814 --> 00:19:17,060
That was proved later.

336
00:19:17,060 --> 00:19:19,550
But we're going to
use this in a moment.

337
00:19:19,550 --> 00:19:22,860
It will help us simulate this
algorithm without randomness,

338
00:19:22,860 --> 00:19:23,909
basically.

339
00:19:23,909 --> 00:19:24,847
Yeah.

340
00:19:24,847 --> 00:19:27,192
AUDIENCE: If the verifier
runs in constant time,

341
00:19:27,192 --> 00:19:29,540
can it either read
or was written?

342
00:19:29,540 --> 00:19:34,050
PROFESSOR: So this is constant
time in a model of computation

343
00:19:34,050 --> 00:19:36,380
where you can read log
n bits in one step.

344
00:19:36,380 --> 00:19:38,582
So your word, let's
say, is log n bits long.

345
00:19:38,582 --> 00:19:40,540
So you have enough time
to read the randomness.

346
00:19:40,540 --> 00:19:42,789
Obviously you don't have
time to read the certificate,

347
00:19:42,789 --> 00:19:44,520
because that has
polynomial length.

348
00:19:44,520 --> 00:19:48,050
But yeah, constant time.

349
00:19:48,050 --> 00:19:48,920
Cool.

350
00:19:48,920 --> 00:19:49,830
Other questions?

351
00:19:49,830 --> 00:19:52,460
So that is the
definition of PCP.

352
00:19:52,460 --> 00:19:57,255
Now let me relate
it to gap problems.

353
00:20:00,010 --> 00:20:21,680
So let's say first claim is
that if we look at this gap SAT

354
00:20:21,680 --> 00:20:25,840
problem, where b equals one and
a is some constant, presumably

355
00:20:25,840 --> 00:20:31,310
less than one, then-- in fact,
that should be less than one.

356
00:20:31,310 --> 00:20:36,760
Why did I write strictly
less than 1 here?

357
00:20:36,760 --> 00:20:38,620
This is a constant
less than one.

358
00:20:38,620 --> 00:20:44,749
Then I claim that
problem is in PCP,

359
00:20:44,749 --> 00:20:46,540
that there is a
probabilistically checkable

360
00:20:46,540 --> 00:20:47,960
proof for this instance.

361
00:20:47,960 --> 00:20:51,090
Namely, it's a satisfying
variable assignment.

362
00:20:51,090 --> 00:20:53,500
Again, this instance
either has the prop--

363
00:20:53,500 --> 00:20:56,170
when in a yes instance
all of the entire thing

364
00:20:56,170 --> 00:20:57,220
is satisfiable.

365
00:20:57,220 --> 00:21:05,480
So just like before, I
can have a certificate,

366
00:21:05,480 --> 00:21:10,291
just like in NP, a satisfying
assignment to the variables

367
00:21:10,291 --> 00:21:10,790
is good.

368
00:21:15,230 --> 00:21:18,280
In the no instance, now I
know, let's say, at most half

369
00:21:18,280 --> 00:21:22,340
of the clauses are satisfiable
if this is one half.

370
00:21:22,340 --> 00:21:25,460
And so what is my
algorithm going to do?

371
00:21:28,300 --> 00:21:31,330
In order to get some at
most constant probability

372
00:21:31,330 --> 00:21:34,462
of failure, it's going
to choose a random clause

373
00:21:34,462 --> 00:21:35,795
and check that it was satisfied.

374
00:21:54,530 --> 00:21:55,800
Uniform random.

375
00:21:55,800 --> 00:21:57,580
So I've got log n bits.

376
00:21:57,580 --> 00:21:59,500
Let's say there are n clauses.

377
00:21:59,500 --> 00:22:04,730
So I can choose one of them at
random by flipping log n coins.

378
00:22:04,730 --> 00:22:09,915
And then check so that
involves-- this is 3SAT.

379
00:22:09,915 --> 00:22:11,570
It only involves
three variables.

380
00:22:11,570 --> 00:22:14,180
I check those three
variable value assignments

381
00:22:14,180 --> 00:22:18,380
in my certificate by random
access into the certificate.

382
00:22:18,380 --> 00:22:20,695
In constant time, I
determine whether that clause

383
00:22:20,695 --> 00:22:21,640
is satisfied.

384
00:22:21,640 --> 00:22:24,290
If the clause is satisfied,
algorithm returns yes.

385
00:22:24,290 --> 00:22:25,780
Otherwise, return no.

386
00:22:25,780 --> 00:22:27,750
Now, if it was a
satisfying assignment,

387
00:22:27,750 --> 00:22:29,750
the algorithm will
always say yes.

388
00:22:29,750 --> 00:22:31,330
So that's good.

389
00:22:31,330 --> 00:22:34,010
If it was not
satisfiable, we know

390
00:22:34,010 --> 00:22:37,700
that, let's say at most half
of the clauses are satisfiable.

391
00:22:37,700 --> 00:22:40,020
Which means in
every certificate,

392
00:22:40,020 --> 00:22:43,210
the algorithm will say no
at least half the time.

393
00:22:43,210 --> 00:22:46,410
And half is whatever
that constant is.

394
00:22:46,410 --> 00:22:51,590
So that means the probability
that the algorithm

395
00:22:51,590 --> 00:23:02,190
is wrong is less than 1 over
the gap, whatever that ratio is.
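The verifier just described fits in a few lines. A sketch (certificate layout and names are assumptions): pick one clause uniformly at random, then probe only the relevant bits of the certificate.

```python
import random


def verify(clauses, certificate, rng=random):
    """Probabilistic checker sketch for gap 3SAT: choose one clause
    uniformly (about log m random bits for m clauses) and check it
    against the certificate, a dict mapping variable -> bool, with
    O(1) probes.  On a satisfiable instance with a correct
    certificate this always says 'yes'; if at most half the clauses
    are satisfiable, any certificate is rejected with probability
    at least 1/2."""
    clause = rng.choice(clauses)  # the only use of randomness
    ok = any((lit > 0) == certificate[abs(lit)] for lit in clause)
    return "yes" if ok else "no"
```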

396
00:23:02,190 --> 00:23:03,710
Cool?

397
00:23:03,710 --> 00:23:04,260
Yeah.

398
00:23:04,260 --> 00:23:06,020
AUDIENCE: So does
this [INAUDIBLE]?

399
00:23:08,760 --> 00:23:09,810
So for [INAUDIBLE].

400
00:23:12,630 --> 00:23:18,680
PROFESSOR: Let me tell
you, the PCP theorem

401
00:23:18,680 --> 00:23:22,720
is that NP equals PCP.

402
00:23:22,720 --> 00:23:23,540
This is proved.

403
00:23:23,540 --> 00:23:26,100
So all problems are in PCP.

404
00:23:26,100 --> 00:23:29,370
But this is some motivation
for where this class came from.

405
00:23:33,210 --> 00:23:34,760
I'm not going to
prove this theorem.

406
00:23:34,760 --> 00:23:36,240
The original proof
is super long.

407
00:23:36,240 --> 00:23:38,420
Since then, there have been
relatively short proofs.

408
00:23:38,420 --> 00:23:40,900
I think the shortest proof
currently is two pages long.

409
00:23:40,900 --> 00:23:42,890
Still not going to
prove it because it's

410
00:23:42,890 --> 00:23:47,730
a bit beside the
point to some extent.

411
00:23:47,730 --> 00:23:51,400
It does use reductions
and gap amplification,

412
00:23:51,400 --> 00:23:55,950
but it's technical to
prove it, let's say.

413
00:23:55,950 --> 00:23:59,770
But I will give you some more
motivation for why it's true.

414
00:23:59,770 --> 00:24:10,880
So for example, so
here's one claim.

415
00:24:10,880 --> 00:24:16,820
If one-- let's
change this notation.

416
00:24:16,820 --> 00:24:31,170
If less than 1, comma
1, gap 3SAT is NP-hard,

417
00:24:31,170 --> 00:24:36,650
then NP equals PCP.

418
00:24:36,650 --> 00:24:38,880
So we know that this
is true, but before we

419
00:24:38,880 --> 00:24:45,170
know that here-- so we just
proved that this thing is in PCP.

420
00:24:45,170 --> 00:24:48,849
And if furthermore
this problem--

421
00:24:48,849 --> 00:24:50,390
we're going to prove
this is NP-hard.

422
00:24:50,390 --> 00:24:52,860
That's the motivation.

423
00:24:52,860 --> 00:24:54,800
If you believe
that it's NP-hard,

424
00:24:54,800 --> 00:24:58,330
then we know all problems in
NP can reduce to this thing.

425
00:24:58,330 --> 00:24:59,980
And then that thing is in PCP.

426
00:24:59,980 --> 00:25:02,350
So that tells us that
all problems in NP,

427
00:25:02,350 --> 00:25:06,090
you can convert them into
(<1, 1)-gap 3SAT

428
00:25:06,090 --> 00:25:09,530
and then get a PCP
algorithm for them.

429
00:25:09,530 --> 00:25:13,190
So that would be one way
to prove the PCP theorem.

430
00:25:13,190 --> 00:25:16,490
In fact, the reverse
is also true.

431
00:25:16,490 --> 00:25:20,770
And this is sort of more
directly useful to us.

432
00:25:20,770 --> 00:25:41,180
If, let's say, 3SAT is in PCP,
then the gap version of 3SAT

433
00:25:41,180 --> 00:25:41,680
is NP-hard.

434
00:25:45,100 --> 00:25:49,630
This is interesting
because-- this is true

435
00:25:49,630 --> 00:25:54,290
because NP equals PCP; in particular, 3SAT is in PCP.

436
00:25:54,290 --> 00:25:56,180
And so we're going to
be able to conclude,

437
00:25:56,180 --> 00:25:59,850
by a very short argument,
that the gap version of 3SAT

438
00:25:59,850 --> 00:26:00,740
is also NP-hard.

439
00:26:00,740 --> 00:26:03,720
And this proves constant factor
inapproximability of 3SAT.

440
00:26:03,720 --> 00:26:05,790
We will see a tighter
constant in a little bit,

441
00:26:05,790 --> 00:26:09,840
but this will be our
first such bound.

442
00:26:09,840 --> 00:26:11,865
And this is a very general kind of argument.

443
00:26:11,865 --> 00:26:12,800
It's kind of cool.

444
00:26:16,090 --> 00:26:19,310
So PCP is easy for the
gap version of 3SAT.

445
00:26:19,310 --> 00:26:21,970
But suppose there was a
probabilistically checkable

446
00:26:21,970 --> 00:26:24,910
proof for just straight up
3SAT when you're not given

447
00:26:24,910 --> 00:26:28,050
any gap bound, which is true.

448
00:26:28,050 --> 00:26:30,580
It does exist.

449
00:26:30,580 --> 00:26:32,580
So we're going to
use that algorithm.

450
00:26:32,580 --> 00:26:35,740
And we're going to do a
gap-preserving reduction.

451
00:26:46,280 --> 00:26:48,910
The PCP algorithm we're given,
because we're looking at PCP

452
00:26:48,910 --> 00:26:52,800
(log n, O(1)),
runs in constant time.

453
00:26:52,800 --> 00:26:55,720
Constant time algorithm
can't do very much.

454
00:26:55,720 --> 00:26:58,200
In particular, I can
write the algorithm

455
00:26:58,200 --> 00:27:00,545
as a constant size formula.

456
00:27:04,510 --> 00:27:07,240
It's really a distribution
over such formulas defined

457
00:27:07,240 --> 00:27:09,750
by the O(log n) random bits.

458
00:27:09,750 --> 00:27:13,240
But let's say it's a
random variable where

459
00:27:13,240 --> 00:27:17,200
each possible random choice gives a constant size formula that

460
00:27:17,200 --> 00:27:19,510
evaluates to true or
false, corresponding

461
00:27:19,510 --> 00:27:21,270
to whether the algorithm
says yes or no.

462
00:27:21,270 --> 00:27:22,645
We know we can
convert algorithms

463
00:27:22,645 --> 00:27:27,350
to formulas if they run for a short amount of time.

464
00:27:27,350 --> 00:27:29,370
So we can make
that a CNF formula.

465
00:27:29,370 --> 00:27:31,870
Why not?

466
00:27:31,870 --> 00:27:33,510
3CNF if we want.

467
00:27:33,510 --> 00:27:37,850
My goal is to-- I
want to reduce 3SAT

468
00:27:37,850 --> 00:27:39,140
to the gap version of 3SAT.

469
00:27:39,140 --> 00:27:40,730
Because 3SAT we know is NP-hard.

470
00:27:40,730 --> 00:27:44,290
So if I can reduce it to the
gap version of 3SAT, I'm happy.

471
00:27:44,290 --> 00:27:47,360
Then I know the gap version
of 3SAT is also hard.

472
00:27:47,360 --> 00:27:48,735
So here is my reduction.

473
00:27:53,260 --> 00:27:57,260
So I'm given the 3SAT
formula, and the algorithm

474
00:27:57,260 --> 00:28:01,260
evaluates some formula on
it and the certificate.

475
00:28:01,260 --> 00:28:12,914
What I'm going to do is try
all of the random choices.

476
00:28:12,914 --> 00:28:14,580
Because there's only
log n bits, there's

477
00:28:14,580 --> 00:28:17,620
only polynomially many possible
choices for those bits.

478
00:28:20,970 --> 00:28:23,592
Order log n so it's
n to some constant.

479
00:28:23,592 --> 00:28:25,800
And I want to take this
formula, take the conjunction

480
00:28:25,800 --> 00:28:27,750
over all of those choices.

481
00:28:27,750 --> 00:28:31,090
If the algorithm
always says yes,

482
00:28:31,090 --> 00:28:33,740
then this formula
will be satisfied.

483
00:28:33,740 --> 00:28:38,180
So in the yes instance case,
I get a satisfiable formula.

484
00:28:38,180 --> 00:28:45,680
So yes implies satisfiable,
100% satisfiable.

485
00:28:45,680 --> 00:28:48,410
That corresponds to this number.

486
00:28:48,410 --> 00:28:51,720
I want it to be 100%
in the yes case.

487
00:28:51,720 --> 00:28:59,650
In the no case, I know
that a constant fraction

488
00:28:59,650 --> 00:29:03,150
of these random
choices give a no.

489
00:29:03,150 --> 00:29:06,420
Meaning, they will
not be satisfied.

490
00:29:06,420 --> 00:29:10,660
For any choice,
any certificate, I

491
00:29:10,660 --> 00:29:14,840
know that a constant
fraction of these terms which

492
00:29:14,840 --> 00:29:19,100
I'm conjuncting will
evaluate to false because

493
00:29:19,100 --> 00:29:20,260
of the definition of PCP.

494
00:29:20,260 --> 00:29:25,310
That's what the probabilistic algorithm saying no means.

495
00:29:25,310 --> 00:29:37,560
So a constant fraction of the terms are false.

496
00:29:40,250 --> 00:29:43,310
The terms are the things
we're conjuncting over.

497
00:29:43,310 --> 00:29:49,070
But each term here is a
constant size CNF formula.

498
00:29:49,070 --> 00:29:51,720
So when I AND those together,
I really just get one giant

499
00:29:51,720 --> 00:29:53,740
AND of clauses.

500
00:29:53,740 --> 00:29:56,330
The number of clauses is only a constant factor larger than the number of terms.

501
00:29:56,330 --> 00:29:59,410
And if a term is false,
that means at least one

502
00:29:59,410 --> 00:30:01,372
of the clauses is false.

503
00:30:01,372 --> 00:30:03,830
And there's only a constant
number of clauses in each term.

504
00:30:03,830 --> 00:30:06,410
So this means a
constant fraction

505
00:30:06,410 --> 00:30:10,510
of the clauses in that giant
conjunction are also false.

506
00:30:19,500 --> 00:30:21,980
And that is essentially it.

507
00:30:29,420 --> 00:30:30,500
That is my reduction.

508
00:30:42,310 --> 00:30:45,540
So in the yes instance, I get
a 100% satisfiable thing.

509
00:30:45,540 --> 00:30:47,990
In the no instance,
I get some constant

510
00:30:47,990 --> 00:30:49,920
strictly less than
1 satisfiable thing.

511
00:30:49,920 --> 00:30:53,010
Because in any solution,
I get a constant fraction

512
00:30:53,010 --> 00:30:54,844
that turn out to be
false, constant fraction

513
00:30:54,844 --> 00:30:55,468
of the clauses.

514
00:30:55,468 --> 00:30:57,820
Now what the constant is,
you'd have to work out things.

515
00:30:57,820 --> 00:31:00,170
You'd have to know how
big your PCP algorithm is.

516
00:31:00,170 --> 00:31:03,960
But at least we get a
constant lower bound proving--

517
00:31:03,960 --> 00:31:08,530
in particular, proving there's
no PTAS for MAX-3SAT.

518
00:31:08,530 --> 00:31:12,720
This is what you might call
a gap-amplifying reduction,

519
00:31:12,720 --> 00:31:15,150
in the sense we
started with no gap.

520
00:31:15,150 --> 00:31:17,940
The instance of 3SAT was
either true or false.

521
00:31:17,940 --> 00:31:21,105
And we ended up with something
with a significant gap.
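[As a sketch of the amplification step just described -- this is my own illustration, not code from the lecture; `verifier_clauses` is a hypothetical stand-in for the constant-size 3CNF encoding "the verifier accepts on random string r":]

```python
from itertools import product

def amplify(verifier_clauses, num_random_bits):
    """Conjoin the verifier's constant-size 3CNF over all random strings.

    verifier_clauses(r) returns the constant-size list of 3-clauses for
    random string r.  With O(log n) random bits there are only
    polynomially many strings r, so the conjunction has polynomial size.
    """
    clauses = []
    for r in product((0, 1), repeat=num_random_bits):
        clauses.extend(verifier_clauses(r))
    return clauses

# Toy verifier: one fixed clause per random string.
formula = amplify(lambda r: [(1, 2, 3)], 3)
assert len(formula) == 8  # 2^3 random strings, one clause each
```

[If the verifier rejects a constant fraction of random strings, a constant fraction of these clause groups contains a falsified clause, which is the accounting above.]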

522
00:31:24,977 --> 00:31:26,560
So what we're going
to talk about next

523
00:31:26,560 --> 00:31:28,355
is called gap-preserving
reductions.

524
00:31:33,860 --> 00:31:38,120
Maybe before I get there,
what we just showed

525
00:31:38,120 --> 00:31:42,370
is that the PCP theorem is equivalent to these gap hardness statements.

526
00:31:42,370 --> 00:31:45,600
And in particular, we get
gap problems being NP-hard.

527
00:31:45,600 --> 00:31:48,710
This is why we care about PCPs.

528
00:31:48,710 --> 00:31:53,180
And then in general,
once we have

529
00:31:53,180 --> 00:31:55,750
these kinds of gap
hardness results,

530
00:31:55,750 --> 00:31:58,670
we convert our-- when we're
thinking about reductions

531
00:31:58,670 --> 00:32:03,780
from a to b, because we know
gap implies inapproximability,

532
00:32:03,780 --> 00:32:06,600
we could say, OK, 3SAT
is inapproximable,

533
00:32:06,600 --> 00:32:10,390
and then do, say, an l reduction
from 3SAT to something else.

534
00:32:10,390 --> 00:32:13,500
The something else is
therefore inapproximable also.

535
00:32:13,500 --> 00:32:15,880
That's all good.

536
00:32:15,880 --> 00:32:17,600
But we can also,
instead of thinking

537
00:32:17,600 --> 00:32:21,090
about the inapproximability and
how much carries from a to b,

538
00:32:21,090 --> 00:32:23,550
we can think about
the gap directly.

539
00:32:23,550 --> 00:32:26,480
And this is sort of the main
approach in this lecture

540
00:32:26,480 --> 00:32:28,580
that I'm trying to
demonstrate is by preserving

541
00:32:28,580 --> 00:32:31,440
the gap directly,
well, first you get

542
00:32:31,440 --> 00:32:34,350
new gap bounds and generally
stronger gap bounds.

543
00:32:34,350 --> 00:32:37,310
And then those imply
inapproximability results.

544
00:32:37,310 --> 00:32:40,100
But the gap bounds are stronger
than the inapproximability,

545
00:32:40,100 --> 00:32:43,130
and also they tend to give
larger constant factors

546
00:32:43,130 --> 00:32:46,030
in the inapproximability
results.

547
00:32:46,030 --> 00:32:50,630
So what do we want out of
a gap-preserving reduction?

548
00:32:50,630 --> 00:32:58,720
Let's say we have
an instance x of A.

549
00:32:58,720 --> 00:33:07,940
We convert that into an instance
x prime of some problem B.

550
00:33:07,940 --> 00:33:09,930
We're just going to
think about the OPT of x

551
00:33:09,930 --> 00:33:12,800
versus the OPT of x prime.

552
00:33:12,800 --> 00:33:17,980
And what we want for, let's
say, a minimization problem

553
00:33:17,980 --> 00:33:21,330
is two properties.

554
00:33:21,330 --> 00:33:27,770
One is that the OPT-- if the
OPT of x is at most some k,

555
00:33:27,770 --> 00:33:35,270
then the OPT of x prime
is at most some k prime.

556
00:33:35,270 --> 00:33:37,620
And conversely.

557
00:33:54,320 --> 00:33:56,290
So in general, OPT
may not be preserved.

558
00:33:56,290 --> 00:34:00,910
But let's say it changes in some known way, from k to k prime.

559
00:34:00,910 --> 00:34:08,969
So in fact, you can think of k
and k prime as functions of n.

560
00:34:08,969 --> 00:34:12,659
So if I know that OPT of x is
at most some function of n,

561
00:34:12,659 --> 00:34:16,111
then I get that OPT of x prime
is at most some other function

562
00:34:16,111 --> 00:34:17,060
of n.

563
00:34:17,060 --> 00:34:19,420
But there's some known
relation between the two.

564
00:34:19,420 --> 00:34:21,760
What I care about is this gap c.

565
00:34:21,760 --> 00:34:23,860
Should be a c prime here.

566
00:34:23,860 --> 00:34:27,310
So what this is saying is
suppose I had a gap, if I know

567
00:34:27,310 --> 00:34:30,100
that all the solutions
are either less than k

568
00:34:30,100 --> 00:34:33,030
or more than c times
k, I want that to be

569
00:34:33,030 --> 00:34:35,290
preserved for some
possibly other gap c

570
00:34:35,290 --> 00:34:37,187
prime in the new problem.

571
00:34:37,187 --> 00:34:39,520
So this is pretty general,
but this is the sort of thing

572
00:34:39,520 --> 00:34:40,940
we want to preserve.

573
00:34:40,940 --> 00:34:44,679
If we had a gap of c before,
we get some gap c prime after.

574
00:34:44,679 --> 00:34:48,600
If c prime equals c, this would
be a perfectly gap-preserving

575
00:34:48,600 --> 00:34:49,100
reduction.

576
00:34:49,100 --> 00:34:51,440
Maybe we'll lose
some constant factor.

577
00:34:51,440 --> 00:34:54,480
If c prime is
greater than c, this

578
00:34:54,480 --> 00:34:56,280
is called gap amplification.

579
00:35:02,150 --> 00:35:04,720
And gap amplification
is essentially

580
00:35:04,720 --> 00:35:11,790
how the PCP theorem is
shown, by repeatedly

581
00:35:11,790 --> 00:35:14,670
growing the gap until
it's something reasonable.

582
00:35:14,670 --> 00:35:18,400
And if you want to, say, prove
that set cover is log n hard,

583
00:35:18,400 --> 00:35:20,690
it's a similar thing where
you start with a small gap

584
00:35:20,690 --> 00:35:22,610
constant factor, and
then you grow it,

585
00:35:22,610 --> 00:35:25,420
and you show you can grow it
to log n before you run out

586
00:35:25,420 --> 00:35:28,010
of space to write it in
your problem essentially,

587
00:35:28,010 --> 00:35:30,880
before your instance gets
more than polynomial.

588
00:35:30,880 --> 00:35:33,240
Or if you want to prove that
[INAUDIBLE] can't be solved

589
00:35:33,240 --> 00:35:37,170
in better than whatever
n to the 1 minus epsilon,

590
00:35:37,170 --> 00:35:40,510
then a similar trick of
gap amplification works.

591
00:35:40,510 --> 00:35:43,080
Those amplification
arguments are involved,

592
00:35:43,080 --> 00:35:46,260
and so I'm not going
to show them here.

593
00:35:46,260 --> 00:35:49,260
But I will show you an example
of a gap-preserving reduction

594
00:35:49,260 --> 00:35:50,685
next, unless there
are questions.

595
00:35:53,930 --> 00:36:02,190
Cool. So I'm going to
reduce a problem which

596
00:36:02,190 --> 00:36:16,220
we have mentioned
before, which is MAX

597
00:36:16,220 --> 00:36:19,320
exactly 3 XNOR-SAT, or MAX-E3-XNOR-SAT.

598
00:36:19,320 --> 00:36:22,760
This is linear equations,
modulo two,

599
00:36:22,760 --> 00:36:28,210
where every equation
has exactly three terms.

600
00:36:28,210 --> 00:36:38,586
So something like xi XOR xj XOR
xk equals one, or something.

601
00:36:38,586 --> 00:36:39,960
You can also have
negations here.

602
00:36:42,530 --> 00:36:46,460
So I have a bunch of
equations like that.

603
00:36:46,460 --> 00:36:49,191
I'm going to just tell you
a gap bound on this problem,

604
00:36:49,191 --> 00:36:51,690
and then we're going to reduce
it to another problem, namely

605
00:36:51,690 --> 00:36:54,290
MAX-E3-SAT.

606
00:36:54,290 --> 00:36:58,360
So the claim here
is that this problem

607
00:36:58,360 --> 00:37:04,650
is (1/2 + epsilon, 1 - epsilon)-gap

608
00:37:04,650 --> 00:37:09,060
hard for any epsilon.

609
00:37:14,460 --> 00:37:15,920
Which in particular
implies that it

610
00:37:15,920 --> 00:37:19,830
is one half plus epsilon inapproximable,

611
00:37:19,830 --> 00:37:21,020
unless P equals NP.

612
00:37:24,399 --> 00:37:25,690
But this is of course stronger.

613
00:37:25,690 --> 00:37:28,890
It says if you just
look at instances where

614
00:37:28,890 --> 00:37:37,520
let's say 99% of the equations
are satisfiable versus when

615
00:37:37,520 --> 00:37:42,060
51% are satisfiable, it's
NP-hard to distinguish

616
00:37:42,060 --> 00:37:42,810
between those two.

617
00:37:46,070 --> 00:37:47,940
Why one half here?

618
00:37:47,940 --> 00:37:49,855
Because there is a one
half approximation.

619
00:37:54,830 --> 00:37:56,730
I've kind of mentioned
the general approach

620
00:37:56,730 --> 00:37:58,773
for approximation
algorithms for SAT

621
00:37:58,773 --> 00:38:02,980
is take a random assignment,
variable assignment.

622
00:38:02,980 --> 00:38:08,610
And in this case, because these
statements are about a parity,

623
00:38:08,610 --> 00:38:10,330
if you think of xk
as random, it doesn't

624
00:38:10,330 --> 00:38:11,950
matter what these two are.

625
00:38:11,950 --> 00:38:14,170
50% probability this
will be satisfied.

626
00:38:14,170 --> 00:38:17,580
And so you can always satisfy
at least half of the clauses

627
00:38:17,580 --> 00:38:20,700
because this randomized
algorithm will satisfy half

628
00:38:20,700 --> 00:38:21,540
in expectation.

629
00:38:21,540 --> 00:38:24,840
Therefore, at least one assignment will do so.

630
00:38:24,840 --> 00:38:26,740
But if you allow
randomized approximation,

631
00:38:26,740 --> 00:38:29,800
this is a one half approximation
or a two approximation,

632
00:38:29,800 --> 00:38:32,520
depending on your perspective.
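[A minimal sketch of that guarantee -- my own illustration, not from the course: derandomizing the random assignment by the method of conditional expectations always satisfies at least half of the equations. Each equation is a hypothetical tuple `(i, j, k, parity)` meaning x_i XOR x_j XOR x_k = parity.]

```python
def satisfied(eq, x):
    i, j, k, parity = eq
    return (x[i] ^ x[j] ^ x[k]) == parity

def expected_satisfied(eqs, x, fixed):
    """Expected number of satisfied equations when variables in `fixed`
    are set as in x and the remaining ones are uniformly random."""
    total = 0.0
    for i, j, k, parity in eqs:
        if all(v in fixed for v in (i, j, k)):
            total += 1.0 if (x[i] ^ x[j] ^ x[k]) == parity else 0.0
        else:
            total += 0.5  # one uniform variable makes the XOR uniform
    return total

def half_satisfy(eqs, n):
    """Deterministically satisfy at least half of the equations."""
    x, fixed = [0] * n, set()
    for v in range(n):
        fixed.add(v)
        scores = []
        for b in (0, 1):
            x[v] = b
            scores.append((expected_satisfied(eqs, x, fixed), b))
        x[v] = max(scores)[1]  # keep the conditional expectation >= m/2
    return x

# Contradictory pair: only one of the two can ever hold.
eqs = [(0, 1, 2, 1), (0, 1, 2, 0)]
x = half_satisfy(eqs, 3)
assert 2 * sum(satisfied(e, x) for e in eqs) >= len(eqs)
```

[The greedy choice never decreases the conditional expectation, which starts at m/2, so the final assignment satisfies at least half of the equations.]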

633
00:38:32,520 --> 00:38:34,700
So this is really tight.

634
00:38:34,700 --> 00:38:35,870
That's good news.

635
00:38:35,870 --> 00:38:41,580
And this is essentially a
form of the PCP theorem.

636
00:38:41,580 --> 00:38:47,070
PCP theorem says that
there's some algorithm,

637
00:38:47,070 --> 00:38:49,970
and you can prove that in fact
there is an algorithm that

638
00:38:49,970 --> 00:38:50,800
looks like this.

639
00:38:50,800 --> 00:38:58,010
It's a bunch of linear equations
with three terms per equation.

640
00:38:58,010 --> 00:39:02,300
So let's take that as given.

641
00:39:07,690 --> 00:39:10,950
Now, what I want to
show is a reduction

642
00:39:10,950 --> 00:39:15,440
from that problem to MAX-E3-SAT.

643
00:39:15,440 --> 00:39:24,840
So remember MAX-E3-SAT,
you're given

644
00:39:24,840 --> 00:39:27,680
CNF where every
clause has exactly

645
00:39:27,680 --> 00:39:30,840
three distinct literals.

646
00:39:30,840 --> 00:39:34,100
You want to maximize the
number of satisfied things.

647
00:39:34,100 --> 00:39:37,670
So this is roughly the problem
we were talking about up there.

648
00:39:40,230 --> 00:39:48,570
So first thing
I'm going to do is

649
00:39:48,570 --> 00:39:52,840
I want to reduce this to that.

650
00:39:52,840 --> 00:39:55,700
And this is the reduction.

651
00:39:55,700 --> 00:39:57,970
And the first claim is just
that it's an L-reduction.

652
00:39:57,970 --> 00:39:59,680
So that's something
we're familiar with.

653
00:39:59,680 --> 00:40:00,930
Let's think about it that way.

654
00:40:00,930 --> 00:40:04,510
Then we will think about it
in a gap-preserving sense.

655
00:40:04,510 --> 00:40:08,500
So there are two types of
equations we need to satisfy,

656
00:40:08,500 --> 00:40:12,547
the sort of odd case
or the even case.

657
00:40:12,547 --> 00:40:14,130
Again, each of these
could be negated.

658
00:40:14,130 --> 00:40:17,920
I'm just going to say double negation means unnegated over here.

659
00:40:17,920 --> 00:40:21,550
So each equation is going to
be replaced with exactly four

660
00:40:21,550 --> 00:40:26,260
clauses in the E3-SAT instance.

661
00:40:26,260 --> 00:40:34,660
And the idea is, well, if I want
the parity of them to be odd,

662
00:40:34,660 --> 00:40:37,400
it should be the case that
at least one of them is true.

663
00:40:37,400 --> 00:40:40,680
And if you stare at it
long enough, also when

664
00:40:40,680 --> 00:40:44,890
you put two bars in there, I
don't want exactly two of them

665
00:40:44,890 --> 00:40:45,540
to be true.

666
00:40:45,540 --> 00:40:46,970
That's the parity constraint.

667
00:40:46,970 --> 00:40:51,800
If this is true, all four
of these should be true.

668
00:40:51,800 --> 00:40:54,140
That's the first claim,
just by the parity

669
00:40:54,140 --> 00:40:55,690
of the number of bars.

670
00:40:55,690 --> 00:40:59,010
There's either zero
bars or two bars,

671
00:40:59,010 --> 00:41:02,710
or three positive
or one positive.

672
00:41:02,710 --> 00:41:04,390
That's the two cases.

673
00:41:04,390 --> 00:41:10,600
And in this situation where
I want the parity to be even,

674
00:41:10,600 --> 00:41:14,790
even number of trues, I
have all the even number

675
00:41:14,790 --> 00:41:17,190
of trues cases over here.

676
00:41:17,190 --> 00:41:20,736
Here are two of them even,
and here none of them even.

677
00:41:23,700 --> 00:41:26,610
And again, if this is satisfied,
then all four of those

678
00:41:26,610 --> 00:41:28,250
are satisfied.

679
00:41:28,250 --> 00:41:33,300
Now, if these are not
satisfied, by the same argument

680
00:41:33,300 --> 00:41:35,650
you can show that at least
one of these is violated.

681
00:41:35,650 --> 00:41:38,940
But in fact, just
one will be violated.

682
00:41:38,940 --> 00:41:43,360
So for example, so this
is just a case analysis.

683
00:41:43,360 --> 00:41:46,500
Let's say I set all
of these to be zero,

684
00:41:46,500 --> 00:41:49,450
and so their XOR is
zero and not one.

685
00:41:49,450 --> 00:41:53,270
So if they're all false, then
this will not be satisfied,

686
00:41:53,270 --> 00:41:56,650
but the other three will be.

687
00:41:56,650 --> 00:41:59,700
And in general, because
we have, for example,

688
00:41:59,700 --> 00:42:03,320
xi appearing true and
false in different cases,

689
00:42:03,320 --> 00:42:07,690
you will satisfy
three out of four

690
00:42:07,690 --> 00:42:10,730
on the right when you
don't satisfy on the left.

691
00:42:10,730 --> 00:42:13,287
So the difference is
three versus four.

692
00:42:13,287 --> 00:42:15,620
When these are satisfied, you
satisfy four on the right.

693
00:42:15,620 --> 00:42:17,995
When they're unsatisfied, you
satisfy three on the right.

694
00:42:17,995 --> 00:42:18,910
That's all.

695
00:42:18,910 --> 00:42:19,410
I'm claiming.
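[That case analysis can be checked exhaustively. A small sketch with my own encoding, not the board's: each clause is a sign pattern over (x, y, z), where sign 1 means the positive literal and sign 0 the negated one.]

```python
from itertools import product

def gadget(parity):
    """Four 3-clauses replacing x XOR y XOR z = parity.

    For parity 1, take the patterns with an odd number of positive
    literals (three positives or one positive); for parity 0, the
    patterns with an even number of positives.
    """
    odd = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    even = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    return odd if parity == 1 else even

def count_satisfied(clauses, x):
    # A clause is falsified only by the one assignment that disagrees
    # with every sign, i.e. the bitwise complement of its pattern.
    return sum(any(xi == si for xi, si in zip(x, c)) for c in clauses)

# Exhaustive check of the 4-versus-3 claim over all 8 assignments.
for parity in (0, 1):
    for x in product((0, 1), repeat=3):
        want = 4 if (x[0] ^ x[1] ^ x[2]) == parity else 3
        assert count_satisfied(gadget(parity), x) == want
```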

696
00:42:27,260 --> 00:42:32,540
So if the equation
is satisfied, then we

697
00:42:32,540 --> 00:42:38,920
get four in the 3SAT instance.

698
00:42:38,920 --> 00:42:46,670
And if it's unsatisfied, we
turn out to get exactly three.

699
00:42:53,280 --> 00:42:56,180
So I want to prove that
this is an L-reduction.

700
00:42:56,180 --> 00:42:57,940
To prove L-reduction,
we need two things.

701
00:42:57,940 --> 00:43:02,880
One is that the additive gap,
if I solve the 3SAT instance

702
00:43:02,880 --> 00:43:07,570
and convert it back into
a corresponding solution

703
00:43:07,570 --> 00:43:11,730
to MAX-E3-XNOR-SAT, which doesn't change anything.

704
00:43:11,730 --> 00:43:14,380
The variables are just
what they were before.

705
00:43:14,380 --> 00:43:17,970
That the additive gap
from OPT on the right side

706
00:43:17,970 --> 00:43:21,170
is at most some constant
times the additive gap

707
00:43:21,170 --> 00:43:23,960
on the left side, or vice versa.

708
00:43:23,960 --> 00:43:26,840
In this case, the gap is
exactly preserved because it's

709
00:43:26,840 --> 00:43:28,300
four versus three over here.

710
00:43:28,300 --> 00:43:29,980
It's one versus zero over here.

711
00:43:29,980 --> 00:43:32,750
So additive gap remains one.

712
00:43:32,750 --> 00:43:39,165
And that is called beta, I
think, in L-reduction land.

713
00:43:45,220 --> 00:43:48,580
So this was property
two in the L-reduction.

714
00:43:48,580 --> 00:43:58,130
So the additive error in this
case is exactly preserved.

715
00:44:01,630 --> 00:44:02,580
So there's no scale.

716
00:44:02,580 --> 00:44:05,530
Beta equals one.

717
00:44:05,530 --> 00:44:08,540
If there's some other gap,
if it was five versus three,

718
00:44:08,540 --> 00:44:12,310
then we'd have beta equal two.

719
00:44:12,310 --> 00:44:13,740
Then there was the
other property,

720
00:44:13,740 --> 00:44:17,070
which is you need to show that
you don't blow up OPT too much.

721
00:44:17,070 --> 00:44:20,050
We want the OPT on
the right hand side

722
00:44:20,050 --> 00:44:23,380
to be at most some constant
times OPT on the left hand

723
00:44:23,380 --> 00:44:25,560
side.

724
00:44:25,560 --> 00:44:28,980
This requires a
little bit more care

725
00:44:28,980 --> 00:44:31,290
because we need to make sure
OPT is linear, basically.

726
00:44:31,290 --> 00:44:36,070
We did a lot of these
arguments last lecture.

727
00:44:36,070 --> 00:44:38,040
Because even when you
don't satisfy things,

728
00:44:38,040 --> 00:44:38,920
you still get points.

729
00:44:38,920 --> 00:44:42,680
And the difference between
zero and three is a big ratio.

730
00:44:42,680 --> 00:44:44,442
We want that to not
happen too much.

731
00:44:44,442 --> 00:44:46,150
And it doesn't happen
too much because we

732
00:44:46,150 --> 00:44:52,140
know the left hand side OPT is
at least half of all the equations.

733
00:44:52,140 --> 00:44:55,400
So it's not like there are
very many unsatisfied clauses.

734
00:44:55,400 --> 00:44:56,960
At most, half of
them are unsatisfied

735
00:44:56,960 --> 00:45:02,150
because at least half are
satisfiable in the case of OPT.

736
00:45:02,150 --> 00:45:06,070
So here's the full argument.

737
00:45:12,930 --> 00:45:18,640
In general, OPT for
the 3SAT instance

738
00:45:18,640 --> 00:45:22,020
is going to be four times
all the satisfiable things

739
00:45:22,020 --> 00:45:24,280
plus three times all the
unsatisfiable things.

740
00:45:24,280 --> 00:45:29,600
This is the same thing
as saying the-- sorry.

741
00:45:29,600 --> 00:45:31,900
You take three times
the number of equations.

742
00:45:31,900 --> 00:45:35,090
Every equation gets
three points for free.

743
00:45:35,090 --> 00:45:38,530
And then if you also satisfy
them, you get one more point.

744
00:45:38,530 --> 00:45:42,400
So this is an equation on
those things, the two OPTs.

745
00:45:42,400 --> 00:45:44,930
And we get plus three times
the number of equations.

746
00:45:44,930 --> 00:45:48,090
And because there is a
one half approximation,

747
00:45:48,090 --> 00:45:54,670
we know that number of equations
is at most two times OPT.

748
00:46:01,470 --> 00:46:04,300
Because OPT is at least a
half the number of equations.

749
00:46:04,300 --> 00:46:10,502
And so this thing is overall at
most six plus one, equals seven, times

750
00:46:10,502 --> 00:46:11,340
OPT E3 XNOR.

751
00:46:16,260 --> 00:46:21,240
And this is the thing
called alpha in L-reduction.

752
00:46:21,240 --> 00:46:23,110
I wanted to compute
these explicitly

753
00:46:23,110 --> 00:46:25,810
because I want to see how
much inapproximability I get.

754
00:46:25,810 --> 00:46:28,925
Because I started with a
tight inapproximability bound

755
00:46:28,925 --> 00:46:32,220
of one half plus epsilon being impossible,

756
00:46:32,220 --> 00:46:34,730
whereas one half is possible.

757
00:46:34,730 --> 00:46:40,100
It's tight up to this very tiny
arbitrary additive constant.

758
00:46:40,100 --> 00:46:42,950
And over here, we're
going to lose something.

759
00:46:42,950 --> 00:46:45,580
We know from L-reductions,
if you were inapproximable

760
00:46:45,580 --> 00:46:47,976
before, you get
inapproximability

761
00:46:47,976 --> 00:46:49,100
in this case of MAX-E3-SAT.

763
00:46:50,500 --> 00:46:53,520
So what is the factor?

764
00:46:53,520 --> 00:46:57,250
If you think of-- there's
one simplification

765
00:46:57,250 --> 00:46:59,800
here relative to what
I presented before.

766
00:46:59,800 --> 00:47:02,400
A couple lectures ago, we
always thought about one

767
00:47:02,400 --> 00:47:06,870
plus epsilon approximation,
and how does epsilon change.

768
00:47:06,870 --> 00:47:09,520
And that works really well
for minimization problems.

769
00:47:09,520 --> 00:47:12,090
For a maximization problem,
your approximation factor

770
00:47:12,090 --> 00:47:16,640
is-- an approximation
factor of one

771
00:47:16,640 --> 00:47:20,140
plus epsilon means you are at
least this thing times OPT.

772
00:47:20,140 --> 00:47:24,000
And this thing gets
awkward to work with.

773
00:47:24,000 --> 00:47:26,800
Equivalently, with a
different notion of epsilon,

774
00:47:26,800 --> 00:47:30,260
you could just think of a one
minus epsilon approximation

775
00:47:30,260 --> 00:47:32,070
and how does epsilon change.

776
00:47:32,070 --> 00:47:35,810
And in general, for
maximization problem,

777
00:47:35,810 --> 00:47:38,610
if you have one minus
epsilon approximation

778
00:47:38,610 --> 00:47:41,100
before the L-reduction,
then afterwards you

779
00:47:41,100 --> 00:47:47,870
will have a one minus
epsilon over alpha beta.

780
00:47:47,870 --> 00:47:50,750
So for minimization, we
had one plus epsilon.

781
00:47:50,750 --> 00:47:53,270
And then we got one plus
epsilon over alpha beta.

782
00:47:53,270 --> 00:47:54,900
With the minuses,
it also works out.

783
00:47:54,900 --> 00:47:57,550
That's a cleaner way
to do maximization.

784
00:47:57,550 --> 00:47:59,030
So this was a
maximization problem.

785
00:47:59,030 --> 00:48:02,880
We had over here
epsilon was-- sorry,

786
00:48:02,880 --> 00:48:04,160
different notions of epsilon.

787
00:48:04,160 --> 00:48:07,180
Here we have one half
inapproximability

788
00:48:07,180 --> 00:48:09,780
One half is also known
as one minus one half.

789
00:48:09,780 --> 00:48:12,500
So epsilon here is a half.

790
00:48:12,500 --> 00:48:16,280
And alpha was seven.

791
00:48:16,280 --> 00:48:20,460
Beta was one.

792
00:48:20,460 --> 00:48:22,590
And so we just divide by seven.

793
00:48:22,590 --> 00:48:30,650
So in this case, we
get that MAX-E3-SAT

794
00:48:30,650 --> 00:48:40,920
is one minus one half divided
by seven, which is 1/14.

795
00:48:40,920 --> 00:48:45,100
Technically there's a plus epsilon here.

796
00:48:45,100 --> 00:48:47,559
Sorry, bad overuse of epsilon.

797
00:48:47,559 --> 00:48:49,600
This is, again, for any
epsilon greater than zero

798
00:48:49,600 --> 00:48:52,560
because we had some epsilon
greater than zero here.

799
00:48:52,560 --> 00:48:54,890
Anything slightly better than one half is impossible.

800
00:48:54,890 --> 00:49:00,010
So over here we get that anything slightly better than one minus 1/14

801
00:49:00,010 --> 00:49:00,850
is impossible.

802
00:49:00,850 --> 00:49:14,030
This is 13/14 plus epsilon, which is OK.

803
00:49:14,030 --> 00:49:15,370
It's a bound.
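[Just to check that bookkeeping with exact arithmetic -- a sketch assuming alpha = 7, beta = 1, and the starting epsilon of one half, as computed above:]

```python
from fractions import Fraction

alpha, beta = 7, 1
eps = Fraction(1, 2)              # threshold epsilon for MAX-E3-XNOR-SAT
eps_after = eps / (alpha * beta)  # epsilon shrinks through the L-reduction
threshold = 1 - eps_after         # resulting MAX-E3-SAT threshold

assert eps_after == Fraction(1, 14)
assert threshold == Fraction(13, 14)
```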

804
00:49:15,370 --> 00:49:17,090
But it's not a tight bound.

805
00:49:17,090 --> 00:49:20,710
The right answer
for MAX-3SAT is 7/8.

806
00:49:20,710 --> 00:49:23,700
Because if you take, again,
a uniform random assignment,

807
00:49:23,700 --> 00:49:28,100
every variable flips a coin,
heads or tails, true or false.

808
00:49:28,100 --> 00:49:32,470
Then 7/8 of the clauses will
be satisfied in expectation.

809
00:49:32,470 --> 00:49:36,170
Because if you look at a clause,
if it has exactly three terms

810
00:49:36,170 --> 00:49:39,720
and it's an or of three things,
you just need at least one head

811
00:49:39,720 --> 00:49:41,180
to satisfy this thing.

812
00:49:41,180 --> 00:49:43,770
So you get a 50% chance to
do it in the first time,

813
00:49:43,770 --> 00:49:45,978
and then a quarter chance to do it the second time,

814
00:49:45,978 --> 00:49:51,760
and in general 7/8 chance to
get it one of the three times.
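[The 7/8 can be confirmed by brute force over the eight assignments to one clause's three variables, taking the all-positive clause without loss of generality -- my own check, not from the lecture:]

```python
from fractions import Fraction
from itertools import product

# A 3-clause on three distinct variables fails for exactly one of the
# 8 equally likely assignments: the one falsifying all three literals.
hits = sum(any(x) for x in product((0, 1), repeat=3))
assert Fraction(hits, 8) == Fraction(7, 8)
```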

815
00:49:51,760 --> 00:49:57,050
7/8 is smaller than 13/14,
so we're not quite there yet.

816
00:49:57,050 --> 00:50:01,060
But this reduction
will do it if we

817
00:50:01,060 --> 00:50:02,940
think about it from
the perspective

818
00:50:02,940 --> 00:50:05,390
of gap-preserving reductions.

819
00:50:05,390 --> 00:50:07,990
So from this general
L-reduction black box

820
00:50:07,990 --> 00:50:13,540
that we only lose an alpha beta
factor, yeah we get this bound.

821
00:50:13,540 --> 00:50:16,600
But from a gap perspective,
we can do better.

822
00:50:16,600 --> 00:50:19,500
The reason we can do better
is because gaps are always

823
00:50:19,500 --> 00:50:22,410
talking about yes instances
where lots of things

824
00:50:22,410 --> 00:50:23,112
are satisfied.

825
00:50:23,112 --> 00:50:25,570
That means we're most of the
time in the case where we have

826
00:50:25,570 --> 00:50:28,897
fours on the right hand side, or
a situation where we have lots

827
00:50:28,897 --> 00:50:31,230
of things unsatisfied, that
means we have lots of threes

828
00:50:31,230 --> 00:50:32,330
on the right hand side.

829
00:50:32,330 --> 00:50:35,230
It lets us get a
slightly tighter bound.

830
00:50:35,230 --> 00:50:36,030
So let's do that.

831
00:50:57,340 --> 00:51:04,630
So here is a gap argument
about the same reduction.

832
00:51:04,630 --> 00:51:15,820
What we're going to claim is
that (7/8 + epsilon, 1 - epsilon)-gap 3SAT

833
00:51:15,820 --> 00:51:21,000
is NP-hard, which implies
7/8 inapproximability,

834
00:51:21,000 --> 00:51:23,040
but by looking at it
from the gap perspective,

835
00:51:23,040 --> 00:51:28,110
we will get this stronger
bound versus the 13/14 bound.

836
00:51:28,110 --> 00:51:33,430
So the proof is by a
gap-preserving reduction,

837
00:51:33,430 --> 00:51:41,800
namely that reduction, from
MAX-E3-XNOR-SAT to MAX-3SAT,

838
00:51:41,800 --> 00:51:43,450
E3-SAT I should say.

839
00:51:46,990 --> 00:51:49,420
And so the idea
is the following.

840
00:51:49,420 --> 00:51:51,940
Either we have a yes
instance or a no instance.

841
00:51:55,200 --> 00:52:01,940
If we have a yes instance
to the equation problem,

842
00:52:01,940 --> 00:52:09,010
then we know that at least one
minus epsilon of the equations

843
00:52:09,010 --> 00:52:11,650
are satisfiable.

844
00:52:11,650 --> 00:52:15,820
So we have one minus epsilon.

845
00:52:15,820 --> 00:52:17,445
Let's say m is the
number of equations.

846
00:52:26,570 --> 00:52:28,030
In the no instance
case, of course

847
00:52:28,030 --> 00:52:30,410
we know that not too
many are satisfied.

848
00:52:30,410 --> 00:52:35,890
At most, one half plus epsilon
fraction of the equations

849
00:52:35,890 --> 00:52:37,250
are satisfiable.

850
00:52:43,460 --> 00:52:48,060
So in both cases, I want to
see what that converts into.

851
00:52:48,060 --> 00:52:53,457
So in the yes instance,
we get all four

852
00:52:53,457 --> 00:52:56,690
of those things being satisfied.

853
00:52:56,690 --> 00:53:02,230
So that means we're going
to have at least one

854
00:53:02,230 --> 00:53:08,200
minus epsilon times m times
four clauses satisfied.

855
00:53:08,200 --> 00:53:11,460
We'll also have
epsilon m times three.

856
00:53:11,460 --> 00:53:12,925
Those are the unsatisfied.

857
00:53:12,925 --> 00:53:14,950
And maybe some of them
are actually satisfied,

858
00:53:14,950 --> 00:53:18,337
but this is a lower bound
on how many clauses we get.

859
00:53:21,870 --> 00:53:24,190
On the other hand,
in this situation

860
00:53:24,190 --> 00:53:25,860
where not too many
are satisfied,

861
00:53:25,860 --> 00:53:28,060
that means we get a
tighter upper bound.

862
00:53:28,060 --> 00:53:37,020
So we have one half plus
epsilon times m times four.

863
00:53:37,020 --> 00:53:44,490
And then there's the rest, one
half minus epsilon times three.

864
00:53:44,490 --> 00:53:46,620
And maybe some of these
are not satisfied,

865
00:53:46,620 --> 00:53:51,090
but this is an upper bound on
how many clauses are satisfied

866
00:53:51,090 --> 00:53:55,670
in the 3SAT instance versus
equations in the E3-XNOR-SAT

867
00:53:55,670 --> 00:53:57,500
instance.

868
00:53:57,500 --> 00:54:01,770
Now I just want
to compute these.

869
00:54:01,770 --> 00:54:04,040
So everything's times m.

870
00:54:04,040 --> 00:54:06,830
And over here we have
four minus four epsilon.

871
00:54:06,830 --> 00:54:09,850
Over here we have
plus three epsilon.

872
00:54:09,850 --> 00:54:14,390
So that is four minus epsilon m.

873
00:54:14,390 --> 00:54:17,420
And here we have again
everything is times m.

874
00:54:17,420 --> 00:54:25,960
So we have 4/2, also known
as two, plus four epsilon.

875
00:54:28,470 --> 00:54:33,600
Plus we have 3/2
minus three epsilon.

876
00:54:33,600 --> 00:54:35,805
So the epsilons add
up to plus epsilon.

877
00:54:35,805 --> 00:54:36,680
Then I check and see.

878
00:54:36,680 --> 00:54:39,750
Four epsilon minus
three epsilon.

879
00:54:39,750 --> 00:54:45,371
And then we have 4/2 plus
3/2, also known as 7/2.

880
00:54:45,371 --> 00:54:45,870
Yes.
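
The two bounds just computed on the board can be verified with exact arithmetic. This is a sketch with an arbitrary sample epsilon, not part of the lecture:

```python
from fractions import Fraction

eps = Fraction(1, 100)  # any small epsilon works; exact arithmetic avoids rounding

# Yes instance: at least (1 - eps)m equations satisfied, each giving 4 clauses;
# the remaining eps*m equations still give at least 3 clauses.
yes_lower = (1 - eps) * 4 + eps * 3          # per equation: 4 - eps

# No instance: at most (1/2 + eps)m equations satisfied (4 clauses each);
# the rest give at most 3 clauses.
no_upper = (Fraction(1, 2) + eps) * 4 + (Fraction(1, 2) - eps) * 3  # 7/2 + eps

print(yes_lower, no_upper)
```

The exact values confirm the 4 − ε and 7/2 + ε coefficients derived on the board.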

881
00:54:54,120 --> 00:54:59,520
So we had a gap before, and
we get this new gap after.

882
00:54:59,520 --> 00:55:01,135
When we have a yes
instance, we know

883
00:55:01,135 --> 00:55:03,010
that there will be at
least this many clauses

884
00:55:03,010 --> 00:55:04,650
satisfied in the 3SAT.

885
00:55:04,650 --> 00:55:07,520
And there'll be at most this
many in the no instance.

886
00:55:07,520 --> 00:55:15,340
So what we proved is
this bound that-- sorry,

887
00:55:15,340 --> 00:55:16,560
get them in the right order.

888
00:55:16,560 --> 00:55:18,740
7/2 is the smaller one.

889
00:55:18,740 --> 00:55:27,720
7/2 plus epsilon, comma
four minus epsilon gap 3SAT,

890
00:55:27,720 --> 00:55:33,520
E3-SAT, is NP-hard.

891
00:55:33,520 --> 00:55:36,990
Because we had NP hardness
of the gap before,

892
00:55:36,990 --> 00:55:38,570
we did this
gap-preserving reduction,

893
00:55:38,570 --> 00:55:40,787
which ended up
with this new gap,

894
00:55:40,787 --> 00:55:42,620
with this being for no
instances, this being

895
00:55:42,620 --> 00:55:44,640
for yes instances.

896
00:55:44,640 --> 00:55:47,510
And so if we want to-- this
is with the comma notation

897
00:55:47,510 --> 00:55:51,100
for the yes and no what
fraction is satisfied.

898
00:55:51,100 --> 00:55:56,370
If you convert it back
into the c gap notation,

899
00:55:56,370 --> 00:55:58,880
you just take the ratio
between these two things.

900
00:55:58,880 --> 00:56:04,190
And ignoring the epsilons,
this is like 4 divided by 7/2.

901
00:56:04,190 --> 00:56:12,140
So that is 7/8 or 8/7, depending
on which way you're looking.

902
00:56:12,140 --> 00:56:18,080
So we also get a 7/8 gap.

903
00:56:18,080 --> 00:56:24,590
Sorry, I guess it's 8/7 the
way I was phrasing it before.

904
00:56:24,590 --> 00:56:26,200
It's also NP-hard.
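
Converting the two-threshold gap back to the single-ratio c-gap notation is just the quotient of the two counts, ignoring epsilons:

```python
from fractions import Fraction

# Ratio between the yes-instance count (4m clauses) and the
# no-instance count (7/2 m clauses), epsilons dropped.
ratio = Fraction(4) / Fraction(7, 2)
print(ratio)  # 8/7
```

So the gap factor is 8/7 (equivalently 7/8, depending on which direction you phrase it).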

905
00:56:26,200 --> 00:56:28,994
And so that proves-- there's
also a minus epsilon.

906
00:56:28,994 --> 00:56:30,160
So I should have kept those.

907
00:56:30,160 --> 00:56:34,380
Slightly different epsilon, but
minus two epsilon, whatever.

908
00:56:34,380 --> 00:56:37,800
And so this gives us the 8/7
is the best approximation

909
00:56:37,800 --> 00:56:39,180
factor we can hope for.

910
00:56:39,180 --> 00:56:40,800
AUDIENCE: In the
first notation, isn't

911
00:56:40,800 --> 00:56:42,352
it the fraction of clauses?

912
00:56:42,352 --> 00:56:44,877
So between zero and one?

913
00:56:44,877 --> 00:56:45,710
PROFESSOR: Oh, yeah.

914
00:56:45,710 --> 00:56:47,520
Four is a little funny.

915
00:56:47,520 --> 00:56:48,020
Right.

916
00:56:48,020 --> 00:56:52,600
I needed to scale-- thank you--
because the number of clauses

917
00:56:52,600 --> 00:56:54,970
in the resulting thing
is actually 4m, not m.

918
00:56:54,970 --> 00:56:58,990
So everything here needs
to be divided by four.

919
00:56:58,990 --> 00:57:01,950
It won't affect the final
ratio, but this should really

920
00:57:01,950 --> 00:57:06,050
be over four and over four.

921
00:57:06,050 --> 00:57:15,190
So also known as
7/8 plus epsilon,

922
00:57:15,190 --> 00:57:18,190
comma one minus epsilon.

923
00:57:18,190 --> 00:57:21,470
Now it's a little clearer, 7/8.

924
00:57:21,470 --> 00:57:24,260
Cool.

925
00:57:24,260 --> 00:57:25,252
Yeah.

926
00:57:25,252 --> 00:57:30,220
AUDIENCE: So are there any
[INAUDIBLE] randomness?

927
00:57:30,220 --> 00:57:34,470
AUDIENCE: So for [INAUDIBLE],
you can be the randomness.

928
00:57:34,470 --> 00:57:36,780
Randomness would
give you one half.

929
00:57:36,780 --> 00:57:40,026
[INAUDIBLE] algorithm
gives you 0.878.

930
00:57:40,026 --> 00:57:42,150
PROFESSOR: So you can beat
it by a constant factor.

931
00:57:42,150 --> 00:57:44,340
Probably not by more
than a constant factor.

932
00:57:44,340 --> 00:57:48,280
MAX CUT is an example
where you can beat it.

933
00:57:48,280 --> 00:57:51,720
I think I have the Goemans
Williamson bound here.

934
00:57:55,270 --> 00:57:57,360
MAX CUT, the best
approximation is

935
00:57:57,360 --> 00:58:01,190
0.878, which is better than
what you get by random,

936
00:58:01,190 --> 00:58:03,620
which is a half I guess.

937
00:58:03,620 --> 00:58:05,870
Cool.

938
00:58:05,870 --> 00:58:06,370
All right.

939
00:58:08,740 --> 00:58:09,240
Cool.

940
00:58:09,240 --> 00:58:12,350
So we get optimal
bound for MAX-E3-SAT,

941
00:58:12,350 --> 00:58:16,751
assuming an optimum bound for
E3-XNOR-SAT, which is from PCP.

942
00:58:16,751 --> 00:58:17,250
Yeah.

943
00:58:17,250 --> 00:58:19,000
AUDIENCE: So I'm sorry, can
you explain to me again why

944
00:58:19,000 --> 00:58:20,645
we don't get this
from the L-reduction,

945
00:58:20,645 --> 00:58:22,270
but we do get it from
the gap argument,

946
00:58:22,270 --> 00:58:24,510
even though the reduction
is the same reduction?

947
00:58:24,510 --> 00:58:27,149
PROFESSOR: It just lets
us give a tighter argument

948
00:58:27,149 --> 00:58:27,690
in this case.

949
00:58:27,690 --> 00:58:30,800
By thinking about yes instances
and no instances separately,

950
00:58:30,800 --> 00:58:32,320
we get one thing.

951
00:58:32,320 --> 00:58:35,820
Because this reduction is
designed to do different things

952
00:58:35,820 --> 00:58:37,354
for yes and no instances.

953
00:58:37,354 --> 00:58:39,270
Whereas the L-reduction
just says generically,

954
00:58:39,270 --> 00:58:42,150
if you satisfy these
parameters alpha and beta,

955
00:58:42,150 --> 00:58:44,460
you get some inapproximability
result on the output,

956
00:58:44,460 --> 00:58:46,470
but it's conservative.

957
00:58:46,470 --> 00:58:47,610
It's a conservative bound.

958
00:58:47,610 --> 00:58:50,000
If you just use properties
one and two up here,

959
00:58:50,000 --> 00:58:51,650
that's the best you could show.

960
00:58:51,650 --> 00:58:56,114
But by essentially
reanalyzing property one,

961
00:58:56,114 --> 00:58:58,280
but thinking separately
about yes and no instances--

962
00:58:58,280 --> 00:59:00,370
this held for all instances.

963
00:59:00,370 --> 00:59:02,010
We got a bound of seven.

964
00:59:02,010 --> 00:59:03,705
But in the yes and
the no cases, you

965
00:59:03,705 --> 00:59:05,705
can essentially get a
slightly tighter constant.

966
00:59:09,541 --> 00:59:10,040
All right.

967
00:59:10,040 --> 00:59:14,000
I want to tell you about
another cool problem.

968
00:59:31,990 --> 00:59:46,040
Another gap hardness that you
can get out of PCP analysis

969
00:59:46,040 --> 00:59:52,090
by some gap amplification
essentially, which

970
00:59:52,090 --> 00:59:53,060
is called label cover.

971
01:00:00,110 --> 01:00:04,470
So this problem takes a
little bit of time to define.

972
01:00:04,470 --> 01:00:08,020
But the basic point is there
are very strong lower bounds

973
01:00:08,020 --> 01:00:09,270
on the approximation factor.

974
01:00:26,030 --> 01:00:28,300
So you're given a bipartite
graph, no weights.

975
01:00:32,680 --> 01:00:37,620
The bipartition is A,
B. And furthermore, A

976
01:00:37,620 --> 01:00:40,495
can be divided into k chunks.

977
01:00:44,120 --> 01:00:51,515
And so can B. And these
are disjoint unions.

978
01:00:54,120 --> 01:01:01,480
And let's say size of
A is n, size of B is n,

979
01:01:01,480 --> 01:01:08,010
and size of each Ai
is also the same.

980
01:01:08,010 --> 01:01:11,736
We don't have to make these
assumptions, but you can.

981
01:01:11,736 --> 01:01:13,670
So let's make it a
little bit cleaner.

982
01:01:13,670 --> 01:01:17,340
So in general, A consists of k
groups, each of size n over k.

983
01:01:17,340 --> 01:01:21,470
B consists of k groups,
each of size n over k.

984
01:01:21,470 --> 01:01:26,010
So that's our-- we have A
here with these little groups.

985
01:01:26,010 --> 01:01:29,560
We have B, these little groups.

986
01:01:29,560 --> 01:01:31,060
And there's some
edges between them.

987
01:01:37,700 --> 01:01:43,890
In general, your goal is
to choose some subset of A,

988
01:01:43,890 --> 01:01:49,900
let's call it A prime, and some
subset of B, call it B prime.

989
01:01:49,900 --> 01:01:55,680
And one other thing
I want to talk about

990
01:01:55,680 --> 01:01:57,680
is called a super edge.

991
01:02:02,550 --> 01:02:05,280
And then I'll say what we
want out of these subsets

992
01:02:05,280 --> 01:02:06,720
that we choose.

993
01:02:06,720 --> 01:02:09,830
Imagine contracting
each of these groups.

994
01:02:09,830 --> 01:02:13,250
There are n over k
items here, and there

995
01:02:13,250 --> 01:02:16,330
are k different groups.

996
01:02:16,330 --> 01:02:20,200
Imagine contracting each
group to a single vertex.

997
01:02:20,200 --> 01:02:22,170
This is A1.

998
01:02:22,170 --> 01:02:25,310
This is B3.

999
01:02:25,310 --> 01:02:27,820
I want to say that there's
a super edge from the group

1000
01:02:27,820 --> 01:02:29,790
A1 to the group
B3 because there's

1001
01:02:29,790 --> 01:02:31,790
at least one edge between them.

1002
01:02:31,790 --> 01:02:33,890
If I squashed A1
to a single vertex,

1003
01:02:33,890 --> 01:02:37,110
B3 down to a single vertex, I
would get an edge between them.

1004
01:02:37,110 --> 01:02:45,100
So a super edge, Ai Bi--
Ai Bj, I should say--

1005
01:02:45,100 --> 01:02:50,110
exists if there's
at least one edge

1006
01:02:50,110 --> 01:02:56,840
in Ai cross Bj, at least one
edge connecting those groups.

1007
01:02:56,840 --> 01:03:02,110
And I'm going to call such a
super edge covered by A prime B

1008
01:03:02,110 --> 01:03:09,910
prime if at least one of those
edges is in this chosen set.

1009
01:03:09,910 --> 01:03:12,952
So if there's at least
one edge-- sorry.

1010
01:03:16,150 --> 01:03:21,090
If this Ai cross Bj, these
are all the possible edges

1011
01:03:21,090 --> 01:03:33,920
between those groups, intersects
A prime cross B prime.

1012
01:03:33,920 --> 01:03:37,320
And in general, I want to cover
all the super edges if I can.

1013
01:03:37,320 --> 01:03:40,460
So I would like to
have a solution where,

1014
01:03:40,460 --> 01:03:44,070
if there is some edge
between A1 and B3,

1015
01:03:44,070 --> 01:03:46,930
then in the set of
vertices I choose,

1016
01:03:46,930 --> 01:03:51,600
A prime and B prime in the left,
they induce at least one edge

1017
01:03:51,600 --> 01:03:54,980
from A1 to B3, and
also from A2 to B3

1018
01:03:54,980 --> 01:03:57,900
because there is an
edge that I drew here.

1019
01:03:57,900 --> 01:04:00,697
I want ideally to choose
the endpoints of that edge,

1020
01:04:00,697 --> 01:04:02,780
or some other edge that
connects those two groups.

1021
01:04:02,780 --> 01:04:03,125
Yeah.

1022
01:04:03,125 --> 01:04:05,310
AUDIENCE: So you're choosing
subsets A prime of A.

1023
01:04:05,310 --> 01:04:07,434
Is there some restriction
on the subset you choose?

1024
01:04:07,434 --> 01:04:08,879
Why don't you choose all of A?

1025
01:04:08,879 --> 01:04:09,545
PROFESSOR: Wait.

1026
01:04:09,545 --> 01:04:10,293
AUDIENCE: Oh, OK.

1027
01:04:10,293 --> 01:04:11,800
You're not done yet?

1028
01:04:11,800 --> 01:04:14,095
PROFESSOR: Nope.

1029
01:04:14,095 --> 01:04:15,595
That's about half
of the definition.

1030
01:04:23,190 --> 01:04:26,130
It's a lot to say, but it's not
that complicated a problem.

1031
01:04:30,800 --> 01:04:33,180
So there's two versions.

1032
01:04:33,180 --> 01:04:35,410
That's part of what
makes it longer.

1033
01:04:35,410 --> 01:04:37,350
We'll start with the
maximization version,

1034
01:04:37,350 --> 01:04:39,520
which is called Max-Rep.

1035
01:04:39,520 --> 01:04:42,880
So we have two constraints
on A prime and B prime.

1036
01:04:47,020 --> 01:04:50,180
First is that we choose exactly
one vertex from each group.

1037
01:04:57,700 --> 01:05:02,920
So we got A prime
intersect Ai equals

1038
01:05:02,920 --> 01:05:11,600
one, and B prime intersect Bj
equals one, for all i and j.

1039
01:05:11,600 --> 01:05:15,720
OK. And then subject
to that constraint,

1040
01:05:15,720 --> 01:05:20,380
we want to maximize the
number of covered super edges.
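
On tiny instances, Max-Rep can be solved by brute force exactly as just defined: pick one vertex per group on each side, then count covered super edges. A minimal sketch; the function name and input encoding are my own, not the lecture's:

```python
from itertools import product

def max_rep(groups_a, groups_b, edges):
    """Brute-force Max-Rep: choose exactly one vertex from each group Ai
    and each group Bj, maximizing the number of covered super edges.
    groups_a/groups_b are lists of vertex lists; edges is a set of (a, b)."""
    # A super edge (i, j) exists if at least one edge connects Ai to Bj.
    super_edges = {(i, j)
                   for i, Ai in enumerate(groups_a)
                   for j, Bj in enumerate(groups_b)
                   if any((a, b) in edges for a in Ai for b in Bj)}
    best = 0
    for choice_a in product(*groups_a):        # one vertex per Ai
        for choice_b in product(*groups_b):    # one vertex per Bj
            covered = sum((choice_a[i], choice_b[j]) in edges
                          for (i, j) in super_edges)
            best = max(best, covered)
    return best
```

For example, with one group {0, 1} on the left, one group {2, 3} on the right, and the single edge (0, 2), choosing 0 and 2 covers the one super edge.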

1041
01:05:28,140 --> 01:05:33,250
Intuition here is that
those groups are labels.

1042
01:05:33,250 --> 01:05:35,440
And there's really one
super vertex there,

1043
01:05:35,440 --> 01:05:37,860
and you want to choose
one of those labels

1044
01:05:37,860 --> 01:05:39,720
to satisfy the instance.

1045
01:05:39,720 --> 01:05:42,380
So here you're only allowed to
choose one label per vertex.

1046
01:05:42,380 --> 01:05:44,860
We choose one out of
each of the groups.

1047
01:05:44,860 --> 01:05:48,330
Then you'd like to cover
as many edges as you can.

1048
01:05:48,330 --> 01:05:53,736
If there is an edge in the
super graph from Ai to Bj,

1049
01:05:53,736 --> 01:05:57,837
you would like to
include an induced edge.

1050
01:05:57,837 --> 01:05:59,920
There should actually be
an edge between the label

1051
01:05:59,920 --> 01:06:03,520
you assign to Ai and the
label you assign to Bj.

1052
01:06:03,520 --> 01:06:05,690
That's this version.

1053
01:06:05,690 --> 01:06:09,060
The complementary problem
is a minimization problem

1054
01:06:09,060 --> 01:06:13,400
where we switch what is relaxed,
what constraint is relaxed,

1055
01:06:13,400 --> 01:06:15,990
and what constraint must hold.

1056
01:06:15,990 --> 01:06:19,270
So here we're going to
allow multiple labels

1057
01:06:19,270 --> 01:06:21,980
for each super vertex,
multiple vertices

1058
01:06:21,980 --> 01:06:24,560
to be chosen from each group.

1059
01:06:24,560 --> 01:06:28,970
Instead we force that
everything is covered.

1060
01:06:28,970 --> 01:06:39,220
We want to cover every
super edge that exists.

1061
01:06:39,220 --> 01:06:45,710
And our goal is to minimize
the size of these sets,

1062
01:06:45,710 --> 01:06:48,560
A prime plus B prime.

1063
01:06:48,560 --> 01:06:51,320
So this is sort of
the dual problem.

1064
01:06:51,320 --> 01:06:52,859
Here we force one
label per vertex.

1065
01:06:52,859 --> 01:06:54,900
We want to maximize the
number of covered things.

1066
01:06:54,900 --> 01:06:56,660
Here we force everything
to be covered.

1067
01:06:56,660 --> 01:07:00,082
We want to essentially minimize
the number of labels we assign.
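
The dual, Min-Rep, can likewise be brute-forced on tiny instances: allow any subset per group, require every super edge covered, and minimize |A′| + |B′|. Again a sketch with names of my own choosing:

```python
from itertools import chain, combinations

def min_rep(groups_a, groups_b, edges):
    """Brute-force Min-Rep: smallest A' and B' (multiple vertices per group
    allowed) such that every existing super edge is covered by some chosen
    edge.  Returns |A'| + |B'|."""
    A = [v for g in groups_a for v in g]
    B = [v for g in groups_b for v in g]
    group_of_a = {v: i for i, g in enumerate(groups_a) for v in g}
    group_of_b = {v: j for j, g in enumerate(groups_b) for v in g}
    super_edges = {(group_of_a[a], group_of_b[b]) for (a, b) in edges}

    def subsets(xs):
        return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

    best = len(A) + len(B)  # choosing everything always covers everything
    for Ap in subsets(A):
        for Bp in subsets(B):
            covered = {(group_of_a[a], group_of_b[b])
                       for a in Ap for b in Bp if (a, b) in edges}
            if covered == super_edges:
                best = min(best, len(Ap) + len(Bp))
    return best
```

With left groups {0} and {1}, right group {2}, and edges (0, 2) and (1, 2), both left vertices are needed to cover both super edges, so the optimum is 3.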

1068
01:07:03,060 --> 01:07:05,980
So these problems
are both very hard.

1069
01:07:05,980 --> 01:07:09,540
This should build you
some more intuition.

1070
01:07:09,540 --> 01:07:13,360
Let me show you a puzzle
which is basically

1071
01:07:13,360 --> 01:07:19,100
exactly this game, designed by
MIT professor Dana Moshkovitz.

1072
01:07:19,100 --> 01:07:22,110
So here's a word puzzle.

1073
01:07:22,110 --> 01:07:24,980
Your goal is to put letters
into each of these boxes-- this

1074
01:07:24,980 --> 01:07:30,260
is B, and this is A-- such
that-- for example, this

1075
01:07:30,260 --> 01:07:33,740
is animal, which means
these three things pointed

1076
01:07:33,740 --> 01:07:37,410
by the red arrows, those
letters should concatenate

1077
01:07:37,410 --> 01:07:41,480
to form an animal, like cat.

1078
01:07:41,480 --> 01:07:43,840
Bat is the example.

1079
01:07:43,840 --> 01:07:48,030
So if I write B, A, and T,
animal is satisfied perfectly.

1080
01:07:48,030 --> 01:07:51,460
Because all three
letters form a word,

1081
01:07:51,460 --> 01:07:54,590
I get three points so far.

1082
01:07:54,590 --> 01:07:57,290
Next let's think
about transportation.

1083
01:07:57,290 --> 01:07:59,180
For example, cab is
a three-letter word

1084
01:07:59,180 --> 01:08:00,410
that is transportation.

1085
01:08:00,410 --> 01:08:02,200
Notice there's always
three over here.

1086
01:08:02,200 --> 01:08:07,870
This corresponds to some
regularity constraint

1087
01:08:07,870 --> 01:08:09,210
on the bipartite graph.

1088
01:08:09,210 --> 01:08:12,010
There's always going to be three
arrows going from left to right

1089
01:08:12,010 --> 01:08:16,960
for every group.

1090
01:08:16,960 --> 01:08:18,550
So transportation, fine.

1091
01:08:18,550 --> 01:08:22,630
We got C-A-B. That is happy.

1092
01:08:22,630 --> 01:08:25,330
We happen to reuse the A,
so we get three more points,

1093
01:08:25,330 --> 01:08:26,689
total of six.

1094
01:08:26,689 --> 01:08:32,220
Furniture, we have
B, blank, and T left.

1095
01:08:32,220 --> 01:08:33,720
This is going to
be a little harder.

1096
01:08:33,720 --> 01:08:35,720
I don't know of any
furniture that starts with B

1097
01:08:35,720 --> 01:08:38,160
and ends with T and
is three letters long.

1098
01:08:38,160 --> 01:08:40,950
But if you, for example,
write an E here,

1099
01:08:40,950 --> 01:08:44,005
that's pretty close to the
word bed, which is furniture.

1100
01:08:44,005 --> 01:08:45,880
So in general, of course,
each of these words

1101
01:08:45,880 --> 01:08:48,580
corresponds to a set
of English words.

1102
01:08:48,580 --> 01:08:52,290
That's going to be the
groups on the left.

1103
01:08:52,290 --> 01:08:54,630
So this Ai group
for furniture is

1104
01:08:54,630 --> 01:08:57,229
the set of all words that are
furniture and three letters

1105
01:08:57,229 --> 01:08:59,149
long.

1106
01:08:59,149 --> 01:09:02,104
And then for each such
choice on the left,

1107
01:09:02,104 --> 01:09:03,520
for each such
choice on the right,

1108
01:09:03,520 --> 01:09:04,978
you can say is,
are they compatible

1109
01:09:04,978 --> 01:09:08,180
by either putting
an edge or not.

1110
01:09:08,180 --> 01:09:12,300
And so this is-- we got two
out of three of these edges.

1111
01:09:12,300 --> 01:09:13,649
These two are satisfied.

1112
01:09:13,649 --> 01:09:14,330
This one's not.

1113
01:09:14,330 --> 01:09:18,099
So we get two more points
for a total of eight.

1114
01:09:18,099 --> 01:09:19,640
This is for the
maximization problem.

1115
01:09:19,640 --> 01:09:22,830
Minimization would be different.

1116
01:09:22,830 --> 01:09:27,109
Here's a verb, where we
almost get cry, C-B-Y.

1117
01:09:27,109 --> 01:09:29,600
So we get two more points.

1118
01:09:29,600 --> 01:09:30,870
Here is another.

1119
01:09:30,870 --> 01:09:32,170
We want a verb.

1120
01:09:32,170 --> 01:09:36,270
Blank, A, Y. There are
multiple such verbs.

1121
01:09:36,270 --> 01:09:37,920
You can think of them.

1122
01:09:37,920 --> 01:09:40,800
And on the other hand,
we have a food, which

1123
01:09:40,800 --> 01:09:44,970
is supposed to be blank, E,
Y. So a pretty good choice

1124
01:09:44,970 --> 01:09:46,529
would be P for that top letter.

1125
01:09:46,529 --> 01:09:49,800
Then you get pay exactly
and almost get pea.

1126
01:09:49,800 --> 01:09:52,510
So a total score of 15.

1127
01:09:52,510 --> 01:09:58,585
And so this would be a
solution to Max-Rep of cost 15.

1128
01:09:58,585 --> 01:10:00,275
It's not the best.

1129
01:10:00,275 --> 01:10:02,150
And if you stare at this
example long enough,

1130
01:10:02,150 --> 01:10:05,630
you can actually get a perfect
solution of score 18, where

1131
01:10:05,630 --> 01:10:06,630
there are no violations.

1132
01:10:06,630 --> 01:10:12,440
Basically, in particular you do
say here and get soy for food.

1133
01:10:12,440 --> 01:10:16,015
AUDIENCE: So the sets on
the right are 26 letters?

1134
01:10:16,015 --> 01:10:16,640
PROFESSOR: Yes.

1135
01:10:16,640 --> 01:10:19,760
The Bis here are the
alphabet A through Z,

1136
01:10:19,760 --> 01:10:22,019
and the sets on the
left are a set of words.

1137
01:10:22,019 --> 01:10:24,310
And then you're going to
connect two of them by an edge

1138
01:10:24,310 --> 01:10:30,530
if that letter happens to
match on the right, [INAUDIBLE]

1139
01:10:30,530 --> 01:10:31,660
letter.

1140
01:10:31,660 --> 01:10:33,650
So it's a little--
I mean, the mapping

1141
01:10:33,650 --> 01:10:34,650
is slightly complicated.

1142
01:10:34,650 --> 01:10:37,430
But this is a particular
instance of Max-Rep.

1143
01:10:41,080 --> 01:10:49,430
So what-- well, we get
some super extreme hardness

1144
01:10:49,430 --> 01:10:51,260
for these problems.

1145
01:10:51,260 --> 01:11:05,710
So let's start with epsilon,
comma one gap Max-Rep

1146
01:11:05,710 --> 01:11:06,350
is NP-hard.

1147
01:11:16,160 --> 01:11:20,500
So what I mean by this
is in the best situation,

1148
01:11:20,500 --> 01:11:22,860
you cover all of
the super edges.

1149
01:11:22,860 --> 01:11:26,021
So the one means 100% of
the super edges are covered.

1150
01:11:26,021 --> 01:11:28,270
Epsilon means that at most
an epsilon fraction of them

1151
01:11:28,270 --> 01:11:29,170
are covered.

1152
01:11:29,170 --> 01:11:30,297
So that problem is NP-hard.

1153
01:11:30,297 --> 01:11:32,255
This is a bit stronger
than what we had before.

1154
01:11:32,255 --> 01:11:36,270
Before we had a particular
constant, comma one or one

1155
01:11:36,270 --> 01:11:38,200
minus epsilon or something.

1156
01:11:38,200 --> 01:11:41,170
Here, for any constant
epsilon, this is true.

1157
01:11:43,842 --> 01:11:45,550
And there's a similar
result for Min-Rep.

1158
01:11:45,550 --> 01:11:49,160
It's just from one
to one over epsilon.

1159
01:11:49,160 --> 01:11:51,870
So this means there is no
constant factor approximation.

1160
01:11:51,870 --> 01:11:55,490
Max-Rep is not in APX.

1161
01:11:55,490 --> 01:11:57,460
But it's worse than that.

1162
01:11:57,460 --> 01:12:01,580
We need to assume slightly more.

1163
01:12:01,580 --> 01:12:08,336
In general, what you can show,
if you have some constant, p,

1164
01:12:08,336 --> 01:12:18,930
or there is a constant p, such
that if you can solve this gap

1165
01:12:18,930 --> 01:12:22,700
problem, one over p to the k,
so very tiny fraction of things

1166
01:12:22,700 --> 01:12:25,590
satisfied versus all
of the super edges

1167
01:12:25,590 --> 01:12:37,340
covered, then NP can be solved
in n to the order k time.

1168
01:12:42,300 --> 01:12:44,470
So we haven't usually
used this class.

1169
01:12:44,470 --> 01:12:47,230
Usually we talk about p, which
is the union of all these

1170
01:12:47,230 --> 01:12:48,442
for constant k.

1171
01:12:48,442 --> 01:12:50,150
But here k doesn't
have to be a constant.

1172
01:12:50,150 --> 01:12:52,180
It could be some function of n.

1173
01:12:52,180 --> 01:12:54,640
And in particular, if
p does not equal NP,

1174
01:12:54,640 --> 01:12:57,310
then k constant is not possible.

1175
01:12:57,310 --> 01:13:00,305
So this result
implies this result.

1176
01:13:03,170 --> 01:13:09,570
But if we let k get bigger
than a constant, like log-log n

1177
01:13:09,570 --> 01:13:15,570
or something, then we get
some separation between-- we

1178
01:13:15,570 --> 01:13:18,170
get a somewhat weaker
statement here.

1179
01:13:18,170 --> 01:13:20,470
We know if p does
not equal NP, we know

1180
01:13:20,470 --> 01:13:23,000
that NP is not contained in p.

1181
01:13:23,000 --> 01:13:26,380
But if we furthermore
assume that NP doesn't

1182
01:13:26,380 --> 01:13:31,970
have subexponential solutions,
and very subexponential

1183
01:13:31,970 --> 01:13:34,690
solutions, then we
get various gap bounds

1184
01:13:34,690 --> 01:13:36,460
inapproximability on Max-Rep.

1185
01:13:36,460 --> 01:13:39,920
So a reasonable
limit, for example,

1186
01:13:39,920 --> 01:13:53,060
is that-- let's say we assume
NP is not in n to the polylog n.

1187
01:13:56,336 --> 01:13:59,760
n to the polylog n is usually
called quasi-polynomial.

1188
01:13:59,760 --> 01:14:01,290
It's almost polynomial.

1189
01:14:01,290 --> 01:14:04,870
Log n is kind of close
to constant-- ish.

1190
01:14:04,870 --> 01:14:08,987
This is the same as two to the
polylog n, n to the polylog n.

1191
01:14:08,987 --> 01:14:10,070
But it's a little clearer.

1192
01:14:10,070 --> 01:14:13,700
This is obviously close
to polynomial, quite far

1193
01:14:13,700 --> 01:14:17,250
from exponential, which is
two to the n, not polylog.

1194
01:14:17,250 --> 01:14:18,800
So very different
from exponential.

1195
01:14:18,800 --> 01:14:23,390
So almost everyone
believes NP does not admit

1196
01:14:23,390 --> 01:14:24,550
quasi-polynomial solutions.

1197
01:14:24,550 --> 01:14:26,890
All problems in NP would
have to admit that.

1198
01:14:26,890 --> 01:14:28,566
3SAT, for example,
people don't think

1199
01:14:28,566 --> 01:14:30,482
you can do better than
some constant to the n.

1200
01:14:33,150 --> 01:14:39,030
Then what do we get when
we plug in that value of k?

1201
01:14:39,030 --> 01:14:47,390
That there is no 2 to the
log to the one minus epsilon

1202
01:14:47,390 --> 01:14:49,545
n approximation.

1203
01:14:52,200 --> 01:14:56,080
Or also, the same
thing, gap is hard.

1204
01:14:56,080 --> 01:14:57,960
Now, it's not NP-hard.

1205
01:14:57,960 --> 01:14:59,419
But it's as hard
as this problem.

1206
01:14:59,419 --> 01:15:01,210
If you believe this is
not true, then there

1207
01:15:01,210 --> 01:15:02,830
will be no polynomial
time algorithm

1208
01:15:02,830 --> 01:15:06,530
to solve this
factor gap Max-Rep.

1209
01:15:06,530 --> 01:15:07,890
So this is very large.

1210
01:15:07,890 --> 01:15:12,425
We've seen this before in
this table of various results.

1211
01:15:12,425 --> 01:15:14,800
Near the bottom, there is a
lower bound of two to the log

1212
01:15:14,800 --> 01:15:16,240
to one minus epsilon n.

1213
01:15:16,240 --> 01:15:18,630
This is not assuming
p does not equal NP.

1214
01:15:18,630 --> 01:15:20,720
It's assuming this
statement, NP does not

1215
01:15:20,720 --> 01:15:23,260
have quasi-polynomial
algorithms.

1216
01:15:23,260 --> 01:15:25,100
And you see here
our friends Max-Rep

1217
01:15:25,100 --> 01:15:28,750
and Min-Rep, two
versions of label cover.

1218
01:15:28,750 --> 01:15:32,690
So I'm not going to
prove these theorems.

1219
01:15:32,690 --> 01:15:35,170
But again, they're
PCP style arguments

1220
01:15:35,170 --> 01:15:38,890
with some gap boosting.

1221
01:15:38,890 --> 01:15:43,850
But I would say most or a
lot of approximation lower

1222
01:15:43,850 --> 01:15:47,900
bounds in this world today
start from Max-Rep or Min-Rep

1223
01:15:47,900 --> 01:15:51,020
and reduce to the problem
using usually some kind

1224
01:15:51,020 --> 01:15:52,850
of gap-preserving reduction.

1225
01:15:52,850 --> 01:15:55,020
Maybe they lose the gap,
but we have such a huge gap

1226
01:15:55,020 --> 01:15:56,830
to start with that
even if you lose gap,

1227
01:15:56,830 --> 01:15:59,490
you still get
pretty good results.

1228
01:15:59,490 --> 01:16:03,150
So a couple of quick
examples here on the slides.

1229
01:16:03,150 --> 01:16:05,110
Directed Steiner forest.

1230
01:16:05,110 --> 01:16:07,200
Remember, you have
a directed graph,

1231
01:16:07,200 --> 01:16:10,360
and you have a bunch
of terminal pairs.

1232
01:16:10,360 --> 01:16:14,210
And you want to, in particular,
connect via directed path

1233
01:16:14,210 --> 01:16:17,990
some Ais and Bjs, let's say.

1234
01:16:17,990 --> 01:16:22,050
And you want to do so by
choosing the fewest vertices

1235
01:16:22,050 --> 01:16:23,470
in this graph.

1236
01:16:23,470 --> 01:16:27,560
So what I'm going to do, if I'm
given my bipartite graph here

1237
01:16:27,560 --> 01:16:29,904
for Min-Rep, I'm
just going to add--

1238
01:16:29,904 --> 01:16:31,320
to represent that
this is a group,

1239
01:16:31,320 --> 01:16:34,290
I'm going to add a vertex here
connect by directed edges here.

1240
01:16:34,290 --> 01:16:35,870
And there's a group
down here, so I'm

1241
01:16:35,870 --> 01:16:37,860
going to have downward
edges down there.

1242
01:16:37,860 --> 01:16:40,850
And whenever there's a
super edge from, say,

1243
01:16:40,850 --> 01:16:43,980
A2, capital A2 to
capital B1, then

1244
01:16:43,980 --> 01:16:46,510
I'm going to say in my directed
Steiner forest problem,

1245
01:16:46,510 --> 01:16:49,926
I want a path from
little a2 to little b1.

1246
01:16:49,926 --> 01:16:51,800
So in general, whenever
there's a super edge,

1247
01:16:51,800 --> 01:16:53,460
I add that constraint.

1248
01:16:53,460 --> 01:16:56,020
And then any solution to
directed Steiner forest

1249
01:16:56,020 --> 01:16:58,030
will exactly be a
solution to Min-Rep.

1250
01:16:58,030 --> 01:17:02,040
You're just forcing the
addition of the Ais and Bis.

1251
01:17:02,040 --> 01:17:03,560
It's again an L-reduction.

1252
01:17:03,560 --> 01:17:05,880
You're just offsetting by
a fixed additive amount.

1253
01:17:05,880 --> 01:17:07,791
So your gap OPT
will be the same.

1254
01:17:07,791 --> 01:17:10,290
And so you get that this problem
is just as hard as Min-Rep.

1255
01:17:15,870 --> 01:17:18,040
Well, this is another
one from set cover.

1256
01:17:18,040 --> 01:17:20,950
You can also show node
weighted Steiner trees.

1257
01:17:20,950 --> 01:17:22,410
Log n hard to approximate.

1258
01:17:22,410 --> 01:17:25,250
That's not from Min-Rep,
but threw it in there

1259
01:17:25,250 --> 01:17:27,661
while we're on the
topic of Steiner trees.

1260
01:17:27,661 --> 01:17:28,160
All right.

1261
01:17:28,160 --> 01:17:31,797
I want to mention one
more thing quickly

1262
01:17:31,797 --> 01:17:33,005
in my zero minutes remaining.

1263
01:17:45,530 --> 01:17:48,900
And that is unique games.

1264
01:17:54,500 --> 01:17:58,030
So unique games is a
special case of, say,

1265
01:17:58,030 --> 01:18:08,920
Max-Rep, or either label cover
problem, where the edges in Ai

1266
01:18:08,920 --> 01:18:12,427
cross Bj form a matching.

1267
01:18:17,780 --> 01:18:19,810
For every choice in
the left, there's

1268
01:18:19,810 --> 01:18:25,120
a unique choice on the right
and vice versa that matches.

1269
01:18:25,120 --> 01:18:27,170
Well, there's at most
one choice, I guess.

1270
01:18:27,170 --> 01:18:30,960
And I think that
corresponds to these games.

1271
01:18:30,960 --> 01:18:32,840
Once you choose
a word over here,

1272
01:18:32,840 --> 01:18:34,490
there's a unique
letter that matches.

1273
01:18:34,490 --> 01:18:36,180
The reverse is not true.

1274
01:18:36,180 --> 01:18:38,500
So in this problem,
it's more like a star,

1275
01:18:38,500 --> 01:18:39,350
left to right star.

1276
01:18:39,350 --> 01:18:41,980
Once you choose this
word, it's fixed

1277
01:18:41,980 --> 01:18:43,730
what you have to choose
on the right side.

1278
01:18:43,730 --> 01:18:45,563
But if you choose a
single letter over here,

1279
01:18:45,563 --> 01:18:47,650
it does not uniquely
determine the word over here.

1280
01:18:47,650 --> 01:18:49,630
So unique games is
quite a bit stronger.

1281
01:18:49,630 --> 01:18:51,630
You choose either side,
it forces the other one,

1282
01:18:51,630 --> 01:18:54,640
if you want to cover that edge.
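The uniqueness property can be made concrete with a small sketch. The representation here is an assumption, not from the lecture: each edge constraint is stored as a permutation `pi` of the label set, so an edge is satisfied exactly when the right endpoint carries the image of the left endpoint's label.

```python
# Hedged sketch of unique-games constraints: each edge constraint is a
# bijection (permutation) on labels, so fixing either endpoint's label
# forces the other endpoint's label if you want to satisfy that edge.

def is_unique_constraint(pi, num_labels):
    # pi is "unique" iff it is a permutation of {0, ..., num_labels - 1}:
    # each label on one side matches exactly one label on the other.
    return sorted(pi) == list(range(num_labels))

def satisfied_fraction(edge_constraints, labeling):
    # edge_constraints: dict mapping an edge (u, v) to its permutation pi.
    # Returns the fraction of edges with labeling[v] == pi[labeling[u]].
    ok = sum(1 for (u, v), pi in edge_constraints.items()
             if labeling[v] == pi[labeling[u]])
    return ok / len(edge_constraints)
```

In these terms, the conjecture below says it is hard to distinguish instances where some labeling satisfies a 1 minus epsilon fraction of edges from instances where every labeling satisfies at most an epsilon fraction.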

1283
01:18:54,640 --> 01:18:58,870
OK. So far so good.

1284
01:18:58,870 --> 01:19:03,750
The unique games conjecture is that
the special case is also hard.

1285
01:19:03,750 --> 01:19:14,190
The unique games conjecture is that
the epsilon versus 1-minus-epsilon gap

1286
01:19:14,190 --> 01:19:18,451
unique games problem is NP-hard.

1287
01:19:18,451 --> 01:19:19,950
Of course, there
are weaker versions

1288
01:19:19,950 --> 01:19:21,917
of this conjecture
that don't say NP-hard,

1289
01:19:21,917 --> 01:19:24,000
maybe assuming some weaker
assumption that there's

1290
01:19:24,000 --> 01:19:27,350
no polynomial time algorithm.

1291
01:19:27,350 --> 01:19:30,030
Unlike every other complexity
theoretic assumption

1292
01:19:30,030 --> 01:19:31,960
I have mentioned in
this class, this one

1293
01:19:31,960 --> 01:19:33,600
is the subject of much debate.

1294
01:19:33,600 --> 01:19:35,320
Not everyone believes
that it's true.

1295
01:19:35,320 --> 01:19:37,380
Some people believe
that it's false.

1296
01:19:37,380 --> 01:19:40,300
Many people believe--
basically people don't know

1297
01:19:40,300 --> 01:19:41,480
is the short answer.

1298
01:19:41,480 --> 01:19:45,085
There's some somewhat scary
evidence that it's not true.

1299
01:19:45,085 --> 01:19:47,710
There are slightly stronger forms
of this that are definitely not

1300
01:19:47,710 --> 01:19:50,140
true, which I won't get into.

1301
01:19:50,140 --> 01:19:53,100
There is a subexponential
algorithm for this problem.

1302
01:19:53,100 --> 01:19:56,750
But it's still up in the air.

1303
01:19:56,750 --> 01:19:58,750
A lot of people like to
assume that this is true

1304
01:19:58,750 --> 01:20:02,950
because it makes life
a lot more beautiful,

1305
01:20:02,950 --> 01:20:05,390
especially from an
inapproximability standpoint.

1306
01:20:05,390 --> 01:20:09,480
So for example, MAX-2SAT, the
best approximation algorithm is

1307
01:20:09,480 --> 01:20:11,081
0.940.

1308
01:20:11,081 --> 01:20:12,580
If you assume the unique
games conjecture, you

1309
01:20:12,580 --> 01:20:14,360
can prove a matching
lower bound.

1310
01:20:14,360 --> 01:20:17,770
That was MAX-2SAT. For
MAX-CUT, as was mentioned,

1311
01:20:17,770 --> 01:20:22,990
0.878 is the best upper
bound, by Goemans-Williamson.

1312
01:20:22,990 --> 01:20:25,220
If you assume unique games,
then that's also tight.

1313
01:20:25,220 --> 01:20:27,915
There's a matching
this minus epsilon

1314
01:20:27,915 --> 01:20:33,330
or plus epsilon
inapproximability result.

1315
01:20:33,330 --> 01:20:36,060
And vertex cover, two.

1316
01:20:36,060 --> 01:20:37,790
You probably know how to do two.

1317
01:20:37,790 --> 01:20:41,060
If you assume unique games,
two is the right answer.

1318
01:20:41,060 --> 01:20:42,900
If you don't assume
anything, the best

1319
01:20:42,900 --> 01:20:45,350
we know how to prove
using all of this stuff

1320
01:20:45,350 --> 01:20:51,400
is 0.857 versus 0.5.

1321
01:20:51,400 --> 01:20:56,450
So it's nice to assume
unique games is true.

1322
01:20:56,450 --> 01:21:00,120
A very cool result: if you look
over all the different CSP

1323
01:21:00,120 --> 01:21:03,030
problems that we've seen,
all the MAX-CSP problems,

1324
01:21:03,030 --> 01:21:05,400
and you try to solve it
using a particular kind

1325
01:21:05,400 --> 01:21:10,490
of semi-definite programming,
there's an SDP relaxation.

1326
01:21:10,490 --> 01:21:13,040
If you don't know SDPs,
ignore this sentence.

1327
01:21:13,040 --> 01:21:15,940
There's an SDP relaxation
of all CSP problems.

1328
01:21:15,940 --> 01:21:18,310
You do the obvious thing.

1329
01:21:18,310 --> 01:21:21,240
And that SDP will have
an integrality gap.

1330
01:21:21,240 --> 01:21:23,870
And if you believe
unique games conjecture,

1331
01:21:23,870 --> 01:21:27,670
then that integrality gap equals
the approximability factor,

1332
01:21:27,670 --> 01:21:28,400
one for one.

1333
01:21:28,400 --> 01:21:30,780
And so in this sense, if
you're trying to solve any CSP

1334
01:21:30,780 --> 01:21:32,810
problem, semi-definite
programming

1335
01:21:32,810 --> 01:21:37,050
is the ultimate tool for all
approximation algorithms.

1336
01:21:37,050 --> 01:21:39,500
Because if there's
a gap in the SDP,

1337
01:21:39,500 --> 01:21:41,230
you can prove an
inapproximability result

1338
01:21:41,230 --> 01:21:43,270
of that minus epsilon.

1339
01:21:43,270 --> 01:21:44,682
So this is amazingly powerful.

1340
01:21:44,682 --> 01:21:46,890
The only catch is, we don't
know whether unique games

1341
01:21:46,890 --> 01:21:48,350
conjecture is true.

1342
01:21:48,350 --> 01:21:51,080
And for that reason, I'm not
going to spend more time on it.

1343
01:21:51,080 --> 01:21:56,180
But this gives you a flavor of
this side of the field, the gap

1344
01:21:56,180 --> 01:22:00,090
preservation approach to inapproximability.

1345
01:22:00,090 --> 01:22:02,284
Any final questions?

1346
01:22:02,284 --> 01:22:03,248
Yeah.

1347
01:22:03,248 --> 01:22:05,158
AUDIENCE: If there's a
[INAUDIBLE] algorithm

1348
01:22:05,158 --> 01:22:05,658
[INAUDIBLE]?

1349
01:22:09,550 --> 01:22:12,530
PROFESSOR: It's
fine for a problem

1350
01:22:12,530 --> 01:22:15,570
to be slightly subexponential.

1351
01:22:15,570 --> 01:22:19,970
It's like two to the n to
the epsilon or something.

1352
01:22:19,970 --> 01:22:22,600
So when you do an
NP-hardness reduction, you

1353
01:22:22,600 --> 01:22:24,380
can blow things up by
a polynomial factor.

1354
01:22:24,380 --> 01:22:28,000
And so that n to the
epsilon becomes n again.

1355
01:22:28,000 --> 01:22:29,890
So if you start
from 3SAT where we

1356
01:22:29,890 --> 01:22:33,600
don't believe there's a
subexponential thing, when you

1357
01:22:33,600 --> 01:22:36,910
reduce to this, you
might end up putting it--

1358
01:22:36,910 --> 01:22:38,870
you lose that polynomial factor.

1359
01:22:38,870 --> 01:22:42,489
And so it's not a contradiction.

1360
01:22:42,489 --> 01:22:43,030
A bit subtle.
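The size-blowup arithmetic behind this answer can be sketched as follows, for a generic reduction (the exponents here are illustrative assumptions, not specific to any one reduction):

```latex
% If a reduction maps 3SAT instances of size $n$ to instances of size
% $m = n^{c}$ for a constant $c > 1$, a $2^{m^{\varepsilon}}$-time
% algorithm for the target problem only yields, for 3SAT, time
\[
  2^{m^{\varepsilon}} \;=\; 2^{(n^{c})^{\varepsilon}} \;=\; 2^{n^{c\varepsilon}},
\]
% which for $c\varepsilon \ge 1$ is fully exponential. So a
% subexponential algorithm for the target problem does not contradict
% the believed lack of one for 3SAT.
```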

1361
01:22:45,840 --> 01:22:46,620
Cool.

1362
01:22:46,620 --> 01:22:48,910
See you Thursday.