YUFEI ZHAO: We've been spending the past few lectures discussing Szemeredi's regularity lemma. And one of the first applications of the regularity lemma that we discussed is the triangle removal lemma. So today, I want to revisit this topic and show you a strengthening of the removal lemma for which new regularity techniques are needed.

But first, recall the graph removal lemma. In the graph removal lemma, we have that for every graph H and epsilon bigger than zero, there exists some delta such that if an n-vertex graph has fewer than delta n^{v(H)} copies of H, where v(H) is the number of vertices of H, then it can be made H-free by removing fewer than epsilon n^2 edges.

Even in the case when H is a triangle, when this is called the triangle removal lemma, even in that case the regularity method is more or less the only way that we currently know how to prove this theorem. So we saw this a few lectures ago. What I would like to discuss today is a variant of this result where, instead of considering copies of H, we're now considering induced copies of H. So this is the induced graph removal lemma, where the only difference is that the hypothesis is now about induced copies of H, and the conclusion is that you can make the graph induced H-free.

So let me remind you of the difference between an induced subgraph and the usual notion of subgraph. We say that H is an induced subgraph of G if one can obtain H from G by deleting vertices of G. You're not allowed to delete edges, only vertices. So, in other words, the four-cycle here is not an induced subgraph of the graph drawn on the board, because if you select those four vertices you don't get just the four-cycle; you get extra edges. So it is a subgraph, but not an induced subgraph.

So the induced graph removal lemma is a theorem, and let's discuss how we may prove it. Question?
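To make the two notions of containment concrete, here is a small illustrative Python sketch (not part of the lecture; the function and variable names are my own). It counts, by brute force, labeled copies and labeled induced copies of a small graph H inside a graph G.

```python
from itertools import permutations

def count_copies(G_vertices, G_edges, H_vertices, H_edges, induced=False):
    """Count injective maps V(H) -> V(G) sending edges of H to edges of G.
    With induced=True, non-edges of H must also go to non-edges of G,
    i.e. we count induced copies.  Brute force; only for tiny graphs."""
    G_edge_set = {frozenset(e) for e in G_edges}
    H_edge_set = {frozenset(e) for e in H_edges}
    pairs = [(u, v) for i, u in enumerate(H_vertices) for v in H_vertices[i + 1:]]
    count = 0
    for image in permutations(G_vertices, len(H_vertices)):
        phi = dict(zip(H_vertices, image))
        ok = True
        for u, v in pairs:
            in_H = frozenset((u, v)) in H_edge_set
            in_G = frozenset((phi[u], phi[v])) in G_edge_set
            if (in_H and not in_G) or (induced and not in_H and in_G):
                ok = False
                break
        if ok:
            count += 1
    return count

# The 4-cycle is a subgraph of K4, but not an induced subgraph of K4.
K4 = ([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
C4 = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
print(count_copies(*K4, *C4))                 # 24 labeled copies
print(count_copies(*K4, *C4, induced=True))   # 0 induced copies
```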
OK, the question is, why is this stronger than the graph removal lemma? So it's not stronger per se, but we'll see the relationship between the two. I claim that this theorem is more difficult. Any more questions?

Now, the statement as I've written it so far is not quite right; let's see why. Here's an example. Suppose your H is three isolated vertices. What is that saying? We're looking at induced copies of H, which are triples of vertices with no edges among them. So really you are looking at triangles in the complement of G. So this is exactly the triangle removal lemma in the complement of G, but you cannot get rid of these copies by removing edges. So we need to modify the conclusion: instead of only removing edges, we allow ourselves to both add and delete edges, possibly at the same time. You're allowed to add some edges and delete some edges, but in total you change no more than epsilon n^2 edges. This notion of change is sometimes also known as the edit distance. So you can add edges and delete edges. Any questions about the statement?

All right, so let's think about how you would prove this result following the proof that we did for the triangle removal lemma. Let's pretend that we go through that proof and think about what could go wrong. Remember, in the application of the regularity lemma to the removal lemma, the recipe has three steps. In the first step we do a partition: we apply Szemeredi's regularity lemma to obtain a regular partition. The second step is a cleaning step, and the two key things that happen in the cleaning are that we remove edges in low-density pairs of parts and in irregular pairs. And in the third step we claim that once we do the cleaning, once we remove those edges, the resulting graph should be H-free, because if it were not H-free, then by considering the vertex parts where a copy of H lies and applying the counting lemma, you could generate many more copies of H. So those were the three main steps in the proof of the triangle removal lemma.
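As a quick sanity check of the three-isolated-vertices example, here is a tiny Python sketch (my own illustration, not from the lecture): the induced copies of the empty graph on three vertices in G are exactly the triangles of the complement of G.

```python
from itertools import combinations
import random

def complement(n, edges):
    """Complement of a graph on vertices 0..n-1, edges given as frozensets."""
    return {frozenset(p) for p in combinations(range(n), 2)} - edges

def triangles(n, edges):
    return [t for t in combinations(range(n), 3)
            if all(frozenset(p) in edges for p in combinations(t, 2))]

def induced_empty_triples(n, edges):
    """Vertex triples spanning no edges of G: induced copies of 3 isolated vertices."""
    return [t for t in combinations(range(n), 3)
            if not any(frozenset(p) in edges for p in combinations(t, 2))]

n = 8
edges = {frozenset(p) for p in combinations(range(n), 2) if random.random() < 0.5}
assert induced_empty_triples(n, edges) == triangles(n, complement(n, edges))
```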
So let's see what happens when we try to apply this strategy to the induced version. For the partition step, you still do the regularity partition; nothing really changes there. So let's see what happens in the cleaning step.

For low-density pairs: well, now we need to think about not just low-density pairs but also high-density pairs, because in the induced setting we think about edges and non-edges at the same time. So you might think of a strategy like this: if the edge density of a pair is less than epsilon, then you remove all of those edges, and if the edge density is bigger than 1 minus epsilon, then you add all of the missing edges in. This is the natural generalization, to the induced setting, of our cleaning strategy for the triangle removal lemma. So far, everything is still OK.

But now, what would you do with the irregular pairs? That's problematic. Previously, for the triangle removal lemma, we just said: if a pair is irregular, remove all edges of that pair, and it will never show up in the counting stage. But that strategy no longer works. For example, suppose the graph H being counted is the one drawn here, you do the regularity partition, and one of your pairs is irregular. Say you get rid of all the edges between that pair. Then maybe you have some embedding of H that uses vertices spanning that irregular pair. And now you don't have a counting lemma: you cannot say, "I found this copy of H in my changed graph, and by the counting lemma I can get many copies of H," because you have no control over that irregular pair anymore. So the fact that you have to both add and remove edges makes it unclear what to do here, and this is the big obstacle in applying the regularity lemma to the induced removal lemma.

Any questions about this obstacle? Make sure you understand why this is an issue; otherwise you won't really appreciate what happens next. So somehow we need to find some kind of regularity partition with no irregular pairs.
So the question is: is there a way to partition so that there are no irregular pairs? For those of you who have started your homework on time, you know that the answer is no. One of the homework problems asks you to show this for a specific graph known as the half graph; you'll see in the homework exactly what this graph is, but roughly, it is the bipartite graph with vertices a_1, ..., a_m and b_1, ..., b_m in which a_i is adjacent to b_j exactly when i <= j. You cannot partition it so as to get rid of all irregular pairs. So irregular pairs are necessary in the statement of the regularity lemma.

What I want to show you today is what's called a strong regularity lemma, which gives a somewhat different consequence that will allow you to get rid of irregular pairs in a more restricted setting. So this is the issue: the irregular pairs.

Before telling you what this regularity lemma is, I want to give you a small generalization of the induced graph removal lemma, or really just a different way to think about the statement. You can think of it as a colorful version: instead of "induced," where you have edges and non-edges, you can also have colored edges. So: a colorful removal lemma, although this name is not standard. When we talk about graphs, it's the colorful graph removal lemma. For every k, r, and epsilon, there exists delta such that the following holds. Let curly H be a set of r-colorings of the edges of the complete graph on k vertices; an r-edge-coloring just means using r colors to color the edges, with no restrictions on which colorings are allowed, so curly H is just some set of possible colorings. Then every r-edge-coloring of the complete graph on n vertices in which fewer than a delta fraction of its k-vertex subsets induce a coloring belonging to curly H can be made curly-H-free by recoloring, using the same r colors, fewer than an epsilon fraction of the edges.
So you recolor less than an epsilon fraction of the edges of this K_n.

In particular, the version that we just stated, the induced graph removal lemma, is the same as having two colors, with curly H consisting of exactly one red-blue coloring of the complete graph on the same number of vertices as H. You color red the edges of H and blue the non-edges, for instance. And you're saying: I want to color the big complete graph with red and blue in such a way that there are very few copies of that pattern; then I can recolor the red and blue in a small number of places to get rid of all such patterns. Having a colored pattern somewhere in this complete graph coloring is the same as having an induced copy of the corresponding subgraph. Yeah?

AUDIENCE: The statement after "then" is a really long sentence. Can you--

YUFEI ZHAO: Yeah, OK. In short: every r-edge-coloring of K_n with a small number of forbidden patterns can be made curly-H-free by recoloring a small fraction of the edges. Just like in the triangle removal lemma: every graph with a small number of triangles can be made triangle-free by removing a small number of edges. Any other questions?

So this is a restatement of the induced removal lemma with a bit more generality. It's fine whether you like this version more or less, but let's talk about the induced version from now on; the same proofs I'll give also apply to this version where you have more colors.
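To make the correspondence between induced copies and color patterns concrete, here is a small sketch (my own; the names are illustrative). It encodes a graph as a red/blue coloring of the complete graph and checks whether an ordered vertex subset of the colored K_n realizes the pattern coming from H, which is the same as those vertices inducing a copy of H in that order.

```python
from itertools import combinations

RED, BLUE = "red", "blue"   # red = edge, blue = non-edge

def coloring_of(vertices, edges):
    """The red/blue edge-coloring of the complete graph encoding a graph."""
    edge_set = {frozenset(e) for e in edges}
    return {frozenset(p): (RED if frozenset(p) in edge_set else BLUE)
            for p in combinations(vertices, 2)}

def realizes_pattern(coloring, subset, pattern_coloring, pattern_vertices):
    """Does `subset` (taken in this fixed order) realize the pattern of H?"""
    for (u, i), (v, j) in combinations(list(zip(pattern_vertices, subset)), 2):
        if coloring[frozenset((i, j))] != pattern_coloring[frozenset((u, v))]:
            return False
    return True

# Example: the path a-b-c as an induced pattern inside the path 0-1-2-3-4.
G = coloring_of(range(5), [(0, 1), (1, 2), (2, 3), (3, 4)])
H_vertices, H_edges = ["a", "b", "c"], [("a", "b"), ("b", "c")]
H_pattern = coloring_of(H_vertices, H_edges)
print(realizes_pattern(G, (0, 1, 2), H_pattern, H_vertices))  # True: 0,1,2 induce a path
print(realizes_pattern(G, (0, 1, 3), H_pattern, H_vertices))  # False: 1 and 3 not adjacent
```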
The variant of the regularity lemma that we'll need is known as the strong regularity lemma. To state it, let me recall a notion that came up in the proof of Szemeredi's regularity lemma: the notion of energy. Recall that if P = {V_1, ..., V_k} is a partition of the vertex set of a graph G, and n is the number of vertices, we defined the energy of P to be the quantity

q(P) = sum over i, j of (|V_i| |V_j| / n^2) d(V_i, V_j)^2,

which is basically a squared mean of the edge densities between vertex parts, appropriately weighted if the parts do not all have the same size.

In the proof of Szemeredi's regularity lemma, there was an important energy increment step, which says that if you have some partition P that is not epsilon-regular, then there exists a refinement Q with the following properties: the number of parts of Q is not too large as a function of the number of parts of P, so it's bounded in terms of P; and, since P is not epsilon-regular, the energy of Q is significantly larger than the energy of P. Remember, this was an important step in the proof of the regularity lemma.

To state the strong regularity lemma, we need that notion of energy. The statement, if you've never seen this kind of thing before, will seem a bit intimidating at first because it involves a whole sequence of parameters, but we'll get used to it. Instead of one epsilon parameter, you now have a sequence of positive epsilons: epsilon_0, epsilon_1, epsilon_2, and so on. Part of the strength of this regularity lemma is that, depending on the application you have in mind, you can make the sequence go to zero as quickly as you like, thereby increasing the strength of the conclusion. The lemma says: there exists some bound M, which depends only on your sequence of epsilons, such that every graph has not just one partition but a pair of vertex partitions P and Q with the following properties. First, Q refines P, so it's a pair of partitions, one refining the other. Second, the number of parts of Q is at most M, bounded just like in the usual regularity lemma. Third, the partition P is epsilon_0-regular. And here is the new part, the most important one: Q is very regular. It's not just epsilon_0-regular; it is epsilon_{|P|}-regular, where |P| denotes the number of parts of P.
So you should think of Q as extremely regular, because you get to choose what the sequence of epsilons is. And finally, the energy difference between P and Q is not too big: q(Q) is at most q(P) plus epsilon_0.

This is the statement of the strong regularity lemma. It produces for you not just one partition but a pair of partitions. In this pair, you have one partition P, which is epsilon_0-regular, similar to the one we obtain from Szemeredi's regularity lemma, but you also get a refinement Q, and this Q is extremely regular. So the picture is: there is P, and then Q is an extremely regular refinement of P. Any questions about the statement of the strong regularity lemma?

The sequence of epsilons gives you flexibility in how to apply it, but let's see how to prove it. Once you understand how it works, conceptually the proof is pretty short, but let me do it slowly so that we can appreciate this sequence of epsilons. The idea is that we will repeatedly apply Szemeredi's regularity lemma to generate a sequence of partitions.

First, let me remind you of a statement of Szemeredi's regularity lemma. This is slightly different from the one that we stated earlier, but it comes out of the same proof. For every epsilon, there exists some m_0, which depends on epsilon, such that for every partition P of the vertex set of G, there exists a refinement P' of P, with each part of P refined into at most m_0 parts, such that P', the new partition, is epsilon-regular.

This is the statement of Szemeredi's regularity lemma that we will apply repeatedly. In the version that we've seen before, we would start with the trivial partition.
There, applying refinements repeatedly in the proof, we got a partition into a bounded number of parts such that the final partition is epsilon-regular. But if, in the proof of the regularity lemma, you start not with the trivial partition but with a given partition, and run the exact same proof, you get this consequence, except that now you can also guarantee that the final partition is a refinement of the one you were given.

So let's apply this statement repeatedly. We obtain a sequence of partitions of the vertex set of G, starting with P_0 being the trivial partition, and so on, such that each partition P_{i+1} refines the previous one P_i, and such that each P_{i+1} is epsilon_{|P_i|}-regular. In other words, you apply the regularity lemma with a parameter based on the number of parts you currently have; applied to the current partition, it gives you a finer partition that is extremely regular. And you also know that the number of parts of the new partition is bounded in terms of the number of parts of the previous partition. Any questions so far?

So now we have this sequence of partitions, and we can keep on going; G could be arbitrarily large, so we can iterate as long as we like. Eventually we will be able to obtain the last condition, the energy condition, which is the only thing still missing. Since the energy is always bounded between 0 and 1, there exists some i at most 1/epsilon_0 such that the energy goes up by less than epsilon_0 from P_i to P_{i+1}, because otherwise the energy would exceed 1. So now let's set P to be this P_i, and Q to be the next term in the sequence, P_{i+1}. Then you have basically all the conditions: P is epsilon_0-regular, because by construction it is epsilon_{|P_{i-1}|}-regular and that parameter is at most epsilon_0; Q refines P and is epsilon_{|P|}-regular by construction; and the energy condition holds by our choice of i.
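The stopping rule above only uses that the energy lies between 0 and 1 and never decreases under refinement. For concreteness, here is a small Python sketch (my own, not from the lecture) computing the energy q(P) of a vertex partition directly from the definition given earlier.

```python
from itertools import product

def density(adj, A, B):
    """Edge density d(A, B) between two vertex sets, adjacency-set representation."""
    return sum(1 for a in A for b in B if b in adj[a]) / (len(A) * len(B))

def energy(adj, parts):
    """Energy q(P) = sum_{i,j} (|V_i||V_j| / n^2) * d(V_i, V_j)^2 of a partition."""
    n = sum(len(p) for p in parts)
    return sum(len(A) * len(B) / n**2 * density(adj, A, B)**2
               for A, B in product(parts, repeat=2))

# Tiny example: the 4-cycle 0-1-2-3-0, with the trivial partition and a refinement.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(energy(adj, [[0, 1, 2, 3]]))      # 0.25 (overall edge density squared)
print(energy(adj, [[0, 2], [1, 3]]))    # 0.5: energy can only go up under refinement
```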
The one thing left to check is that the number of parts of Q is bounded. That's basically because, at each step, there was a bound on the number of parts that depends only on the regularity parameters, and you are iterating that bound a bounded number of times. So the number of parts of Q is bounded as a function of the sequence of epsilons, this infinite vector of epsilons, but it is a bounded number: you only iterate the bound a bounded number of times. And that finishes the proof. Any questions?

It may be somewhat mysterious to you right now why we do this; we'll get to the application in a second. But for now, I just want to comment a bit on the bounds. Of course, the bounds depend on which epsilon_i's you use, and typically you want the epsilon_i's to decrease as the number of parts grows. In almost all reasonable applications of this strong regularity lemma, for example with epsilon_i being some epsilon divided by, let's say, i plus 1, or any polynomial in i, or even decaying faster than that, what basically happens is that you apply this m_0 bound in succession on the order of 1/epsilon_0 times. In the regularity lemma, we saw that the m_0 that comes out of Szemeredi's graph regularity lemma is a tower function, where the tower function tower(i) is defined to be the exponential function iterated i times. Of course, I'm being somewhat loose here with the exact dependence, but you get the idea: instead of iterating the exponential i times, you now iterate the tower function i times. And, as some of you are laughing, this is an incredibly large number; it's even larger than the tower function. In the literature, especially around the regularity lemma, this function, where you iterate the tower function i times, is given the name wowzer.

[LAUGHTER]

As in: wow, this is a huge number. It's a step up in the Ackermann hierarchy; if you then repeat the wowzer function i times, you move up one more rung in the Ackermann hierarchy, this hierarchy of rapidly growing functions.
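To get a feel for the growth, here is a tiny sketch (my own, using one common convention for the base cases) of the tower function and the wowzer function, the tower function iterated.

```python
def tower(i):
    """Tower of 2s of height i: tower(0) = 1, tower(i) = 2 ** tower(i - 1)."""
    return 1 if i == 0 else 2 ** tower(i - 1)

def wowzer(i):
    """Tower function iterated i times: wowzer(0) = 1, wowzer(i) = tower(wowzer(i - 1))."""
    return 1 if i == 0 else tower(wowzer(i - 1))

print([tower(i) for i in range(5)])   # [1, 2, 4, 16, 65536]
print(wowzer(2))                      # tower(2) = 4
print(wowzer(3))                      # tower(4) = 65536
# wowzer(4) = tower(65536) is already far too large to write down.
```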
But in any case, it's bounded, and that's good enough for us. Any questions so far? Yeah?

AUDIENCE: What do you call [INAUDIBLE]

YUFEI ZHAO: Yes, the question is, what do you call wowzer iterated? I'm not aware of a standard name for that. Actually, even the name wowzer, while quite common in the combinatorics community, is probably not recognized by most people outside it. Any more questions? Another way to say it: each iteration is a step up in the Ackermann hierarchy, whose levels you can enumerate one, two, three, four, and so on as you keep going up.

All right. Another remark about this strong regularity lemma: it will be convenient for us, and actually somewhat more essential than in our previous applications, to make the parts equitable, meaning that in both P and Q all the parts have basically the same number of vertices. I won't make this precise, but it is not too hard to do, and you can prove it by modifying the proof in the same way I described for the usual regularity lemma. So I won't belabor the point, but we will use the equitable version.

All right, so how does one use this regularity lemma? Let me state a corollary, and let me call it "corollary star," because you actually need to do some work to get it to follow from the strong regularity lemma. But this corollary is the version that we will apply. If you start with a decreasing sequence of epsilons, epsilon_0 >= epsilon_1 >= epsilon_2 >= ..., then there exists a delta such that the following is true. Every n-vertex graph has an equitable vertex partition V_1, ..., V_k and a subset W_i of each V_i such that the following properties hold. (a) All the W's are fairly large: each has size at least delta n, a constant proportion of the total vertex set. (b) Every pair (W_i, W_j) is epsilon_k-regular. And this is the point I want to emphasize.
Here there are no irregular pairs anymore: the condition holds for every pair. So there are no irregular pairs between the W_i's, and we also include the case i = j, so each W_i is regular with itself. (c) Furthermore, the edge densities between the V's are similar to the edge densities between the corresponding W's, and this holds for most pairs: for all but at most epsilon_0 k^2 pairs (i, j). Any questions about the statement?

So let me show you how you could deduce the corollary from the strong regularity lemma. First, let me draw a picture. Here you have a regularity partition; these are your V's, and inside each V I find a W, drawn in blue, such that every pair of blue sets, including each blue set with itself, is very regular, and moreover the edge density between two blue sets is, for most pairs, very similar to the edge density between their ambient white sets.

OK, so let me say a few words, without going into too many details, about how you might deduce this corollary from the strong regularity lemma. First let me do something slightly simpler: let me not yet require that the blue sets, the W_i's, are regular with themselves. Without that requirement, we can obtain the W_i's by picking a uniformly random part of the finer partition Q inside each part of P from the strong regularity lemma. So the strong regularity lemma produces a pair of partitions, and what we do is pick, inside each part of P, one of its sub-parts in Q uniformly at random as the corresponding W. Because Q is so extremely regular, most pairs of parts of Q are regular, so with high probability you will not encounter any irregular pairs if you pick the W's randomly as parts of Q. That's the key point; here we are using that Q is extremely regular.
496 00:42:40,930 --> 00:42:47,420 So all the Wi Wj is regular for all i not equal 497 00:42:47,420 --> 00:42:49,718 to j with high probability. 498 00:42:53,550 --> 00:42:56,490 But the other thing that we would like is that the edge 499 00:42:56,490 --> 00:43:00,810 densities between the W's are similar to those between 500 00:43:00,810 --> 00:43:02,740 the V's. 501 00:43:02,740 --> 00:43:06,020 And for that, we will use this condition about their energies 502 00:43:06,020 --> 00:43:07,550 being very similar to each other. 503 00:43:10,650 --> 00:43:16,800 So the third consequence, C, is-- 504 00:43:16,800 --> 00:43:22,200 it's a consequence of the energy bound. 505 00:43:30,820 --> 00:43:34,510 Because recall that in our proof of the Szemeredi regularity 506 00:43:34,510 --> 00:43:36,730 lemma there was an interpretation 507 00:43:36,730 --> 00:43:43,860 of the energy as the second moment 508 00:43:43,860 --> 00:43:47,340 of a certain random variable which we called z. 509 00:43:51,640 --> 00:43:54,830 And using that interpretation, I can write down 510 00:43:54,830 --> 00:44:00,320 this expression like that. 511 00:44:00,320 --> 00:44:03,860 We are here assuming for simplicity 512 00:44:03,860 --> 00:44:08,030 that Q is completely equitable, so all the parts 513 00:44:08,030 --> 00:44:09,650 have exactly the same size. 514 00:44:09,650 --> 00:44:15,740 Z of Q is defined to be the edge density between Vi and Vj 515 00:44:15,740 --> 00:44:21,010 for random ij. 516 00:44:21,010 --> 00:44:23,770 So this is a random variable z. 517 00:44:23,770 --> 00:44:28,410 So you pick pair of parts uniformly, 518 00:44:28,410 --> 00:44:31,110 or maybe with some weights if they're not exactly equal. 519 00:44:31,110 --> 00:44:34,480 And you evaluate the edge density. 520 00:44:34,480 --> 00:44:37,650 So this energy difference is the difference 521 00:44:37,650 --> 00:44:39,330 between the second moments. 522 00:44:39,330 --> 00:44:46,710 And because Q is a refinement of P, 523 00:44:46,710 --> 00:44:55,870 it is the case that this difference of L2 norms 524 00:44:55,870 --> 00:45:00,860 is equal to the second moment of the difference 525 00:45:00,860 --> 00:45:02,480 of the random variables. 526 00:45:02,480 --> 00:45:04,760 So we saw a version of this earlier 527 00:45:04,760 --> 00:45:07,910 when we were discussing variance in the context 528 00:45:07,910 --> 00:45:10,520 of the proof of the similar irregularity lemma. 529 00:45:10,520 --> 00:45:11,810 Here it's basically the same. 530 00:45:11,810 --> 00:45:16,430 You can either look at this inequality part by part of V, 531 00:45:16,430 --> 00:45:21,050 or if you like to be a bit more abstract 532 00:45:21,050 --> 00:45:24,170 then this is actually a case of Pythagorean theorem. 533 00:45:29,910 --> 00:45:34,350 If you view these as vectors in a certain vector space, 534 00:45:34,350 --> 00:45:36,100 then you have some orthogonality. 535 00:45:36,100 --> 00:45:40,378 So you have this sum of squares identity. 536 00:45:45,860 --> 00:45:47,792 Where does part A come from? 537 00:45:47,792 --> 00:45:52,340 So part A, we want the parts, that Wi's to be not too small, 538 00:45:52,340 --> 00:46:15,561 but that comes from a bound on the number of parts of Q. 539 00:46:15,561 --> 00:46:18,810 So so far this more or less proves the corollary 540 00:46:18,810 --> 00:46:23,050 except for that we simplified our lives 541 00:46:23,050 --> 00:46:29,680 by requiring just that the i not equal to j, the Vi Vj's are 542 00:46:29,680 --> 00:46:31,100 regular. 
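For the record, here is the identity that was just used, spelled out in the notation above (my reconstruction; the board itself is not in the transcript). Since Q refines P, the random variable Z_P is the conditional expectation of Z_Q given the pair of parts of P, so the cross term vanishes and

$$ q(Q) - q(P) \;=\; \mathbb{E}\big[Z_Q^2\big] - \mathbb{E}\big[Z_P^2\big] \;=\; \mathbb{E}\big[(Z_Q - Z_P)^2\big]. $$

The strong regularity lemma guarantees that $q(Q) - q(P) \le \epsilon_0$, so $Z_Q$ and $Z_P$ are close in mean square, and an averaging, Markov-type argument over the random choice of the $W_i$'s then gives property (c).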
But in the statement up there, we also want the W_i's to be regular with themselves, which will be important for the application. I won't explain how to do that, partly because it is essentially one of your homework problems: on problem set 3, you are asked to prove that every graph has a subset of vertices, of at least constant proportion, that is regular with itself. The methods you use there are applicable to handle the situation here as well.

So, putting all of these ingredients together, we get the corollary, whereby you have this picture, this partition. I don't even require the V_i's to be regular with each other; that doesn't matter anymore. All that matters is that the pairs of W_i's are very regular, with no irregular pairs among them. And now we'll be able to go back to the induced graph removal lemma, where previously we had an issue with the existence of irregular pairs when using the Szemeredi regularity partition; now we have a tool to get around that. Next we will see how to execute this proof, but at this point hopefully you already see the outline, because you no longer need to worry about irregular pairs. Let's take a quick break. Any questions so far? Yes?

AUDIENCE: Why are we able to [INAUDIBLE]

YUFEI ZHAO: OK, so the question was about the step where we were looking at expectations of squares: why was that identity true? If you look back at the proof of Szemeredi's regularity lemma, we already saw an instance of that identity in the computation of the variance.
You know that the variance of X is, on one hand, equal to the expectation of (X minus mu) squared, where mu is the mean of X, and on the other hand equal to the expectation of X squared minus mu squared. You agree with this formula? You can prove it by expanding, and the identity in question you can basically prove by applying this formula part by part. Any more questions?

So let's now prove the induced graph removal lemma. We'll follow the same regularity recipe, but with a small twist: instead of using Szemeredi's regularity lemma, we will use the corollary up there.

The three steps. First, we do the partition. Suppose G is as above: it has very few induced copies of H. Apply the corollary to get a partition of the vertex set of G into parts V_1, ..., V_k, and inside each part V_i a subset W_i, satisfying the following properties. Each pair (W_i, W_j) is regular with a parameter that will come out later, when we need to use the counting lemma; it's some explicit number, so don't worry too much about it. Let's say H has h vertices; then the exponent in that parameter is h. Note that we have not yet used the full strength of the corollary, where I could even make the regularity parameter depend on k; we will not need that here, but we will need it in a later application. The other properties are that the densities between the V's and the corresponding W's do not differ by more than epsilon over 2, for all but a small number of pairs, at most epsilon k^2 over 2 of them; and finally, the sizes of the W_i's are at least delta_0 times n, where delta_0 depends only on epsilon and H.

This is the partition step, so now let's do the cleaning. In the cleaning step there is no longer an issue of irregular pairs if we only look at the W_i's, so we just need to deal with the low-density pairs, or rather the corresponding analog for the induced setting. What happens is the following: for every i <= j, and crucially including the case i = j, if the edge density between W_i and W_j is too small, then we remove all edges between V_i and V_j, and if the edge density between W_i and W_j is too big, then we add all edges between V_i and V_j.
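Here is a small Python sketch of this cleaning step (my own illustration; the lecture only says "too small" and "too big," so the thresholds epsilon/4 and 1 - epsilon/4 below are placeholder choices). The decision for each pair is based on the density between W_i and W_j, but the edit is applied to all of V_i x V_j.

```python
from itertools import combinations_with_replacement

def density(adj, A, B):
    """Edge density between vertex sets A and B, adjacency-set representation."""
    return sum(1 for a in A for b in B if b in adj[a]) / (len(A) * len(B))

def clean(adj, V_parts, W_parts, eps):
    """Cleaning step for the induced removal lemma, with illustrative thresholds.
    For every i <= j, including i == j: if d(W_i, W_j) is very small, delete all
    edges between V_i and V_j; if it is very large, add all of them."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    for i, j in combinations_with_replacement(range(len(V_parts)), 2):
        d = density(adj, W_parts[i], W_parts[j])
        if d < eps / 4:                      # low-density pair: remove
            for u in V_parts[i]:
                for w in V_parts[j]:
                    new_adj[u].discard(w)
                    new_adj[w].discard(u)
        elif d > 1 - eps / 4:                # high-density pair: add
            for u in V_parts[i]:
                for w in V_parts[j]:
                    if u != w:
                        new_adj[u].add(w)
                        new_adj[w].add(u)
    return new_adj
```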
How many edges do we end up adding or removing in total? Consider the removal case: if the edge density in G between V_i and V_j is also very small, then removing those edges does not remove very many, and most pairs (V_i, V_j) have that property. So you tally up the various errors you can pick up here and there, and you find that the total number of edges added or removed from G is less than, let's say, epsilon n^2; maybe you get an extra factor of 2, but after adjusting some constant factors it is less than epsilon n^2. These are small details you can work out. The question being used here is: how is the density between V_i and V_j related to the density between W_i and W_j? For most pairs (i, j) they are very similar, and the small fraction of pairs where they are not similar gets lumped into this bound as well. So maybe, to be safe, let me just put a 2 in that bound.

All right. So we changed a very small number of edges, and now we want to show that the graph resulting from this modification does not have any induced copies of H. That is the final step, the counting step. Suppose there were some induced copy of H left after the modification. I want to show that then there must in fact be many induced copies of H in the original graph, contradicting the hypothesis.

So where does this induced copy of H sit? You have the V's, and inside the V's you have the W's. Suppose, for illustration, that H is the graph drawn here; in particular it has an edge, and it also has a non-edge, say between these two vertices. Suppose you find a copy of H in the cleaned-up graph. Where can this copy of H sit? Suppose you find it here.
The claim now is that if this copy of H exists here, then I must be able to find many such copies of H inside the corresponding yellow parts, the W's, because between the yellow parts you have regularity, and you also have the right kinds of densities: if a pair did not have the right kind of density, we would have cleaned it up already. So that's the idea: if you have a copy of H somewhere, then I zoom into the yellow parts, the W's, and I find lots of copies of H between the W's.

Let me write this down. Suppose each vertex v of the copy of H lies in the part V_{phi(v)}, for some map phi from the vertices of H to {1, ..., k}; I am just recording which part each vertex lies in. Now we apply the counting lemma to embed induced copies of H in G, where each vertex v of H is mapped to a vertex of the corresponding W_{phi(v)}, and we would like to know that there are lots of such copies. Here "the counting lemma" really means a variant: take the counting lemma that we did last time, view it as a multipartite statement, and apply it pair of parts by pair of parts. We find that the number of such induced copies is, up to a small error, what you would guess by naively multiplying densities: for each edge of H that you want to embed, a factor of the corresponding edge density between the two W's; for each non-edge of H, a factor of 1 minus the corresponding edge density, which you can think of as the edge density in the complement of G; and finally the product of the sizes of the vertex sets W_{phi(v)}. The error is at most the regularity parameter, times a constant depending on H, times that same product of the sizes of the W's. And the point is that the main term here is not small.
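In symbols, a convenient form of the estimate being used is the following (my reconstruction, in the spirit of the counting lemma from the previous lecture, not a verbatim statement from the board): if every pair (W_{phi(u)}, W_{phi(v)}) is gamma-regular, then the number of induced copies of H with each vertex v landing in W_{phi(v)} is at least roughly

$$ \Big( \prod_{uv \in E(H)} d\big(W_{\phi(u)}, W_{\phi(v)}\big) \prod_{uv \notin E(H)} \big(1 - d\big(W_{\phi(u)}, W_{\phi(v)}\big)\big) \;-\; O_H(\gamma) \Big) \prod_{v \in V(H)} \big|W_{\phi(v)}\big|. $$

After the cleaning step, every density factor appearing here is at least on the order of epsilon/4 (pairs used for edges of H were not emptied, and pairs used for non-edges were not filled), so the main term is at least about $(\epsilon/4)^{\binom{h}{2}}$, which dominates the error once $\gamma$ is chosen small enough in terms of epsilon and h.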
So the number of induced copies of H in G is at least some quantity, which is basically the expression over there: epsilon over 4 raised to some constant power, and all of these exponents are constants, that's the point, minus the error term, all multiplied by the product of the sizes of the W_{phi(v)}'s, and we saw that each of those vertex sets is not too small. So you have lots of induced copies of H in G. Yep?

AUDIENCE: How do you deal with the case where the density between [INAUDIBLE]

YUFEI ZHAO: OK, can you repeat your question?

AUDIENCE: How are you dealing with the [INAUDIBLE]

YUFEI ZHAO: OK. So the question is, how do we deal with the "all but at most epsilon k^2 over 2 pairs" condition? That comes up in the cleaning step, in what I wrote in red, when counting the total number of edges that are added or removed. Think about how many edges are added or removed; let's just think about added edges. For the non-exceptional pairs, where the density between the V's is controlled by the density between the W's, the total number of edges added, or removed in the other case, over all such pairs is at most about epsilon n^2. On the other hand, for the exceptional pairs, there are only at most epsilon k^2 over 2 pairs (i, j) for which the control fails, so those also contribute at most about epsilon n^2 edges added or removed. Does that answer your question? Yes?

AUDIENCE: Is that number 0?

YUFEI ZHAO: Is which number 0?

AUDIENCE: The number of induced edges for the [INAUDIBLE]

YUFEI ZHAO: The--

AUDIENCE: Yeah, the top board.

YUFEI ZHAO: Top board? Good, so you're asking about this number. That should have been a 2. Yes?

AUDIENCE: I don't see k anywhere.

YUFEI ZHAO: OK, the question is that you don't see k appearing anywhere; you mean the k in the corollary?

AUDIENCE: Yeah.
YUFEI ZHAO: So that hasn't come up yet. It comes up implicitly, because we need to lower bound the sizes of these W's. That is partly why we need a bound on the number of parts. But it is true that we do not need epsilon_k to depend on k in this application yet; I will mention a different application in a second where you do need that dependence on k.

OK, so the number of induced copies of H in G is at least this amount. And that's a small lie: you might worry that this really counts homomorphic copies. Well, actually, no, we're OK. Never mind.

So you can set delta to be this quantity here, and that finishes the proof: you have lots of induced copies of H in your graph, which contradicts the hypothesis. So that finishes the proof of the induced removal lemma. Basically the proof is the same as for the usual graph removal lemma, except that now we need a strengthened regularity lemma, which allows us to get rid of irregular pairs, but in a more restricted setting, because we saw that you cannot completely get rid of irregular pairs.

Any questions? Yes?

AUDIENCE: [INAUDIBLE]

YUFEI ZHAO: So I want to address the question of why I stated this corollary in the more general form, with a decreasing sequence of epsilons. First of all, with strong regularity lemmas it is always nice to state them with this extra strength, because it is the right way to think about these types of theorems: you can make the regularity between the parts depend on the number of parts, so that you get much stronger control on the regularity. But there are also applications, for example the one I will state next, where you do need that kind of strength.

So here is what's known as the infinite removal lemma. Here we have not just a single pattern, or a finite number of patterns, that we want to get rid of; now we have infinitely many patterns.
So for every curly H, which is a possibly infinite set of graphs (the graphs themselves are always finite, but the list may be infinite), and every epsilon > 0, there exist h_0 and delta > 0 such that every n-vertex graph with fewer than delta n^{v(H)} induced copies of H, for every H in the family with fewer than h_0 vertices, can be made induced curly-H-free, meaning free of induced copies of every graph in curly H, by adding or removing fewer than epsilon n squared edges.

So now, instead of a single pattern, you have a possibly infinite set of induced patterns, and you want to make your graph induced curly-H-free. And the theorem is that there exists some finite bound h_0 such that if you have few induced copies of all the patterns up to that size, then you can do what you need to do.

So take some time to digest this statement, but it is somehow the correct infinite version of the removal lemma when you have infinitely many patterns that you need to remove. And I claim that the proof is more or less the same as the one that we did here, except that now you need to take your epsilon_k, as in this corollary, to depend on k; you need to, in some sense, look ahead in this infinite list of patterns. So here, in the proof, the epsilon_k from the corollary depends on k, and it also depends on your family of patterns curly H.

Finally, I want to mention a perspective, a computer science perspective, on these removal lemmas that we have been discussing so far. And that's in the context of something called property testing.

Basically, we would like an efficient (efficient meaning fast) randomized algorithm to distinguish graphs that are triangle-free from those that are epsilon-far from triangle-free, where being epsilon-far from triangle-free means that you need to change more than epsilon n squared edges (n is, as usual, the number of vertices) to make the graph triangle-free.
So, in edit distance, the graph is more than epsilon away from being triangle-free.

So somebody gives you a very large graph G; n is very large. You cannot search through every triple of vertices, that's too expensive. But you want some way to test whether a graph is triangle-free versus very far away from being triangle-free.

So there's a very simple randomized algorithm to do this, which is to just sample a random triple of vertices and check whether it forms a triangle. So you do this. And just to give ourselves some margin of safety, let's repeat it some larger number of times, say c(epsilon) times, a constant number of times. If you never find a triangle, then we return that the graph is triangle-free; otherwise we return that it is epsilon-far from triangle-free.

So that's the algorithm. It's a very intuitive algorithm, but why does it work? We want to know that if somebody gives you one of these two possibilities and you run the algorithm, it succeeds with high probability. Question?

AUDIENCE: [INAUDIBLE]

YUFEI ZHAO: So let's talk about why this works. Theorem: for every epsilon there exists a c such that the algorithm succeeds with probability bigger than 2/3. And 2/3 can be any number you like, because you can always repeat the algorithm to boost that constant probability.

So there are two cases. If G is triangle-free, then the algorithm always succeeds: you will never find a triangle, and it will return "triangle-free."

On the other hand, if G is epsilon-far from triangle-free, then the triangle removal lemma tells us that G has lots of triangles: delta n cubed triangles, where delta is a function of epsilon coming from the triangle removal lemma.
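Before running the failure-probability calculation, here is a minimal sketch of this sampling tester in Python. The representation is an assumption on my part: the graph is given as an adjacency oracle adj(u, v), and the number of rounds is passed in as a parameter; it is only meant to illustrate the algorithm just described.

import random

def test_triangle_free(n, adj, rounds):
    """One-sided tester: sample `rounds` random triples and look for a triangle.

    n      -- number of vertices (vertices are 0, ..., n-1)
    adj    -- adjacency oracle: adj(u, v) returns True if uv is an edge
    rounds -- number of sampled triples, e.g. on the order of 1/delta(epsilon)
    """
    for _ in range(rounds):
        u, v, w = random.sample(range(n), 3)  # a uniformly random triple of distinct vertices
        if adj(u, v) and adj(v, w) and adj(u, w):
            return "epsilon-far from triangle-free"  # found a triangle
    return "triangle-free"  # never errs when G really is triangle-free

Note that if G is genuinely triangle-free the tester can never reject, which is the one-sided error mentioned later; the interesting case is the one analyzed next.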
So if we sample, let's say, c = 1/delta triples, we find that the probability that the algorithm fails is at most -- well, you have a lot of triangles, so very likely you will hit one of them. The probability that the algorithm fails is at most (1 - delta n^3 / (number of triples))^(1/delta), which is at most (1 - 6 delta)^(1/delta) since the number of triples is about n^3/6, and that is at most e^(-6), so in particular less than 1/3.

So this algorithm succeeds with high probability. Now, how big a c do you need? Well, that depends on the triangle removal lemma. So it's a constant; it does not depend on the size of the graph. But it's a large constant, because we saw in the proof of the regularity lemma that it can be very large.

But, you know, this theorem here is basically the same as the triangle removal lemma. So it's highly non-trivial that it's true, even though the algorithm is extremely naive and simple.

I just want to finish off with one more thing. Instead of testing for triangle-freeness, you can ask what other properties you can test. So which graph properties are testable in this sense? That is, distinguishing graphs that have some property P from graphs that are epsilon-far from the property P. And you have this kind of tester, called an oblivious tester: you sample k vertices, and you try to see whether the graph has that property.

So there's a class of properties called hereditary properties. Hereditary properties are properties that are closed under vertex deletion. Lots of properties that you see are of this form: for example, being H-free, being planar, being induced H-free, being three-colorable, being perfect -- these are all examples of hereditary properties. They are properties such that if your graph is, say, three-colorable and you take out some vertices, it is still three-colorable.
And all the discussion that we've done so far, in particular the infinite removal lemma, if you phrase it in the language of property testing, implies, given the above discussion, that every hereditary property is testable. In fact, it is testable in the above sense with one-sided error, using an oblivious tester. One-sided error means that, as up there, in one of the two cases the algorithm always succeeds, just as the triangle tester always succeeds when the graph really is triangle-free.

And the reason is that you can characterize a hereditary property as being induced curly-H-free for some family curly H: namely, you put into curly H every graph that does not have the property. This is a possibly infinite set of graphs, and it completely characterizes the hereditary property. And if you read off the infinite removal lemma, it says precisely, under this interpretation, that you have a property testing algorithm.
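As a closing illustration (my own sketch, not something written on the board), here is what the generic one-sided oblivious tester for a hereditary property P looks like, assuming a hypothetical helper has_property that decides P on the small sampled induced subgraph. The one-sidedness comes exactly from P being hereditary: if G has P, then so does every induced subgraph, so the tester can never reject such a G; that graphs epsilon-far from P are rejected with good probability is what the infinite removal lemma provides.

import random

def oblivious_test(n, adj, has_property, sample_size, rounds):
    """Sketch of an oblivious one-sided tester for a hereditary property P.

    n            -- number of vertices of the input graph
    adj          -- adjacency oracle: adj(u, v) returns True if uv is an edge
    has_property -- hypothetical decider for P on a small vertex set, queried
                    only through adj (so it only sees the induced subgraph)
    sample_size  -- how many vertices to sample each round (an h_0-type bound)
    rounds       -- how many independent rounds to run
    """
    for _ in range(rounds):
        sample = random.sample(range(n), sample_size)
        # If G has P, this induced subgraph has P too (hereditary), so we never reject.
        if not has_property(sample, adj):
            return "epsilon-far from P"
    return "has property P"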