YUFEI ZHAO: We've been spending the past few lectures discussing Szemeredi's regularity lemma. And one of the first applications of the regularity lemma that we discussed is the triangle removal lemma. So today, I want to revisit this topic and show you a strengthening of the removal lemma for which new regularity techniques are needed.

But first, recall the graph removal lemma. In the graph removal lemma, we have that for every graph H and epsilon bigger than zero, there exists some delta such that if an n-vertex graph has fewer than delta n^{v(H)} copies of H, where v(H) is the number of vertices of H, then it can be made H-free by removing fewer than epsilon n^2 edges.

Even in the case when H is a triangle, when this is called the triangle removal lemma, even in that case the regularity method is more or less the only way that we currently know how to prove this theorem. So we saw this a few lectures ago. What I would like to discuss today is a variant of this result where, instead of considering copies of H, we're now considering induced copies of H. So this is the induced graph removal lemma, where the only difference is that the hypothesis is now about induced copies of H, and the conclusion is that you can make the graph induced H-free.

So let me remind you of the difference between an induced subgraph and the usual notion of subgraph. We say that H is an induced subgraph of G if one can obtain H from G by deleting vertices of G. You're not allowed to delete edges, only vertices. So, in other words, the four-cycle here is not an induced subgraph of the graph drawn on the board, because if you select those four vertices you don't get just the four-cycle; you get extra edges. So it is a subgraph, but not an induced subgraph.

So the induced graph removal lemma is a theorem, and let's discuss how we may prove it. Question?
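To make the two notions of containment concrete, here is a small illustrative Python sketch (not part of the lecture; the function and variable names are my own). It counts, by brute force, labeled copies and labeled induced copies of a small graph H inside a graph G.

```python
from itertools import permutations

def count_copies(G_vertices, G_edges, H_vertices, H_edges, induced=False):
    """Count injective maps V(H) -> V(G) sending edges of H to edges of G.
    With induced=True, non-edges of H must also go to non-edges of G,
    i.e. we count induced copies.  Brute force; only for tiny graphs."""
    G_edge_set = {frozenset(e) for e in G_edges}
    H_edge_set = {frozenset(e) for e in H_edges}
    pairs = [(u, v) for i, u in enumerate(H_vertices) for v in H_vertices[i + 1:]]
    count = 0
    for image in permutations(G_vertices, len(H_vertices)):
        phi = dict(zip(H_vertices, image))
        ok = True
        for u, v in pairs:
            in_H = frozenset((u, v)) in H_edge_set
            in_G = frozenset((phi[u], phi[v])) in G_edge_set
            if (in_H and not in_G) or (induced and not in_H and in_G):
                ok = False
                break
        if ok:
            count += 1
    return count

# The 4-cycle is a subgraph of K4, but not an induced subgraph of K4.
K4 = ([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
C4 = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
print(count_copies(*K4, *C4))                 # 24 labeled copies
print(count_copies(*K4, *C4, induced=True))   # 0 induced copies
```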
OK, the question is, why is this stronger than the graph removal lemma? So it's not stronger per se, but we'll see the relationship between the two. I claim that this theorem is more difficult. Any more questions?

Now, the statement as I've written it so far is not quite right; let's see why. Here's an example. Suppose your H is three isolated vertices. What is that saying? We're looking at induced copies of H, which are triples of vertices with no edges among them. So really you are looking at triangles in the complement of G. So this is exactly the triangle removal lemma in the complement of G, but you cannot get rid of these copies by removing edges. So we need to modify the conclusion: instead of only removing edges, we allow ourselves to both add and delete edges, possibly at the same time. You're allowed to add some edges and delete some edges, but in total you change no more than epsilon n^2 edges. This notion of change is sometimes also known as the edit distance. So you can add edges and delete edges. Any questions about the statement?

All right, so let's think about how you would prove this result following the proof that we did for the triangle removal lemma. Let's pretend that we go through that proof and think about what could go wrong. Remember, in the application of the regularity lemma to the removal lemma, the recipe has three steps. In the first step we do a partition: we apply Szemeredi's regularity lemma to obtain a regular partition. The second step is a cleaning step, and the two key things that happen in the cleaning are that we remove edges in low-density pairs of parts and in irregular pairs. And in the third step we claim that once we do the cleaning, once we remove those edges, the resulting graph should be H-free, because if it were not H-free, then by considering the vertex parts where a copy of H lies and applying the counting lemma, you could generate many more copies of H. So those were the three main steps in the proof of the triangle removal lemma.
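As a quick sanity check of the three-isolated-vertices example, here is a tiny Python sketch (my own illustration, not from the lecture): the induced copies of the empty graph on three vertices in G are exactly the triangles of the complement of G.

```python
from itertools import combinations
import random

def complement(n, edges):
    """Complement of a graph on vertices 0..n-1, edges given as frozensets."""
    return {frozenset(p) for p in combinations(range(n), 2)} - edges

def triangles(n, edges):
    return [t for t in combinations(range(n), 3)
            if all(frozenset(p) in edges for p in combinations(t, 2))]

def induced_empty_triples(n, edges):
    """Vertex triples spanning no edges of G: induced copies of 3 isolated vertices."""
    return [t for t in combinations(range(n), 3)
            if not any(frozenset(p) in edges for p in combinations(t, 2))]

n = 8
edges = {frozenset(p) for p in combinations(range(n), 2) if random.random() < 0.5}
assert induced_empty_triples(n, edges) == triangles(n, complement(n, edges))
```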
So let's see what happens when we try to apply this strategy to the induced version. For the partition step, you still do the regularity partition; nothing really changes there. So let's see what happens in the cleaning step.

For low-density pairs: well, now we need to think about not just low-density pairs but also high-density pairs, because in the induced setting we think about edges and non-edges at the same time. So you might think of a strategy like this: if the edge density of a pair is less than epsilon, then you remove all of those edges, and if the edge density is bigger than 1 minus epsilon, then you add all of the missing edges in. This is the natural generalization, to the induced setting, of our cleaning strategy for the triangle removal lemma. So far, everything is still OK.

But now, what would you do with the irregular pairs? That's problematic. Previously, for the triangle removal lemma, we just said: if a pair is irregular, remove all edges of that pair, and it will never show up in the counting stage. But that strategy no longer works. For example, suppose the graph H being counted is the one drawn here, you do the regularity partition, and one of your pairs is irregular. Say you get rid of all the edges between that pair. Then maybe you have some embedding of H that uses vertices spanning that irregular pair. And now you don't have a counting lemma: you cannot say, "I found this copy of H in my changed graph, and by the counting lemma I can get many copies of H," because you have no control over that irregular pair anymore. So the fact that you have to both add and remove edges makes it unclear what to do here, and this is the big obstacle in applying the regularity lemma to the induced removal lemma.

Any questions about this obstacle? Make sure you understand why this is an issue; otherwise you won't really appreciate what happens next. So somehow we need to find some kind of regularity partition with no irregular pairs.
So the question is: is there a way to partition so that there are no irregular pairs? For those of you who have started your homework on time, you know that the answer is no. One of the homework problems asks you to show this for a specific graph known as the half graph; you'll see in the homework exactly what this graph is, but roughly, it is the bipartite graph with vertices a_1, ..., a_m and b_1, ..., b_m in which a_i is adjacent to b_j exactly when i <= j. You cannot partition it so as to get rid of all irregular pairs. So irregular pairs are necessary in the statement of the regularity lemma.

What I want to show you today is what's called a strong regularity lemma, which gives a somewhat different consequence that will allow you to get rid of irregular pairs in a more restricted setting. So this is the issue: the irregular pairs.

Before telling you what this regularity lemma is, I want to give you a small generalization of the induced graph removal lemma, or really just a different way to think about the statement. You can think of it as a colorful version: instead of "induced," where you have edges and non-edges, you can also have colored edges. So: a colorful removal lemma, although this name is not standard. When we talk about graphs, it's the colorful graph removal lemma. For every k, r, and epsilon, there exists delta such that the following holds. Let curly H be a set of r-colorings of the edges of the complete graph on k vertices; an r-edge-coloring just means using r colors to color the edges, with no restrictions on which colorings are allowed, so curly H is just some set of possible colorings. Then every r-edge-coloring of the complete graph on n vertices in which fewer than a delta fraction of its k-vertex subsets induce a coloring belonging to curly H can be made curly-H-free by recoloring, using the same r colors, fewer than an epsilon fraction of the edges.
So you recolor less than an epsilon fraction of the edges of this K_n.

In particular, the version that we just stated, the induced graph removal lemma, is the same as having two colors, with curly H consisting of exactly one red-blue coloring of the complete graph on the same number of vertices as H. You color red the edges of H and blue the non-edges, for instance. And you're saying: I want to color the big complete graph with red and blue in such a way that there are very few copies of that pattern; then I can recolor the red and blue in a small number of places to get rid of all such patterns. Having a colored pattern somewhere in this complete graph coloring is the same as having an induced copy of the corresponding subgraph. Yeah?

AUDIENCE: The statement after "then" is a really long sentence. Can you--

YUFEI ZHAO: Yeah, OK. In short: every r-edge-coloring of K_n with a small number of forbidden patterns can be made curly-H-free by recoloring a small fraction of the edges. Just like in the triangle removal lemma: every graph with a small number of triangles can be made triangle-free by removing a small number of edges. Any other questions?

So this is a restatement of the induced removal lemma with a bit more generality. It's fine whether you like this version more or less, but let's talk about the induced version from now on; the same proofs I'll give also apply to this version where you have more colors.
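To make the correspondence between induced copies and color patterns concrete, here is a small sketch (my own; the names are illustrative). It encodes a graph as a red/blue coloring of the complete graph and checks whether an ordered vertex subset of the colored K_n realizes the pattern coming from H, which is the same as those vertices inducing a copy of H in that order.

```python
from itertools import combinations

RED, BLUE = "red", "blue"   # red = edge, blue = non-edge

def coloring_of(vertices, edges):
    """The red/blue edge-coloring of the complete graph encoding a graph."""
    edge_set = {frozenset(e) for e in edges}
    return {frozenset(p): (RED if frozenset(p) in edge_set else BLUE)
            for p in combinations(vertices, 2)}

def realizes_pattern(coloring, subset, pattern_coloring, pattern_vertices):
    """Does `subset` (taken in this fixed order) realize the pattern of H?"""
    for (u, i), (v, j) in combinations(list(zip(pattern_vertices, subset)), 2):
        if coloring[frozenset((i, j))] != pattern_coloring[frozenset((u, v))]:
            return False
    return True

# Example: the path a-b-c as an induced pattern inside the path 0-1-2-3-4.
G = coloring_of(range(5), [(0, 1), (1, 2), (2, 3), (3, 4)])
H_vertices, H_edges = ["a", "b", "c"], [("a", "b"), ("b", "c")]
H_pattern = coloring_of(H_vertices, H_edges)
print(realizes_pattern(G, (0, 1, 2), H_pattern, H_vertices))  # True: 0,1,2 induce a path
print(realizes_pattern(G, (0, 1, 3), H_pattern, H_vertices))  # False: 1 and 3 not adjacent
```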
The variant of the regularity lemma that we'll need is known as the strong regularity lemma. To state it, let me recall a notion that came up in the proof of Szemeredi's regularity lemma: the notion of energy. Recall that if P = {V_1, ..., V_k} is a partition of the vertex set of a graph G, and n is the number of vertices, we defined the energy of P to be the quantity

q(P) = sum over i, j of (|V_i| |V_j| / n^2) d(V_i, V_j)^2,

which is basically a squared mean of the edge densities between vertex parts, appropriately weighted if the parts do not all have the same size.

In the proof of Szemeredi's regularity lemma, there was an important energy increment step, which says that if you have some partition P that is not epsilon-regular, then there exists a refinement Q with the following properties: the number of parts of Q is not too large as a function of the number of parts of P, so it's bounded in terms of P; and, since P is not epsilon-regular, the energy of Q is significantly larger than the energy of P. Remember, this was an important step in the proof of the regularity lemma.

To state the strong regularity lemma, we need that notion of energy. The statement, if you've never seen this kind of thing before, will seem a bit intimidating at first because it involves a whole sequence of parameters, but we'll get used to it. Instead of one epsilon parameter, you now have a sequence of positive epsilons: epsilon_0, epsilon_1, epsilon_2, and so on. Part of the strength of this regularity lemma is that, depending on the application you have in mind, you can make the sequence go to zero as quickly as you like, thereby increasing the strength of the conclusion. The lemma says: there exists some bound M, which depends only on your sequence of epsilons, such that every graph has not just one partition but a pair of vertex partitions P and Q with the following properties. First, Q refines P, so it's a pair of partitions, one refining the other. Second, the number of parts of Q is at most M, bounded just like in the usual regularity lemma. Third, the partition P is epsilon_0-regular. And here is the new part, the most important one: Q is very regular. It's not just epsilon_0-regular; it is epsilon_{|P|}-regular, where |P| denotes the number of parts of P.
So you should think of Q as extremely regular, because you get to choose what the sequence of epsilons is. And finally, the energy difference between P and Q is not too big: q(Q) is at most q(P) plus epsilon_0.

This is the statement of the strong regularity lemma. It produces for you not just one partition but a pair of partitions. In this pair, you have one partition P, which is epsilon_0-regular, similar to the one we obtain from Szemeredi's regularity lemma, but you also get a refinement Q, and this Q is extremely regular. So the picture is: there is P, and then Q is an extremely regular refinement of P. Any questions about the statement of the strong regularity lemma?

The sequence of epsilons gives you flexibility in how to apply it, but let's see how to prove it. Once you understand how it works, conceptually the proof is pretty short, but let me do it slowly so that we can appreciate this sequence of epsilons. The idea is that we will repeatedly apply Szemeredi's regularity lemma to generate a sequence of partitions.

First, let me remind you of a statement of Szemeredi's regularity lemma. This is slightly different from the one that we stated earlier, but it comes out of the same proof. For every epsilon, there exists some m_0, which depends on epsilon, such that for every partition P of the vertex set of G, there exists a refinement P' of P, with each part of P refined into at most m_0 parts, such that P', the new partition, is epsilon-regular.

This is the statement of Szemeredi's regularity lemma that we will apply repeatedly. In the version that we've seen before, we would start with the trivial partition.
There, applying refinements repeatedly in the proof, we got a partition into a bounded number of parts such that the final partition is epsilon-regular. But if, in the proof of the regularity lemma, you start not with the trivial partition but with a given partition, and run the exact same proof, you get this consequence, except that now you can also guarantee that the final partition is a refinement of the one you were given.

So let's apply this statement repeatedly. We obtain a sequence of partitions of the vertex set of G, starting with P_0 being the trivial partition, and so on, such that each partition P_{i+1} refines the previous one P_i, and such that each P_{i+1} is epsilon_{|P_i|}-regular. In other words, you apply the regularity lemma with a parameter based on the number of parts you currently have; applied to the current partition, it gives you a finer partition that is extremely regular. And you also know that the number of parts of the new partition is bounded in terms of the number of parts of the previous partition. Any questions so far?

So now we have this sequence of partitions, and we can keep on going; G could be arbitrarily large, so we can iterate as long as we like. Eventually we will be able to obtain the last condition, the energy condition, which is the only thing still missing. Since the energy is always bounded between 0 and 1, there exists some i at most 1/epsilon_0 such that the energy goes up by less than epsilon_0 from P_i to P_{i+1}, because otherwise the energy would exceed 1. So now let's set P to be this P_i, and Q to be the next term in the sequence, P_{i+1}. Then you have basically all the conditions: P is epsilon_0-regular, because by construction it is epsilon_{|P_{i-1}|}-regular and that parameter is at most epsilon_0; Q refines P and is epsilon_{|P|}-regular by construction; and the energy condition holds by our choice of i.
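The stopping rule above only uses that the energy lies between 0 and 1 and never decreases under refinement. For concreteness, here is a small Python sketch (my own, not from the lecture) computing the energy q(P) of a vertex partition directly from the definition given earlier.

```python
from itertools import product

def density(adj, A, B):
    """Edge density d(A, B) between two vertex sets, adjacency-set representation."""
    return sum(1 for a in A for b in B if b in adj[a]) / (len(A) * len(B))

def energy(adj, parts):
    """Energy q(P) = sum_{i,j} (|V_i||V_j| / n^2) * d(V_i, V_j)^2 of a partition."""
    n = sum(len(p) for p in parts)
    return sum(len(A) * len(B) / n**2 * density(adj, A, B)**2
               for A, B in product(parts, repeat=2))

# Tiny example: the 4-cycle 0-1-2-3-0, with the trivial partition and a refinement.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(energy(adj, [[0, 1, 2, 3]]))      # 0.25 (overall edge density squared)
print(energy(adj, [[0, 2], [1, 3]]))    # 0.5: energy can only go up under refinement
```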
The one thing left to check is that the number of parts of Q is bounded. That's basically because, at each step, there was a bound on the number of parts that depends only on the regularity parameters, and you are iterating that bound a bounded number of times. So the number of parts of Q is bounded as a function of the sequence of epsilons, this infinite vector of epsilons, but it is a bounded number: you only iterate the bound a bounded number of times. And that finishes the proof. Any questions?

It may be somewhat mysterious to you right now why we do this; we'll get to the application in a second. But for now, I just want to comment a bit on the bounds. Of course, the bounds depend on which epsilon_i's you use, and typically you want the epsilon_i's to decrease as the number of parts grows. In almost all reasonable applications of this strong regularity lemma, for example with epsilon_i being some epsilon divided by, let's say, i plus 1, or any polynomial in i, or even decaying faster than that, what basically happens is that you apply this m_0 bound in succession on the order of 1/epsilon_0 times. In the regularity lemma, we saw that the m_0 that comes out of Szemeredi's graph regularity lemma is a tower function, where the tower function tower(i) is defined to be the exponential function iterated i times. Of course, I'm being somewhat loose here with the exact dependence, but you get the idea: instead of iterating the exponential i times, you now iterate the tower function i times. And, as some of you are laughing, this is an incredibly large number; it's even larger than the tower function. In the literature, especially around the regularity lemma, this function, where you iterate the tower function i times, is given the name wowzer.

[LAUGHTER]

As in: wow, this is a huge number. It's a step up in the Ackermann hierarchy; if you then repeat the wowzer function i times, you move up one more rung in the Ackermann hierarchy, this hierarchy of rapidly growing functions.
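To get a feel for the growth, here is a tiny sketch (my own, using one common convention for the base cases) of the tower function and the wowzer function, the tower function iterated.

```python
def tower(i):
    """Tower of 2s of height i: tower(0) = 1, tower(i) = 2 ** tower(i - 1)."""
    return 1 if i == 0 else 2 ** tower(i - 1)

def wowzer(i):
    """Tower function iterated i times: wowzer(0) = 1, wowzer(i) = tower(wowzer(i - 1))."""
    return 1 if i == 0 else tower(wowzer(i - 1))

print([tower(i) for i in range(5)])   # [1, 2, 4, 16, 65536]
print(wowzer(2))                      # tower(2) = 4
print(wowzer(3))                      # tower(4) = 65536
# wowzer(4) = tower(65536) is already far too large to write down.
```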
But in any case, it's bounded, and that's good enough for us. Any questions so far? Yeah?

AUDIENCE: What do you call [INAUDIBLE]

YUFEI ZHAO: Yes, the question is, what do you call wowzer iterated? I'm not aware of a standard name for that. Actually, even the name wowzer, while quite common in the combinatorics community, is probably not recognized by most people outside it. Any more questions? Another way to say it: each iteration is a step up in the Ackermann hierarchy, whose levels you can enumerate one, two, three, four, and so on as you keep going up.

All right. Another remark about this strong regularity lemma: it will be convenient for us, and actually somewhat more essential than in our previous applications, to make the parts equitable, meaning that in both P and Q all the parts have basically the same number of vertices. I won't make this precise, but it is not too hard to do, and you can prove it by modifying the proof in the same way I described for the usual regularity lemma. So I won't belabor the point, but we will use the equitable version.

All right, so how does one use this regularity lemma? Let me state a corollary, and let me call it "corollary star," because you actually need to do some work to get it to follow from the strong regularity lemma. But this corollary is the version that we will apply. If you start with a decreasing sequence of epsilons, epsilon_0 >= epsilon_1 >= epsilon_2 >= ..., then there exists a delta such that the following is true. Every n-vertex graph has an equitable vertex partition V_1, ..., V_k and a subset W_i of each V_i such that the following properties hold. (a) All the W's are fairly large: each has size at least delta n, a constant proportion of the total vertex set. (b) Every pair (W_i, W_j) is epsilon_k-regular. And this is the point I want to emphasize.
Here there are no irregular pairs anymore: the condition holds for every pair. So there are no irregular pairs between the W_i's, and we also include the case i = j, so each W_i is regular with itself. (c) Furthermore, the edge densities between the V's are similar to the edge densities between the corresponding W's, and this holds for most pairs: for all but at most epsilon_0 k^2 pairs (i, j). Any questions about the statement?

So let me show you how you could deduce the corollary from the strong regularity lemma. First, let me draw a picture. Here you have a regularity partition; these are your V's, and inside each V I find a W, drawn in blue, such that every pair of blue sets, including each blue set with itself, is very regular, and moreover the edge density between two blue sets is, for most pairs, very similar to the edge density between their ambient white sets.

OK, so let me say a few words, without going into too many details, about how you might deduce this corollary from the strong regularity lemma. First let me do something slightly simpler: let me not yet require that the blue sets, the W_i's, are regular with themselves. Without that requirement, we can obtain the W_i's by picking a uniformly random part of the finer partition Q inside each part of P from the strong regularity lemma. So the strong regularity lemma produces a pair of partitions, and what we do is pick, inside each part of P, one of its sub-parts in Q uniformly at random as the corresponding W. Because Q is so extremely regular, most pairs of parts of Q are regular, so with high probability you will not encounter any irregular pairs if you pick the W's randomly as parts of Q. That's the key point; here we are using that Q is extremely regular.
496 00:42:40,930 --> 00:42:47,420 So all the Wi Wj is regular for all i not equal 497 00:42:47,420 --> 00:42:49,718 to j with high probability. 498 00:42:53,550 --> 00:42:56,490 But the other thing that we would like is that the edge 499 00:42:56,490 --> 00:43:00,810 densities between the W's are similar to those between 500 00:43:00,810 --> 00:43:02,740 the V's. 501 00:43:02,740 --> 00:43:06,020 And for that, we will use this condition about their energies 502 00:43:06,020 --> 00:43:07,550 being very similar to each other. 503 00:43:10,650 --> 00:43:16,800 So the third consequence, C, is-- 504 00:43:16,800 --> 00:43:22,200 it's a consequence of the energy bound. 505 00:43:30,820 --> 00:43:34,510 Because recall that in our proof of the Szemeredi regularity 506 00:43:34,510 --> 00:43:36,730 lemma there was an interpretation 507 00:43:36,730 --> 00:43:43,860 of the energy as the second moment 508 00:43:43,860 --> 00:43:47,340 of a certain random variable which we called z. 509 00:43:51,640 --> 00:43:54,830 And using that interpretation, I can write down 510 00:43:54,830 --> 00:44:00,320 this expression like that. 511 00:44:00,320 --> 00:44:03,860 We are here assuming for simplicity 512 00:44:03,860 --> 00:44:08,030 that Q is completely equitable, so all the parts 513 00:44:08,030 --> 00:44:09,650 have exactly the same size. 514 00:44:09,650 --> 00:44:15,740 Z of Q is defined to be the edge density between Vi and Vj 515 00:44:15,740 --> 00:44:21,010 for random ij. 516 00:44:21,010 --> 00:44:23,770 So this is a random variable z. 517 00:44:23,770 --> 00:44:28,410 So you pick pair of parts uniformly, 518 00:44:28,410 --> 00:44:31,110 or maybe with some weights if they're not exactly equal. 519 00:44:31,110 --> 00:44:34,480 And you evaluate the edge density. 520 00:44:34,480 --> 00:44:37,650 So this energy difference is the difference 521 00:44:37,650 --> 00:44:39,330 between the second moments. 522 00:44:39,330 --> 00:44:46,710 And because Q is a refinement of P, 523 00:44:46,710 --> 00:44:55,870 it is the case that this difference of L2 norms 524 00:44:55,870 --> 00:45:00,860 is equal to the second moment of the difference 525 00:45:00,860 --> 00:45:02,480 of the random variables. 526 00:45:02,480 --> 00:45:04,760 So we saw a version of this earlier 527 00:45:04,760 --> 00:45:07,910 when we were discussing variance in the context 528 00:45:07,910 --> 00:45:10,520 of the proof of the similar irregularity lemma. 529 00:45:10,520 --> 00:45:11,810 Here it's basically the same. 530 00:45:11,810 --> 00:45:16,430 You can either look at this inequality part by part of V, 531 00:45:16,430 --> 00:45:21,050 or if you like to be a bit more abstract 532 00:45:21,050 --> 00:45:24,170 then this is actually a case of Pythagorean theorem. 533 00:45:29,910 --> 00:45:34,350 If you view these as vectors in a certain vector space, 534 00:45:34,350 --> 00:45:36,100 then you have some orthogonality. 535 00:45:36,100 --> 00:45:40,378 So you have this sum of squares identity. 536 00:45:45,860 --> 00:45:47,792 Where does part A come from? 537 00:45:47,792 --> 00:45:52,340 So part A, we want the parts, that Wi's to be not too small, 538 00:45:52,340 --> 00:46:15,561 but that comes from a bound on the number of parts of Q. 539 00:46:15,561 --> 00:46:18,810 So so far this more or less proves the corollary 540 00:46:18,810 --> 00:46:23,050 except for that we simplified our lives 541 00:46:23,050 --> 00:46:29,680 by requiring just that the i not equal to j, the Vi Vj's are 542 00:46:29,680 --> 00:46:31,100 regular. 
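For the record, here is the identity that was just used, spelled out in the notation above (my reconstruction; the board itself is not in the transcript). Since Q refines P, the random variable Z_P is the conditional expectation of Z_Q given the pair of parts of P, so the cross term vanishes and

$$ q(Q) - q(P) \;=\; \mathbb{E}\big[Z_Q^2\big] - \mathbb{E}\big[Z_P^2\big] \;=\; \mathbb{E}\big[(Z_Q - Z_P)^2\big]. $$

The strong regularity lemma guarantees that $q(Q) - q(P) \le \epsilon_0$, so $Z_Q$ and $Z_P$ are close in mean square, and an averaging, Markov-type argument over the random choice of the $W_i$'s then gives property (c).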
But in the statement up there, we also want the W_i's to be regular with themselves, which will be important for the application. I won't explain how to do that, partly because it is essentially one of your homework problems: on problem set 3, you are asked to prove that every graph has a subset of vertices, of at least constant proportion, that is regular with itself. The methods you use there are applicable to handle the situation here as well.

So, putting all of these ingredients together, we get the corollary, whereby you have this picture, this partition. I don't even require the V_i's to be regular with each other; that doesn't matter anymore. All that matters is that the pairs of W_i's are very regular, with no irregular pairs among them. And now we'll be able to go back to the induced graph removal lemma, where previously we had an issue with the existence of irregular pairs when using the Szemeredi regularity partition; now we have a tool to get around that. Next we will see how to execute this proof, but at this point hopefully you already see the outline, because you no longer need to worry about irregular pairs. Let's take a quick break. Any questions so far? Yes?

AUDIENCE: Why are we able to [INAUDIBLE]

YUFEI ZHAO: OK, so the question was about the step where we were looking at expectations of squares: why was that identity true? If you look back at the proof of Szemeredi's regularity lemma, we already saw an instance of that identity in the computation of the variance.
You know that the variance of X is, on one hand, equal to the expectation of (X minus mu) squared, where mu is the mean of X, and on the other hand equal to the expectation of X squared minus mu squared. You agree with this formula? You can prove it by expanding, and the identity in question you can basically prove by applying this formula part by part. Any more questions?

So let's now prove the induced graph removal lemma. We'll follow the same regularity recipe, but with a small twist: instead of using Szemeredi's regularity lemma, we will use the corollary up there.

The three steps. First, we do the partition. Suppose G is as above: it has very few induced copies of H. Apply the corollary to get a partition of the vertex set of G into parts V_1, ..., V_k, and inside each part V_i a subset W_i, satisfying the following properties. Each pair (W_i, W_j) is regular with a parameter that will come out later, when we need to use the counting lemma; it's some explicit number, so don't worry too much about it. Let's say H has h vertices; then the exponent in that parameter is h. Note that we have not yet used the full strength of the corollary, where I could even make the regularity parameter depend on k; we will not need that here, but we will need it in a later application. The other properties are that the densities between the V's and the corresponding W's do not differ by more than epsilon over 2, for all but a small number of pairs, at most epsilon k^2 over 2 of them; and finally, the sizes of the W_i's are at least delta_0 times n, where delta_0 depends only on epsilon and H.

This is the partition step, so now let's do the cleaning. In the cleaning step there is no longer an issue of irregular pairs if we only look at the W_i's, so we just need to deal with the low-density pairs, or rather the corresponding analog for the induced setting. What happens is the following: for every i <= j, and crucially including the case i = j, if the edge density between W_i and W_j is too small, then we remove all edges between V_i and V_j, and if the edge density between W_i and W_j is too big, then we add all edges between V_i and V_j.
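Here is a small Python sketch of this cleaning step (my own illustration; the lecture only says "too small" and "too big," so the thresholds epsilon/4 and 1 - epsilon/4 below are placeholder choices). The decision for each pair is based on the density between W_i and W_j, but the edit is applied to all of V_i x V_j.

```python
from itertools import combinations_with_replacement

def density(adj, A, B):
    """Edge density between vertex sets A and B, adjacency-set representation."""
    return sum(1 for a in A for b in B if b in adj[a]) / (len(A) * len(B))

def clean(adj, V_parts, W_parts, eps):
    """Cleaning step for the induced removal lemma, with illustrative thresholds.
    For every i <= j, including i == j: if d(W_i, W_j) is very small, delete all
    edges between V_i and V_j; if it is very large, add all of them."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    for i, j in combinations_with_replacement(range(len(V_parts)), 2):
        d = density(adj, W_parts[i], W_parts[j])
        if d < eps / 4:                      # low-density pair: remove
            for u in V_parts[i]:
                for w in V_parts[j]:
                    new_adj[u].discard(w)
                    new_adj[w].discard(u)
        elif d > 1 - eps / 4:                # high-density pair: add
            for u in V_parts[i]:
                for w in V_parts[j]:
                    if u != w:
                        new_adj[u].add(w)
                        new_adj[w].add(u)
    return new_adj
```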
How many edges do we end up adding or removing in total? Consider the removal case: if the edge density in G between V_i and V_j is also very small, then removing those edges does not remove very many, and most pairs (V_i, V_j) have that property. So you tally up the various errors you can pick up here and there, and you find that the total number of edges added or removed from G is less than, let's say, epsilon n^2; maybe you get an extra factor of 2, but after adjusting some constant factors it is less than epsilon n^2. These are small details you can work out. The question being used here is: how is the density between V_i and V_j related to the density between W_i and W_j? For most pairs (i, j) they are very similar, and the small fraction of pairs where they are not similar gets lumped into this bound as well. So maybe, to be safe, let me just put a 2 in that bound.

All right. So we changed a very small number of edges, and now we want to show that the graph resulting from this modification does not have any induced copies of H. That is the final step, the counting step. Suppose there were some induced copy of H left after the modification. I want to show that then there must in fact be many induced copies of H in the original graph, contradicting the hypothesis.

So where does this induced copy of H sit? You have the V's, and inside the V's you have the W's. Suppose, for illustration, that H is the graph drawn here; in particular it has an edge, and it also has a non-edge, say between these two vertices. Suppose you find a copy of H in the cleaned-up graph. Where can this copy of H sit? Suppose you find it here.
The claim now is that if this copy of H exists here, then I must be able to find many such copies of H inside the corresponding yellow parts, the W's, because between the yellow parts you have regularity, and you also have the right kinds of densities: if a pair did not have the right kind of density, we would have cleaned it up already. So that's the idea: if you have a copy of H somewhere, then I zoom into the yellow parts, the W's, and I find lots of copies of H between the W's.

Let me write this down. Suppose each vertex v of the copy of H lies in the part V_{phi(v)}, for some map phi from the vertices of H to {1, ..., k}; I am just recording which part each vertex lies in. Now we apply the counting lemma to embed induced copies of H in G, where each vertex v of H is mapped to a vertex of the corresponding W_{phi(v)}, and we would like to know that there are lots of such copies. Here "the counting lemma" really means a variant: take the counting lemma that we did last time, view it as a multipartite statement, and apply it pair of parts by pair of parts. We find that the number of such induced copies is, up to a small error, what you would guess by naively multiplying densities: for each edge of H that you want to embed, a factor of the corresponding edge density between the two W's; for each non-edge of H, a factor of 1 minus the corresponding edge density, which you can think of as the edge density in the complement of G; and finally the product of the sizes of the vertex sets W_{phi(v)}. The error is at most the regularity parameter, times a constant depending on H, times that same product of the sizes of the W's. And the point is that the main term here is not small.
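In symbols, a convenient form of the estimate being used is the following (my reconstruction, in the spirit of the counting lemma from the previous lecture, not a verbatim statement from the board): if every pair (W_{phi(u)}, W_{phi(v)}) is gamma-regular, then the number of induced copies of H with each vertex v landing in W_{phi(v)} is at least roughly

$$ \Big( \prod_{uv \in E(H)} d\big(W_{\phi(u)}, W_{\phi(v)}\big) \prod_{uv \notin E(H)} \big(1 - d\big(W_{\phi(u)}, W_{\phi(v)}\big)\big) \;-\; O_H(\gamma) \Big) \prod_{v \in V(H)} \big|W_{\phi(v)}\big|. $$

After the cleaning step, every density factor appearing here is at least on the order of epsilon/4 (pairs used for edges of H were not emptied, and pairs used for non-edges were not filled), so the main term is at least about $(\epsilon/4)^{\binom{h}{2}}$, which dominates the error once $\gamma$ is chosen small enough in terms of epsilon and h.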
So the number of induced copies of H in G is at least some quantity, which is basically the expression over there: epsilon over 4 raised to some constant power, and all of these exponents are constants, that's the point, minus the error term, all multiplied by the product of the sizes of the W_{phi(v)}'s, and we saw that each of those vertex sets is not too small. So you have lots of induced copies of H in G. Yep?

AUDIENCE: How do you deal with the case where the density between [INAUDIBLE]

YUFEI ZHAO: OK, can you repeat your question?

AUDIENCE: How are you dealing with the [INAUDIBLE]

YUFEI ZHAO: OK. So the question is, how do we deal with the "all but at most epsilon k^2 over 2 pairs" condition? That comes up in the cleaning step, in what I wrote in red, when counting the total number of edges that are added or removed. Think about how many edges are added or removed; let's just think about added edges. For the non-exceptional pairs, where the density between the V's is controlled by the density between the W's, the total number of edges added, or removed in the other case, over all such pairs is at most about epsilon n^2. On the other hand, for the exceptional pairs, there are only at most epsilon k^2 over 2 pairs (i, j) for which the control fails, so those also contribute at most about epsilon n^2 edges added or removed. Does that answer your question? Yes?

AUDIENCE: Is that number 0?

YUFEI ZHAO: Is which number 0?

AUDIENCE: The number of induced edges for the [INAUDIBLE]

YUFEI ZHAO: The--

AUDIENCE: Yeah, the top board.

YUFEI ZHAO: Top board? Good, so you're asking about this number. That should have been a 2. Yes?

AUDIENCE: I don't see k anywhere.

YUFEI ZHAO: OK, the question is that you don't see k appearing anywhere; you mean the k in the corollary?

AUDIENCE: Yeah.
YUFEI ZHAO: So that hasn't come up yet. It comes up implicitly, because we need to lower bound the sizes of these W's. That is partly why we need a bound on the number of parts. But it is true that we do not need epsilon_k to depend on k in this application yet; I will mention a different application in a second where you do need that dependence on k.

OK, so the number of induced copies of H in G is at least this amount. And that's a small lie: you might worry that this really counts homomorphic copies. Well, actually, no, we're OK. Never mind.

So you can set delta to be this quantity here, and that finishes the proof: you have lots of induced copies of H in your graph, which contradicts the hypothesis. So that finishes the proof of the induced removal lemma. Basically the proof is the same as for the usual graph removal lemma, except that now we need a strengthened regularity lemma, which allows us to get rid of irregular pairs, but in a more restricted setting, because we saw that you cannot completely get rid of irregular pairs.

Any questions? Yes?

AUDIENCE: [INAUDIBLE]

YUFEI ZHAO: So I want to address the question of why I stated this corollary in the more general form, with a decreasing sequence of epsilons. First of all, with strong regularity lemmas it is always nice to state them with this extra strength, because it is the right way to think about these types of theorems: you can make the regularity between the parts depend on the number of parts, so that you get much stronger control on the regularity. But there are also applications, for example the one I will state next, where you do need that kind of strength.

So here is what's known as the infinite removal lemma. Here we have not just a single pattern, or a finite number of patterns, that we want to get rid of; now we have infinitely many patterns.
So for every curly H, which is a possibly infinite set of graphs (the graphs themselves are always finite, but the list may be infinite), and every epsilon > 0, there exist h_0 and delta > 0 such that every n-vertex graph with fewer than delta n^{v(H)} induced copies of H, for every H in the family with fewer than h_0 vertices, can be made induced curly-H-free, meaning free of induced copies of every graph in curly H, by adding or removing fewer than epsilon n squared edges.

So now, instead of a single pattern, you have a possibly infinite set of induced patterns, and you want to make your graph induced curly-H-free. And the theorem is that there exists some finite bound h_0 such that if you have few induced copies of all the patterns up to that size, then you can do what you need to do.

So take some time to digest this statement, but it is somehow the correct infinite version of the removal lemma when you have infinitely many patterns that you need to remove. And I claim that the proof is more or less the same as the one that we did here, except that now you need to take your epsilon_k, as in this corollary, to depend on k; you need to, in some sense, look ahead in this infinite list of patterns. So here, in the proof, the epsilon_k from the corollary depends on k, and it also depends on your family of patterns curly H.

Finally, I want to mention a perspective, a computer science perspective, on these removal lemmas that we have been discussing so far. And that's in the context of something called property testing.

Basically, we would like an efficient (efficient meaning fast) randomized algorithm to distinguish graphs that are triangle-free from those that are epsilon-far from triangle-free, where being epsilon-far from triangle-free means that you need to change more than epsilon n squared edges (n is, as usual, the number of vertices) to make the graph triangle-free.
So, in edit distance, the graph is more than epsilon away from being triangle-free.

So somebody gives you a very large graph G; n is very large. You cannot search through every triple of vertices, that's too expensive. But you want some way to test whether a graph is triangle-free versus very far away from being triangle-free.

So there's a very simple randomized algorithm to do this, which is to just sample a random triple of vertices and check whether it forms a triangle. So you do this. And just to give ourselves some margin of safety, let's repeat it some larger number of times, say c(epsilon) times, a constant number of times. If you never find a triangle, then we return that the graph is triangle-free; otherwise we return that it is epsilon-far from triangle-free.

So that's the algorithm. It's a very intuitive algorithm, but why does it work? We want to know that if somebody gives you one of these two possibilities and you run the algorithm, it succeeds with high probability. Question?

AUDIENCE: [INAUDIBLE]

YUFEI ZHAO: So let's talk about why this works. Theorem: for every epsilon there exists a c such that the algorithm succeeds with probability bigger than 2/3. And 2/3 can be any number you like, because you can always repeat the algorithm to boost that constant probability.

So there are two cases. If G is triangle-free, then the algorithm always succeeds: you will never find a triangle, and it will return "triangle-free."

On the other hand, if G is epsilon-far from triangle-free, then the triangle removal lemma tells us that G has lots of triangles: delta n cubed triangles, where delta is a function of epsilon coming from the triangle removal lemma.
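Before running the failure-probability calculation, here is a minimal sketch of this sampling tester in Python. The representation is an assumption on my part: the graph is given as an adjacency oracle adj(u, v), and the number of rounds is passed in as a parameter; it is only meant to illustrate the algorithm just described.

import random

def test_triangle_free(n, adj, rounds):
    """One-sided tester: sample `rounds` random triples and look for a triangle.

    n      -- number of vertices (vertices are 0, ..., n-1)
    adj    -- adjacency oracle: adj(u, v) returns True if uv is an edge
    rounds -- number of sampled triples, e.g. on the order of 1/delta(epsilon)
    """
    for _ in range(rounds):
        u, v, w = random.sample(range(n), 3)  # a uniformly random triple of distinct vertices
        if adj(u, v) and adj(v, w) and adj(u, w):
            return "epsilon-far from triangle-free"  # found a triangle
    return "triangle-free"  # never errs when G really is triangle-free

Note that if G is genuinely triangle-free the tester can never reject, which is the one-sided error mentioned later; the interesting case is the one analyzed next.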
So if we sample, let's say, c = 1/delta triples, we find that the probability that the algorithm fails is at most -- well, you have a lot of triangles, so very likely you will hit one of them. The probability that the algorithm fails is at most (1 - delta n^3 / (number of triples))^(1/delta), which is at most (1 - 6 delta)^(1/delta) since the number of triples is about n^3/6, and that is at most e^(-6), so in particular less than 1/3.

So this algorithm succeeds with high probability. Now, how big a c do you need? Well, that depends on the triangle removal lemma. So it's a constant; it does not depend on the size of the graph. But it's a large constant, because we saw in the proof of the regularity lemma that it can be very large.

But, you know, this theorem here is basically the same as the triangle removal lemma. So it's highly non-trivial that it's true, even though the algorithm is extremely naive and simple.

I just want to finish off with one more thing. Instead of testing for triangle-freeness, you can ask what other properties you can test. So which graph properties are testable in this sense? That is, distinguishing graphs that have some property P from graphs that are epsilon-far from the property P. And you have this kind of tester, called an oblivious tester: you sample k vertices, and you try to see whether the graph has that property.

So there's a class of properties called hereditary properties. Hereditary properties are properties that are closed under vertex deletion. Lots of properties that you see are of this form: for example, being H-free, being planar, being induced H-free, being three-colorable, being perfect -- these are all examples of hereditary properties. They are properties such that if your graph is, say, three-colorable and you take out some vertices, it is still three-colorable.
And all the discussion that we've done so far, in particular the infinite removal lemma, if you phrase it in the language of property testing, implies, given the above discussion, that every hereditary property is testable. In fact, it is testable in the above sense with one-sided error, using an oblivious tester. One-sided error means that, as up there, in one of the two cases the algorithm always succeeds, just as the triangle tester always succeeds when the graph really is triangle-free.

And the reason is that you can characterize a hereditary property as being induced curly-H-free for some family curly H: namely, you put into curly H every graph that does not have the property. This is a possibly infinite set of graphs, and it completely characterizes the hereditary property. And if you read off the infinite removal lemma, it says precisely, under this interpretation, that you have a property testing algorithm.
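As a closing illustration (my own sketch, not something written on the board), here is what the generic one-sided oblivious tester for a hereditary property P looks like, assuming a hypothetical helper has_property that decides P on the small sampled induced subgraph. The one-sidedness comes exactly from P being hereditary: if G has P, then so does every induced subgraph, so the tester can never reject such a G; that graphs epsilon-far from P are rejected with good probability is what the infinite removal lemma provides.

import random

def oblivious_test(n, adj, has_property, sample_size, rounds):
    """Sketch of an oblivious one-sided tester for a hereditary property P.

    n            -- number of vertices of the input graph
    adj          -- adjacency oracle: adj(u, v) returns True if uv is an edge
    has_property -- hypothetical decider for P on a small vertex set, queried
                    only through adj (so it only sees the induced subgraph)
    sample_size  -- how many vertices to sample each round (an h_0-type bound)
    rounds       -- how many independent rounds to run
    """
    for _ in range(rounds):
        sample = random.sample(range(n), sample_size)
        # If G has P, this induced subgraph has P too (hereditary), so we never reject.
        if not has_property(sample, adj):
            return "epsilon-far from P"
    return "has property P"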