1 00:00:01,550 --> 00:00:03,920 The following content is provided under a Creative 2 00:00:03,920 --> 00:00:05,310 Commons license. 3 00:00:05,310 --> 00:00:07,520 Your support will help MIT OpenCourseWare 4 00:00:07,520 --> 00:00:11,610 continue to offer high quality educational resources for free. 5 00:00:11,610 --> 00:00:14,180 To make a donation or to view additional materials 6 00:00:14,180 --> 00:00:18,140 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,140 --> 00:00:19,026 at ocw.mit.edu. 8 00:00:23,170 --> 00:00:27,920 GILBERT STRANG: OK, so kind of a few things in mind for today. 9 00:00:27,920 --> 00:00:31,340 One is to answer those two questions on the second line. 10 00:00:33,860 --> 00:00:38,080 We found those two formulas on the first line last time, 11 00:00:38,080 --> 00:00:40,850 the derivative of A inverse. 12 00:00:40,850 --> 00:00:43,460 So the derivative of A squared ought to be easy. 13 00:00:43,460 --> 00:00:48,020 But if we can't do that, we need to be sure we can. 14 00:00:48,020 --> 00:00:51,560 And then this was the derivative of an eigenvalue. 15 00:00:51,560 --> 00:00:54,980 And then it's natural to ask about the derivative 16 00:00:54,980 --> 00:00:56,870 of the singular value. 17 00:00:56,870 --> 00:01:00,110 And I had a happy day yesterday in the snow, 18 00:01:00,110 --> 00:01:03,380 realizing that that has a nice formula too. 19 00:01:03,380 --> 00:01:06,230 Of course, I'm not the first. 20 00:01:06,230 --> 00:01:13,260 I'm sure that Wikipedia already knows this formula. 21 00:01:13,260 --> 00:01:14,970 But it was new to me. 22 00:01:14,970 --> 00:01:19,460 And I should say Professor Edelman has carried it 23 00:01:19,460 --> 00:01:21,120 to the second derivative. 24 00:01:21,120 --> 00:01:27,320 Again, not new, but it's more difficult to find 25 00:01:27,320 --> 00:01:31,040 second derivatives, and interesting. 26 00:01:31,040 --> 00:01:34,740 But we'll just stay with first derivatives. 
27 00:01:34,740 --> 00:01:39,050 OK, so that's my first item of sort 28 00:01:39,050 --> 00:01:41,260 of business from last time. 29 00:01:41,260 --> 00:01:44,840 And then I'd like to say something about the lab 30 00:01:44,840 --> 00:01:49,160 homeworks and ask your advice and begin to say something 31 00:01:49,160 --> 00:01:51,050 about a project. 32 00:01:51,050 --> 00:01:58,550 And then I will move to these topics in Section 4.4 33 00:01:58,550 --> 00:02:01,430 that you have already. 34 00:02:01,430 --> 00:02:06,860 And you might notice I skipped 4.3. 35 00:02:06,860 --> 00:02:10,610 And the reason is that on Friday, 36 00:02:10,610 --> 00:02:13,070 actually arriving at MIT tomorrow 37 00:02:13,070 --> 00:02:19,910 is Professor Townsend, 4.3 is all about his work. 38 00:02:19,910 --> 00:02:24,320 And he's the best lecturer I know. 39 00:02:24,320 --> 00:02:29,390 He was here as an instructor and did 18.06 and was 40 00:02:29,390 --> 00:02:31,540 a big success. 41 00:02:31,540 --> 00:02:36,110 Actually, he's also just won a prize 42 00:02:36,110 --> 00:02:44,570 for the SIAG/LA, international prize for young investigators, 43 00:02:44,570 --> 00:02:48,800 young faculty in applied linear algebra. 44 00:02:48,800 --> 00:02:53,300 So he goes to Hong Kong to get that prize too. 45 00:02:53,300 --> 00:02:58,700 Anyway, he will be on the videos and in here in class Friday, 46 00:02:58,700 --> 00:03:00,860 if all goes well. 47 00:03:00,860 --> 00:03:06,110 OK, so in order then, the first thing 48 00:03:06,110 --> 00:03:09,260 is the derivative of A squared. 49 00:03:09,260 --> 00:03:16,750 And you might think it's 2A dA dt, but it's not. 50 00:03:16,750 --> 00:03:18,760 And if you realize that it's not, 51 00:03:18,760 --> 00:03:22,480 then you realize what it is, you will get these things right 52 00:03:22,480 --> 00:03:23,750 in the future. 53 00:03:23,750 --> 00:03:32,250 So the answer to the derivative of A squared is not 2A dA dt. 
54 00:03:36,340 --> 00:03:37,690 And why isn't it? 55 00:03:37,690 --> 00:03:40,720 And what is the right answer? 56 00:03:40,720 --> 00:03:43,450 So I do that maybe just below here. 57 00:03:50,590 --> 00:03:53,030 Well, I could ask you to guess the right answer, 58 00:03:53,030 --> 00:03:55,660 but why don't we do it systematically. 59 00:03:55,660 --> 00:03:59,800 So how do you find the derivative? 60 00:03:59,800 --> 00:04:01,180 It's a limit. 61 00:04:01,180 --> 00:04:03,700 First you have a delta A, right. 62 00:04:03,700 --> 00:04:05,210 And then you take a limit. 63 00:04:05,210 --> 00:04:15,700 So I look at A plus delta A squared minus A squared. 64 00:04:15,700 --> 00:04:18,279 So that's the change in A squared. 65 00:04:18,279 --> 00:04:21,820 And I divide it by delta t. 66 00:04:21,820 --> 00:04:24,370 And then delta t goes to 0. 67 00:04:24,370 --> 00:04:26,920 So that's the derivative I'm looking for, 68 00:04:26,920 --> 00:04:29,020 the derivative of A squared. 69 00:04:29,020 --> 00:04:34,210 And now, if I write that out, you'll see why this is wrong, 70 00:04:34,210 --> 00:04:38,080 but something very close to it, of course-- can't be far away-- 71 00:04:38,080 --> 00:04:38,980 is right. 72 00:04:38,980 --> 00:04:41,100 So what happens if I write this out? 73 00:04:41,100 --> 00:04:44,770 The A squared will cancel the A squared. 74 00:04:44,770 --> 00:04:45,480 What will I have? 75 00:04:45,480 --> 00:04:49,540 Will I have 2A delta A? 76 00:04:49,540 --> 00:04:52,795 Why don't I write 2A delta A next? 77 00:04:55,900 --> 00:05:01,090 Because when you're squaring a sum of two matrices, 78 00:05:01,090 --> 00:05:11,180 one term is A delta A, and another term is delta A A. 79 00:05:11,180 --> 00:05:15,230 And those are different in general. 80 00:05:15,230 --> 00:05:20,210 And then plus delta A squared. 81 00:05:20,210 --> 00:05:24,680 And now I divide it all by delta t. 
82 00:05:24,680 --> 00:05:31,550 So you're now seeing my point that now I let delta t go to 0. 83 00:05:31,550 --> 00:05:34,760 So I'm just doing matrix calculus. 84 00:05:34,760 --> 00:05:40,490 And it's not altogether simple, but if you follow the rules, 85 00:05:40,490 --> 00:05:42,200 it comes out right. 86 00:05:42,200 --> 00:05:48,770 So now what answer do I get as delta t goes to 0? 87 00:05:48,770 --> 00:05:51,950 I get A dA dt-- 88 00:05:51,950 --> 00:05:56,240 that's the definition of the-- 89 00:05:56,240 --> 00:05:58,550 that ratio goes to dA dt. 90 00:05:58,550 --> 00:06:01,460 That's the whole idea of the derivative of A. 91 00:06:01,460 --> 00:06:04,730 And now what's the other term? 92 00:06:04,730 --> 00:06:13,190 It's dA dt A. So it was simply that point 93 00:06:13,190 --> 00:06:20,510 that I wanted you to pick up on, that the derivative might not 94 00:06:20,510 --> 00:06:24,770 commute with A. Matrices don't commute in general. 95 00:06:24,770 --> 00:06:31,415 And so you'll notice that we had a similar expression there. 96 00:06:35,350 --> 00:06:38,480 We had to pay attention to the order of things there. 97 00:06:38,480 --> 00:06:39,770 And now we get it right. 98 00:06:39,770 --> 00:06:51,810 It's not this, but A dA dt plus dA dt A. OK. 99 00:06:51,810 --> 00:06:52,970 Good. 100 00:06:52,970 --> 00:06:54,990 Now, can I do the other one? 101 00:06:54,990 --> 00:07:01,200 Which is a little more serious, but it's a beautiful formula. 102 00:07:01,200 --> 00:07:04,470 And it's parallel to this guy. 103 00:07:04,470 --> 00:07:07,050 You might even guess it. 104 00:07:07,050 --> 00:07:10,050 So I'm looking for the derivative of a singular value. 105 00:07:10,050 --> 00:07:12,780 The matrix A is changing. 106 00:07:12,780 --> 00:07:17,830 dA dt tells me how it's changing at the moment, at the instant. 107 00:07:17,830 --> 00:07:22,400 And I want to know how is sigma changing at that same instant. 
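As a quick numerical check of the rule just derived, d(A^2)/dt = A dA/dt + dA/dt A, here is a minimal sketch. The matrices A0 and B are made-up examples (not from the lecture), A(t) = A0 + t B is an assumed path so that dA/dt = B, and the small helper functions are hypothetical names written only for this illustration.

```python
# Finite-difference check of d(A^2)/dt = A (dA/dt) + (dA/dt) A for 2x2 matrices.
# A0 and B are illustrative; with A(t) = A0 + t*B the derivative dA/dt is B.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def scale(c, X):
    return [[c * X[i][j] for j in range(2)] for i in range(2)]

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

A0 = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [-1.0, 2.0]]

def A(t):
    return add(A0, scale(t, B))

t, h = 0.5, 1e-6
At, Ah = A(t), A(t + h)

# (A(t+h)^2 - A(t)^2) / h, the ratio from the lecture with a small delta t
numeric = scale(1.0 / h, add(matmul(Ah, Ah), scale(-1.0, matmul(At, At))))
correct = add(matmul(At, B), matmul(B, At))  # A dA/dt + dA/dt A
naive = scale(2.0, matmul(At, B))            # the tempting but wrong 2A dA/dt

err_correct = max_abs_diff(numeric, correct)
err_naive = max_abs_diff(numeric, naive)
print("error of A dA/dt + dA/dt A:", err_correct)
print("error of 2A dA/dt:         ", err_naive)
```

The first error should be tiny (it is on the order of h), while the second stays large because A and dA/dt do not commute for this choice of matrices.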
108 00:07:22,400 --> 00:07:26,700 And sort of in parallel with this is a nice-- 109 00:07:26,700 --> 00:07:27,870 the nice formula-- 110 00:07:27,870 --> 00:07:36,150 u transpose dA dt v of t. 111 00:07:36,150 --> 00:07:39,030 Boy, you couldn't ask for a nicer formula than that, right? 112 00:07:43,310 --> 00:07:46,440 You remember this is the eigenvector. 113 00:07:46,440 --> 00:07:50,050 And that's the eigenvector of A transpose. 114 00:07:50,050 --> 00:07:52,650 So this is the singular vector of A. 115 00:07:52,650 --> 00:07:56,280 And you could say this is a singular vector of A transpose, 116 00:07:56,280 --> 00:08:04,470 or it's the left singular vector of A. So that's our formula. 117 00:08:04,470 --> 00:08:07,360 And if we can just recall how to prove it, 118 00:08:07,360 --> 00:08:10,260 which is going to be parallel to the proof of that one, 119 00:08:10,260 --> 00:08:14,870 then I'm a happy person and we can get on with life. 120 00:08:14,870 --> 00:08:20,340 So let's remember this, because it will help us 121 00:08:20,340 --> 00:08:22,140 to remember the other one, too. 122 00:08:22,140 --> 00:08:25,620 OK, so where do I start? 123 00:08:25,620 --> 00:08:28,410 I start with a formula for sigma. 124 00:08:28,410 --> 00:08:35,340 So I believe that sigma is u transpose times A times 125 00:08:35,340 --> 00:08:41,780 v. Everybody agree with that? 126 00:08:41,780 --> 00:08:45,530 Everything's depending on t in this formula. 127 00:08:45,530 --> 00:08:48,890 As time changes, everything changes. 128 00:08:48,890 --> 00:08:52,340 But I didn't write in the parentheses, 129 00:08:52,340 --> 00:08:56,390 t three more times. 130 00:08:56,390 --> 00:08:59,500 Can we just remember about the SVD. 131 00:08:59,500 --> 00:09:04,756 The SVD says that A times v equals-- 132 00:09:04,756 --> 00:09:05,710 AUDIENCE: Sigma u. 133 00:09:05,710 --> 00:09:06,770 GILBERT STRANG: Sigma u. 134 00:09:06,770 --> 00:09:08,030 Thanks. 135 00:09:08,030 --> 00:09:09,490 Av is sigma u. 
136 00:09:09,490 --> 00:09:10,410 That's the SVD. 137 00:09:13,290 --> 00:09:18,380 So when I put in for Av, I put in sigma u. 138 00:09:18,380 --> 00:09:19,760 Sigma is just a number. 139 00:09:19,760 --> 00:09:21,700 So I bring it outside. 140 00:09:21,700 --> 00:09:24,110 And I'm left with u transpose u. 141 00:09:24,110 --> 00:09:26,970 And what's u transpose u? 142 00:09:26,970 --> 00:09:28,660 1. 143 00:09:28,660 --> 00:09:30,160 So I've used these two facts. 144 00:09:32,980 --> 00:09:35,890 Or I could have gone the other way 145 00:09:35,890 --> 00:09:39,580 and said that this is the transpose of-- 146 00:09:39,580 --> 00:09:43,060 this is A transpose u transpose. 147 00:09:43,060 --> 00:09:49,060 I could look at it that way times v. 148 00:09:49,060 --> 00:09:50,860 And if I look at it that way, I'm 149 00:09:50,860 --> 00:09:53,530 interested in what is A transpose u. 150 00:09:53,530 --> 00:09:57,370 And what is A transpose u? 151 00:09:57,370 --> 00:10:04,900 It's sigma v. And it's transpose, so sigma v 152 00:10:04,900 --> 00:10:07,090 transpose v. 153 00:10:07,090 --> 00:10:09,643 And what is sigma v transpose v? 154 00:10:09,643 --> 00:10:10,310 AUDIENCE: Sigma. 155 00:10:10,310 --> 00:10:12,400 GILBERT STRANG: It's sigma again, of course. 156 00:10:12,400 --> 00:10:14,070 Got sigma both ways. 157 00:10:14,070 --> 00:10:15,520 OK. 158 00:10:15,520 --> 00:10:18,910 Now, I'm ready to take the derivative. 159 00:10:18,910 --> 00:10:23,680 That's the formula I have for sigma, 160 00:10:23,680 --> 00:10:25,540 completely parallel to the formula 161 00:10:25,540 --> 00:10:27,970 that we started out with for lambda. 162 00:10:27,970 --> 00:10:31,900 The eigenvalue was y transpose Ax. 163 00:10:31,900 --> 00:10:34,510 And now we've got u transpose Av. 164 00:10:34,510 --> 00:10:38,170 And, by the way, when would those two formulas 165 00:10:38,170 --> 00:10:40,410 be one and the same? 
166 00:10:40,410 --> 00:10:45,060 When does the SVD just tell us nothing new 167 00:10:45,060 --> 00:10:50,520 beyond the eigenvalue stuff? For what matrices are the singular 168 00:10:50,520 --> 00:10:53,430 values the same as the eigenvalues, and singular 169 00:10:53,430 --> 00:10:57,870 vectors the same as the eigenvectors for-- 170 00:10:57,870 --> 00:10:58,755 For? 171 00:10:58,755 --> 00:11:00,240 AUDIENCE: Symmetric. 172 00:11:00,240 --> 00:11:02,340 GILBERT STRANG: Symmetric, good. 173 00:11:02,340 --> 00:11:08,490 Symmetric, square, and-- the two words 174 00:11:08,490 --> 00:11:11,820 that I'm always looking for in this course. 175 00:11:11,820 --> 00:11:13,560 If you want an A in this course, just 176 00:11:13,560 --> 00:11:17,970 write down positive definite in the answer to any question, 177 00:11:17,970 --> 00:11:21,510 because sigmas are by definition positive. 178 00:11:21,510 --> 00:11:24,570 And if they're going to agree totally with the lambdas, 179 00:11:24,570 --> 00:11:26,460 then the lambdas have to be positive. 180 00:11:26,460 --> 00:11:30,237 Or could be 0, so positive semidefinite 181 00:11:30,237 --> 00:11:31,320 would be the right answer. 182 00:11:31,320 --> 00:11:33,060 Anyway, this is our start. 183 00:11:36,170 --> 00:11:38,460 And what do we do with that formula? 184 00:11:38,460 --> 00:11:42,340 So this was all the same, because v transpose v was 1. 185 00:11:45,870 --> 00:11:48,000 Here I had v transpose v. And that's 1. 186 00:11:48,000 --> 00:11:49,260 So it gave me sigma. 187 00:11:49,260 --> 00:11:50,240 Yeah, good. 188 00:11:50,240 --> 00:11:52,190 Everybody's with us. 189 00:11:52,190 --> 00:11:53,580 OK, what do I do? 190 00:11:53,580 --> 00:11:55,110 Take the derivative. 191 00:11:55,110 --> 00:11:58,160 Take the derivative of that equation in the box. 192 00:11:58,160 --> 00:12:00,570 It's exactly what I did last time 193 00:12:00,570 --> 00:12:03,480 with the corresponding equation for lambda. 
194 00:12:03,480 --> 00:12:04,620 Same thing. 195 00:12:04,620 --> 00:12:07,140 And I'm going to get again-- 196 00:12:07,140 --> 00:12:11,130 it's a product rule, because I have three things multiplied 197 00:12:11,130 --> 00:12:12,760 on the right-hand side. 198 00:12:12,760 --> 00:12:15,700 So I've got three terms from the product rule. 199 00:12:15,700 --> 00:12:21,780 So d sigma dt, coming from the box, 200 00:12:21,780 --> 00:12:35,430 is du transpose dt Av plus u transpose dA dt v 201 00:12:35,430 --> 00:12:41,370 plus the third guy, which will be u transpose A dv dt. 202 00:12:44,530 --> 00:12:46,090 Did I get the three terms there? 203 00:12:46,090 --> 00:12:48,200 Yep. 204 00:12:48,200 --> 00:12:49,980 And which term do I want? 205 00:12:49,980 --> 00:12:54,300 Which term do I believe is going to survive and be the answer? 206 00:12:57,090 --> 00:12:59,620 Well, this is what I'm after. 207 00:12:59,620 --> 00:13:01,750 So it's the middle term. 208 00:13:01,750 --> 00:13:03,220 The middle term is just right. 209 00:13:06,320 --> 00:13:09,770 And the other two terms had better be zero. 210 00:13:09,770 --> 00:13:12,200 So that will be the proof. 211 00:13:12,200 --> 00:13:14,540 The other two terms will be zero. 212 00:13:14,540 --> 00:13:17,840 So can we just take one of those two terms 213 00:13:17,840 --> 00:13:22,100 and show that it's zero like this one? 214 00:13:22,100 --> 00:13:24,140 OK, what have I got here? 215 00:13:24,140 --> 00:13:27,030 I want to know that that term is 0. 216 00:13:27,030 --> 00:13:28,060 So what have I got. 217 00:13:28,060 --> 00:13:37,370 I've got du transpose dt times Av. 218 00:13:37,370 --> 00:13:43,200 And everybody says, OK, in place of Av, write in sigma u. 219 00:13:43,200 --> 00:13:47,760 And sigma's a number, so I don't mind putting it there. 
220 00:13:47,760 --> 00:13:53,310 So I've got sigma, a number, times the derivative of u times 221 00:13:53,310 --> 00:13:55,140 u itself, the dot product-- 222 00:13:55,140 --> 00:13:58,410 the derivative of u with dot product with u. 223 00:13:58,410 --> 00:14:00,990 And that equals? 224 00:14:00,990 --> 00:14:05,280 0, I hope, because of this. 225 00:14:08,100 --> 00:14:09,120 Because of that. 226 00:14:12,240 --> 00:14:14,790 This comes from the derivative of that. 227 00:14:17,580 --> 00:14:23,210 But you see, now we've got dot products, ordinary dot 228 00:14:23,210 --> 00:14:26,800 products, and a number on the right-hand side. 229 00:14:26,800 --> 00:14:29,750 We're in dimension 1, you could say. 230 00:14:29,750 --> 00:14:34,300 So this tells me immediately that the derivative 231 00:14:34,300 --> 00:14:44,870 of u with u plus u transpose times the derivative of u 232 00:14:44,870 --> 00:14:51,680 is the derivative of 1, which is 0. 233 00:14:51,680 --> 00:14:56,120 All I'm saying is that these are the same. 234 00:14:56,120 --> 00:15:01,535 You know, vectors, x transpose y is the same as y transpose 235 00:15:01,535 --> 00:15:04,460 x when I'm talking about real numbers. 236 00:15:04,460 --> 00:15:08,030 If I was doing complex things, which I could do, 237 00:15:08,030 --> 00:15:16,070 then I'd have to pay attention and take complex conjugates 238 00:15:16,070 --> 00:15:16,920 at the right moment. 239 00:15:16,920 --> 00:15:19,250 But let's not bother. 240 00:15:19,250 --> 00:15:23,600 So you see, this is just two of these. 241 00:15:23,600 --> 00:15:26,960 And it gives me 0. 242 00:15:26,960 --> 00:15:28,070 So that term's gone. 243 00:15:30,690 --> 00:15:34,470 And similarly, totally similarly, this term is gone. 244 00:15:34,470 --> 00:15:40,820 This is A transpose u, all transpose. 245 00:15:40,820 --> 00:15:45,410 I'm just doing the same thing times dv dt. 246 00:15:45,410 --> 00:15:48,300 And what is A transpose u? 
247 00:15:48,300 --> 00:15:55,580 It's sigma v. So this is sigma v transpose dv dt. 248 00:15:55,580 --> 00:15:59,090 And again 0, because of this. 249 00:16:02,630 --> 00:16:07,630 So in a way this was a slightly easier thing-- 250 00:16:07,630 --> 00:16:12,830 the last time was completely parallel computation. 251 00:16:12,830 --> 00:16:17,410 But the first and third terms had to cancel each other with 252 00:16:17,410 --> 00:16:19,840 the x's and y's. 253 00:16:19,840 --> 00:16:29,690 Now, they disappear separately, leaving the right answer. 254 00:16:29,690 --> 00:16:32,860 You might think, how did we get into derivatives 255 00:16:32,860 --> 00:16:35,470 of singular values? 256 00:16:35,470 --> 00:16:40,210 Well, I think if we're going to understand the SVD, 257 00:16:40,210 --> 00:16:45,040 then the first derivative of the sigma is-- 258 00:16:45,040 --> 00:16:47,320 well, except that I've survived all these years 259 00:16:47,320 --> 00:16:48,230 without knowing it. 260 00:16:48,230 --> 00:16:50,440 So you could say it's not-- 261 00:16:53,340 --> 00:16:58,330 you can live without it, but it's a pretty nice formula. 262 00:16:58,330 --> 00:17:05,780 OK, that completes that Section 3.1. 263 00:17:05,780 --> 00:17:09,770 And more to say about 3.2, which was the interlacing 264 00:17:09,770 --> 00:17:11,960 part that I introduced. 265 00:17:11,960 --> 00:17:14,720 OK, so where am I? 266 00:17:14,720 --> 00:17:26,220 I guess I'm thinking about the neat topics about interlacing 267 00:17:26,220 --> 00:17:28,060 of eigenvalues. 268 00:17:28,060 --> 00:17:33,810 So may I pick up on that theme, interlacing of eigenvalues 269 00:17:33,810 --> 00:17:39,730 and say what's in the notes and what's the general idea? 270 00:17:39,730 --> 00:17:40,230 OK. 
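The formula d sigma/dt = u transpose (dA/dt) v can also be checked numerically. The sketch below is only an illustration under stated assumptions: it works with 2 by 2 matrices so that the largest singular value has a closed form, the matrices A0 and B are made up, the path A(t) = A0 + t B is assumed so that dA/dt = B, and `top_singular_triplet` is a hypothetical helper, not a library function.

```python
import math

def top_singular_triplet(M):
    """Largest singular value of a 2x2 matrix M with its unit vectors u and v."""
    (a, b), (c, d) = M
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]]
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    # Largest eigenvalue of M^T M is sigma^2
    lam = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    sigma = math.sqrt(lam)
    # Eigenvector of M^T M for lam; this choice is valid when q != 0,
    # which holds for the example matrices used here
    vx, vy = lam - r, q
    norm = math.hypot(vx, vy)
    v = (vx / norm, vy / norm)
    # The SVD relation A v = sigma u gives the left singular vector
    u = ((a * v[0] + b * v[1]) / sigma, (c * v[0] + d * v[1]) / sigma)
    return sigma, u, v

A0 = [[2.0, 1.0], [0.0, 1.0]]
B = [[1.0, 0.0], [2.0, -1.0]]  # dA/dt along the assumed path

def A(t):
    return [[A0[i][j] + t * B[i][j] for j in range(2)] for i in range(2)]

t, h = 0.3, 1e-5
sigma, u, v = top_singular_triplet(A(t))

# u^T (dA/dt) v, the middle term of the product rule
Bv = (B[0][0] * v[0] + B[0][1] * v[1], B[1][0] * v[0] + B[1][1] * v[1])
predicted = u[0] * Bv[0] + u[1] * Bv[1]

# Central difference of sigma(t) for comparison
numeric = (top_singular_triplet(A(t + h))[0]
           - top_singular_triplet(A(t - h))[0]) / (2 * h)

print("u^T dA/dt v      :", predicted)
print("finite difference:", numeric)
```

The two printed numbers should agree to many digits, which is consistent with the other two product-rule terms contributing nothing.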
271 00:17:43,290 --> 00:17:48,480 So we're leaving the derivatives and moving 272 00:17:48,480 --> 00:17:54,570 to finite changes in the eigenvalues and singular 273 00:17:54,570 --> 00:17:58,020 values, and we are recognizing that we 274 00:17:58,020 --> 00:18:02,730 can't get exact formulas for the change, 275 00:18:02,730 --> 00:18:06,200 but we can get bounds for change. 276 00:18:06,200 --> 00:18:07,990 And they are pretty cool. 277 00:18:07,990 --> 00:18:12,060 So let me remind you what that is, what they are. 278 00:18:12,060 --> 00:18:15,260 So I have a matrix-- 279 00:18:15,260 --> 00:18:18,450 let's see, a symmetric matrix S that 280 00:18:18,450 --> 00:18:22,080 has eigenvalues lambda 1, greater equal lambda 2, 281 00:18:22,080 --> 00:18:25,920 greater equal so on. 282 00:18:25,920 --> 00:18:28,680 Then I change S by some amount. 283 00:18:28,680 --> 00:18:35,520 I think in the notes there is a number, theta times a rank 1 matrix. 284 00:18:35,520 --> 00:18:40,080 That has eigenvalues mu 1, greater equal mu 2, 285 00:18:40,080 --> 00:18:43,530 greater equal something. 286 00:18:43,530 --> 00:18:49,610 And these are what I can't give you an exact formula for. 287 00:18:49,610 --> 00:18:52,850 You just would have to compute them. 288 00:18:52,850 --> 00:18:57,410 But I can give you bounds for them. 289 00:18:57,410 --> 00:18:59,470 And the bounds come from the lambdas. 290 00:19:02,030 --> 00:19:04,100 So this was a positive. 291 00:19:04,100 --> 00:19:05,700 This is a positive change. 292 00:19:09,590 --> 00:19:14,140 So the eigenvalues will go up, or stay still, 293 00:19:14,140 --> 00:19:16,760 but they won't go down. 294 00:19:16,760 --> 00:19:20,830 So the mu's will be bigger than the lambdas. 295 00:19:20,830 --> 00:19:27,130 But the neat thing is that mu 2 will not pass up lambda 1. 296 00:19:27,130 --> 00:19:29,240 So here is the interlacing. 297 00:19:29,240 --> 00:19:32,110 Mu 1 is greater equal lambda 1. 
298 00:19:32,110 --> 00:19:35,350 That says that the highest eigenvalue, the top eigenvalue 299 00:19:35,350 --> 00:19:39,690 went up, or didn't move. 300 00:19:39,690 --> 00:19:44,640 But mu 2 is below lambda 1. 301 00:19:44,640 --> 00:19:46,540 This is the new-- everybody's with me here? 302 00:19:46,540 --> 00:19:50,210 This is the new, and this is the old. 303 00:19:50,210 --> 00:19:56,510 New and old being old is S, new is with the change in S. 304 00:19:56,510 --> 00:20:01,540 And that mu 2 is greater equal lambda 2. 305 00:20:01,540 --> 00:20:04,010 So the second eigenvalues went up. 306 00:20:04,010 --> 00:20:05,158 And then so on. 307 00:20:10,910 --> 00:20:13,790 That's a great fact. 308 00:20:13,790 --> 00:20:16,810 And I guess that I sent out a puzzle question. 309 00:20:16,810 --> 00:20:19,215 Did it arrive in email? 310 00:20:25,100 --> 00:20:29,820 Did anybody see that puzzle question and think about it? 311 00:20:29,820 --> 00:20:31,060 It worried me for a while. 312 00:20:36,480 --> 00:20:41,820 Suppose this is the second eigenvalue-- 313 00:20:41,820 --> 00:20:44,310 eigenvector. 314 00:20:44,310 --> 00:20:50,520 So I'm adding on, I'm hyping up the second eigenvector, 315 00:20:50,520 --> 00:20:52,830 hyping up the matrix in the direction 316 00:20:52,830 --> 00:20:54,370 of the second eigenvector. 317 00:20:57,890 --> 00:21:02,250 So the second eigenvalue was lambda 2. 318 00:21:02,250 --> 00:21:05,280 And mu 2, the new second eigenvalue, 319 00:21:05,280 --> 00:21:06,860 is going to be bigger by theta. 320 00:21:11,190 --> 00:21:15,120 But then I lost a little sleep in thinking, OK, 321 00:21:15,120 --> 00:21:20,130 if the second eigenvalue is mu 2 plus theta-- 322 00:21:20,130 --> 00:21:22,980 sorry, if the second eigenvalue mu 2-- 323 00:21:22,980 --> 00:21:24,300 so let me write it here. 
324 00:21:24,300 --> 00:21:35,460 If mu 2, the second eigenvalue, is the old lambda 2 plus theta 325 00:21:35,460 --> 00:21:45,390 then bad news, because theta can be as big as I want. 326 00:21:45,390 --> 00:21:48,180 It can be 20, 200, 2,000. 327 00:21:48,180 --> 00:21:57,300 And if I'm just adding theta to lambda 2 to get the second-- 328 00:21:57,300 --> 00:22:01,440 because it's a second eigenvector that's 329 00:22:01,440 --> 00:22:10,140 getting pumped up, then after a while, mu 2 will pass lambda 1. 330 00:22:10,140 --> 00:22:11,520 This will be totally true. 331 00:22:11,520 --> 00:22:13,200 I have no worries about this. 332 00:22:13,200 --> 00:22:14,610 The old lambda 1-- 333 00:22:14,610 --> 00:22:18,150 actually, the old-- 334 00:22:18,150 --> 00:22:21,000 I'll even have equality here, because 335 00:22:21,000 --> 00:22:27,600 for this particular change, it's not affecting lambda 1. 336 00:22:27,600 --> 00:22:30,430 So I think mu 1 would be lambda 1 337 00:22:30,430 --> 00:22:34,080 in my hypothetical possibility. 338 00:22:34,080 --> 00:22:35,550 What I'm trying to get you to do is 339 00:22:35,550 --> 00:22:39,210 to think through what this means, because it's quite 340 00:22:39,210 --> 00:22:43,170 easy to write that line there. 341 00:22:43,170 --> 00:22:46,950 But then when you think about it, you get some questions. 342 00:22:46,950 --> 00:22:50,810 And it looks as if it might fail, 343 00:22:50,810 --> 00:22:57,110 because if theta is really big, then mu 2 would pass up 344 00:22:57,110 --> 00:22:57,860 lambda 1. 345 00:22:57,860 --> 00:23:00,500 And the thing would fail. 346 00:23:00,500 --> 00:23:02,570 And there has to be a catch. 347 00:23:02,570 --> 00:23:05,960 There has to be a catch. 348 00:23:05,960 --> 00:23:11,540 So does anybody-- you saw that in the email. 349 00:23:11,540 --> 00:23:16,400 And I'll now explain how I understood 350 00:23:16,400 --> 00:23:24,650 that everything can work and I'm not reaching a contradiction. 
351 00:23:24,650 --> 00:23:27,110 And here's my thinking. 352 00:23:27,110 --> 00:23:32,810 So it's perfectly true that the eigenvalue that goes with u2-- 353 00:23:32,810 --> 00:23:36,320 or maybe I should be calling them x2, because usually I 354 00:23:36,320 --> 00:23:38,750 call the eigenvectors x2-- 355 00:23:38,750 --> 00:23:42,950 it's perfectly true that mu 2, that that one goes up. 356 00:23:46,020 --> 00:23:51,900 But what happens when it reaches lambda 1? 357 00:23:51,900 --> 00:23:54,435 Actually, lambda 1, the first eigenvalue, 358 00:23:54,435 --> 00:23:57,930 is staying put, because it's not getting any push from this. 359 00:23:57,930 --> 00:24:01,700 But the second eigenvalue is getting a push of size theta. 360 00:24:01,700 --> 00:24:07,290 So what happens when lambda 2 plus theta, which is mu 2-- 361 00:24:07,290 --> 00:24:09,480 mu 2 is lambda 2 plus theta-- 362 00:24:09,480 --> 00:24:12,720 what happens when it comes up to lambda 1 363 00:24:12,720 --> 00:24:15,120 and I start worrying that it passes lambda 1? 364 00:24:18,410 --> 00:24:21,740 Do you see what's happening there? 365 00:24:21,740 --> 00:24:25,250 What happens when mu 2 passes-- 366 00:24:25,250 --> 00:24:26,720 when mu 2, which is-- 367 00:24:26,720 --> 00:24:28,220 I'm just going to copy here-- 368 00:24:28,220 --> 00:24:31,875 it's the old lambda 2 plus the theta, the number. 369 00:24:31,875 --> 00:24:34,250 What happens when theta gets bigger and bigger and bigger 370 00:24:34,250 --> 00:24:37,670 and this hits this thing and then goes beyond? 371 00:24:37,670 --> 00:24:40,850 Just to see the logic here. 372 00:24:40,850 --> 00:24:46,760 What happens is that this lambda 2 plus theta, which was mu 2, 373 00:24:46,760 --> 00:24:49,070 mu 2 until they got here. 374 00:24:49,070 --> 00:24:55,570 But what is lambda 2 plus theta after it passes lambda 1? 375 00:24:55,570 --> 00:24:56,800 It's lambda 1 now. 
376 00:24:59,340 --> 00:25:02,190 It passed up, so it's the top eigenvalue 377 00:25:02,190 --> 00:25:07,390 of the altered matrix. 378 00:25:07,390 --> 00:25:10,380 And therefore, it's just fine. 379 00:25:10,380 --> 00:25:11,130 It's out here. 380 00:25:11,130 --> 00:25:13,740 No problem. 381 00:25:13,740 --> 00:25:15,810 Maybe I'll just say it again. 382 00:25:15,810 --> 00:25:20,010 When theta is big enough that mu 2 reaches 383 00:25:20,010 --> 00:25:23,520 lambda 1, if I increase theta beyond that, 384 00:25:23,520 --> 00:25:30,060 then this becomes not mu 2 any more, but mu 1. 385 00:25:30,060 --> 00:25:35,130 And then totally everybody's happy. 386 00:25:35,130 --> 00:25:40,260 I won't say more on that, because that's just like a way 387 00:25:40,260 --> 00:25:44,760 that I found to make me think, what do these things mean? 388 00:25:44,760 --> 00:25:48,070 OK, enough said on that small point. 389 00:25:48,070 --> 00:25:51,730 But then the main point is, why is this true? 390 00:25:51,730 --> 00:25:59,240 This interlacing, which is really a nice, beautiful fact. 391 00:25:59,240 --> 00:26:05,500 And you could imagine that we have 392 00:26:05,500 --> 00:26:09,220 more different perturbations than just rank 1s. 393 00:26:13,300 --> 00:26:19,750 So let me tell you the inequality, so named 394 00:26:19,750 --> 00:26:23,650 after the discoverer, Weyl's inequality. 395 00:26:27,790 --> 00:26:39,400 So his inequality is for the eigenvalues of S plus T. 396 00:26:39,400 --> 00:26:41,980 So T is the change. 397 00:26:41,980 --> 00:26:43,170 S is where I start. 398 00:26:43,170 --> 00:26:45,340 It has eigenvalues lambda. 399 00:26:45,340 --> 00:26:48,520 But now, I'm looking at the eigenvalues of S plus T. 400 00:26:48,520 --> 00:26:50,860 So I'm making a change. 401 00:26:50,860 --> 00:26:53,350 Over here, in my little puzzle question, 402 00:26:53,350 --> 00:26:56,710 that was T. It was a rank 1 change. 
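Going back to the rank 1 puzzle for a moment, the resolution can be seen in a tiny example. This is a sketch under simplifying assumptions that are not in the lecture: S is taken to be diagonal with made-up eigenvalues 5, 3, 1, so its second eigenvector is simply (0, 1, 0) and the eigenvalues of S + theta x2 x2^T can be read off the diagonal. The function names are hypothetical.

```python
# Puzzle check with an illustrative diagonal S = diag(5, 3, 1).
# Its eigenvectors are the coordinate axes, so x2 = (0, 1, 0).
lam = [5.0, 3.0, 1.0]  # lambda_1 >= lambda_2 >= lambda_3

def perturbed_eigenvalues(theta):
    """Eigenvalues of S + theta * x2 x2^T, relabeled in descending order."""
    # Only the second diagonal entry moves: lambda_2 becomes lambda_2 + theta
    return sorted([lam[0], lam[1] + theta, lam[2]], reverse=True)

def interlaces(mu, lam):
    """Check mu_1 >= lam_1 >= mu_2 >= lam_2 >= mu_3 >= lam_3."""
    went_up = all(mu[i] >= lam[i] for i in range(3))
    did_not_pass = all(lam[i] >= mu[i + 1] for i in range(2))
    return went_up and did_not_pass

results = {theta: perturbed_eigenvalues(theta) for theta in (1.0, 2.0, 20.0)}
for theta, mu in results.items():
    print("theta =", theta, " mu =", mu, " interlacing holds:", interlaces(mu, lam))
```

For theta = 20 the number lambda 2 + theta = 23 is no longer called mu 2. After sorting it is mu 1, and the old lambda 1 = 5 takes over the role of mu 2, so the interlacing inequalities still hold.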
403 00:26:56,710 --> 00:26:59,860 Now I will allow other ranks. 404 00:26:59,860 --> 00:27:03,430 So I want to estimate lambdas of S plus T 405 00:27:03,430 --> 00:27:10,880 in terms of lambdas of S and lambdas of T. 406 00:27:10,880 --> 00:27:13,710 And I want some inequality sign there. 407 00:27:17,000 --> 00:27:21,680 And it's supposed to be true for any symmetric matrices, 408 00:27:21,680 --> 00:27:26,800 symmetric S and T. 409 00:27:26,800 --> 00:27:32,360 And then a totally identical Weyl inequality-- 410 00:27:32,360 --> 00:27:33,980 actually, Weyl was one of the people 411 00:27:33,980 --> 00:27:36,380 who discovered singular values. 412 00:27:36,380 --> 00:27:39,350 And when he did it, he asked about his inequality. 413 00:27:39,350 --> 00:27:42,370 And he found that it still worked the way we've 414 00:27:42,370 --> 00:27:44,180 found this morning earlier. 415 00:27:47,210 --> 00:27:49,490 I haven't completed that yet, because I 416 00:27:49,490 --> 00:27:54,790 haven't told you which lambdas I'm talking about. 417 00:27:54,790 --> 00:27:58,420 So let me do that. 418 00:27:58,420 --> 00:28:01,050 So now, I'll tell you Weyl's inequality. 419 00:28:01,050 --> 00:28:03,280 So S and T are symmetric. 420 00:28:03,280 --> 00:28:05,770 And so the lambdas are real. 421 00:28:05,770 --> 00:28:07,670 And we want to know-- 422 00:28:07,670 --> 00:28:10,060 we want to get them in order. 423 00:28:10,060 --> 00:28:11,740 OK, so here it goes. 424 00:28:15,170 --> 00:28:21,460 Weyl allowed the i-th eigenvalue of S and the j-th eigenvalue 425 00:28:21,460 --> 00:28:27,850 of T and figured out that this was bounded by that eigenvalue 426 00:28:27,850 --> 00:28:32,650 of S plus T. So that's Weyl's great inequality, 427 00:28:32,650 --> 00:28:42,730 which reduces to the one I wrote here, 428 00:28:42,730 --> 00:28:44,680 if I make the right choice-- 429 00:28:44,680 --> 00:28:47,940 yeah, probably, if I take j equal to 1. 
430 00:28:47,940 --> 00:28:51,340 So you see the beauty of this. 431 00:28:51,340 --> 00:28:56,560 It tells you about any eigenvalues of S, 432 00:28:56,560 --> 00:28:57,760 eigenvalues of T. 433 00:28:57,760 --> 00:29:00,670 So I'm using lambdas here. 434 00:29:00,670 --> 00:29:02,950 Lambda of S are the eigenvalues of S. 435 00:29:02,950 --> 00:29:07,480 I'm using lambda again for T and lambda again for S plus T. 436 00:29:07,480 --> 00:29:11,830 So you have to pay attention to which matrix I'm 437 00:29:11,830 --> 00:29:13,330 taking the eigenvalues out of. 438 00:29:13,330 --> 00:29:17,270 So let me take j equal to 1. 439 00:29:17,270 --> 00:29:21,780 And this says that lambda i, because j is 1, 440 00:29:21,780 --> 00:29:28,210 S plus T is less or equal to lambda i of S plus lambda 1, 441 00:29:28,210 --> 00:29:40,170 the top eigenvalue of T. This is lambda max of T. 442 00:29:40,170 --> 00:29:44,850 Do you see that that's totally reasonable, believable? 443 00:29:44,850 --> 00:29:49,260 That the eigenvalue when I add on T-- let's 444 00:29:49,260 --> 00:29:52,260 imagine in our minds that T is positive. 445 00:29:52,260 --> 00:29:56,640 T is like this thing. 446 00:29:56,640 --> 00:30:02,280 This could be the T, example of a T. It's what I'm adding on. 447 00:30:02,280 --> 00:30:06,570 Then the eigenvalues go up. 448 00:30:06,570 --> 00:30:09,870 But they don't pass that. 449 00:30:09,870 --> 00:30:12,510 So that tells you how much it could go up by. 450 00:30:12,510 --> 00:30:19,070 So I guess that Weyl is giving us a less than or equal here. 451 00:30:19,070 --> 00:30:22,050 Less or equal to lambda 1-- 452 00:30:22,050 --> 00:30:24,450 so I'm taking i to be 1-- 453 00:30:24,450 --> 00:30:27,320 plus theta. 454 00:30:27,320 --> 00:30:32,590 Yeah, so that inequality I've written down there-- 455 00:30:32,590 --> 00:30:37,020 there's some playing around to do to get practice. 
456 00:30:37,020 --> 00:30:44,310 And it's not so essential for us to be like world grandmasters 457 00:30:44,310 --> 00:30:48,030 at this thing, but you should see it. 458 00:30:48,030 --> 00:30:52,010 And you should also see j equal to 2. 459 00:30:52,010 --> 00:30:55,230 Why will j equal to 2 tell us something? 460 00:30:55,230 --> 00:30:58,120 I hope it will. 461 00:30:58,120 --> 00:31:00,310 Let's see what it tells us. 462 00:31:00,310 --> 00:31:04,720 Lambda i plus 1 now-- j is 2-- 463 00:31:04,720 --> 00:31:12,640 of S plus T. So it's less than or equal to lambda i of S 464 00:31:12,640 --> 00:31:19,480 plus lambda 2 of T. I think that's interesting. 465 00:31:19,480 --> 00:31:34,280 And also, I think I also could get lambda i plus i minus 1. 466 00:31:34,280 --> 00:31:37,570 Let me write it and see if it's correct. 467 00:31:37,570 --> 00:31:40,755 Plus lambda i minus 1. 468 00:31:40,755 --> 00:31:43,690 So those would add up to i plus 2. 469 00:31:43,690 --> 00:31:51,120 Yeah, I guess lambda i plus 1 plus lambda 1 of T. 470 00:31:51,120 --> 00:31:55,050 That's what I got by taking-- 471 00:31:55,050 --> 00:31:57,360 yeah, did I do that right? 472 00:32:00,480 --> 00:32:03,696 I'm taking j equal to 1. 473 00:32:03,696 --> 00:32:07,480 No, well, I don't think I got it right. 474 00:32:10,520 --> 00:32:15,080 What do I want to do here to get a bound on lambda i plus 1? 475 00:32:15,080 --> 00:32:16,610 I want to take j equal to 2. 476 00:32:16,610 --> 00:32:23,210 I should just be sensible and plug in j equal to 2 477 00:32:23,210 --> 00:32:24,270 and i equal to 1. 478 00:32:29,510 --> 00:32:35,480 All I want to say is that Weyl's inequality is the great fact 479 00:32:35,480 --> 00:32:38,390 out of which all this interlacing falls 480 00:32:38,390 --> 00:32:42,650 and more and more, because the interlacing is telling me 481 00:32:42,650 --> 00:32:46,020 about neighbors.
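To get some of the practice mentioned above, here is a small numerical check of Weyl's inequality. It is only a sketch under stated assumptions: it is restricted to 2 by 2 symmetric matrices so the eigenvalues have a closed form and no library is needed, eigenvalues are numbered largest first, and the names eig_sym2 and weyl_holds and the sample matrices are made up for illustration.

```python
import math

def eig_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return [mean + radius, mean - radius]

def weyl_holds(S, T, tol=1e-9):
    """Check lambda_{i+j-1}(S+T) <= lambda_i(S) + lambda_j(T) for every valid i, j.

    S and T are 2 by 2 symmetric matrices stored as triples (a, b, c).
    """
    n = 2
    lam_S = eig_sym2(*S)
    lam_T = eig_sym2(*T)
    lam_sum = eig_sym2(*(s + t for s, t in zip(S, T)))
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            k = i + j - 1
            if k > n:
                continue  # no eigenvalue with that index
            if lam_sum[k - 1] > lam_S[i - 1] + lam_T[j - 1] + tol:
                return False
    return True

# Hypothetical example matrices, chosen only for illustration.
S = (2.0, 1.0, -1.0)
T = (1.0, -2.0, 3.0)
print(weyl_holds(S, T))
```

With i equal to 1 and j equal to 2, the loop checks that lambda 2 of S plus T is at most lambda 1 of S plus lambda 2 of T, which is the case plugged in at the end of the discussion above.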
482 00:32:46,020 --> 00:32:50,790 And actually if I use Weyl for i and j, different i's and j's, I 483 00:32:50,790 --> 00:32:56,166 even learn about ones that are not neighbors. 484 00:32:56,166 --> 00:33:00,600 And I could tell you a proof of Weyl's inequality. 485 00:33:00,600 --> 00:33:02,340 But I'll save that for the notes. 486 00:33:07,310 --> 00:33:09,100 So I think maybe that's what I want 487 00:33:09,100 --> 00:33:14,240 to do about interlacing, just to say what the notes have, 488 00:33:14,240 --> 00:33:17,550 but not repeat it all in class. 489 00:33:17,550 --> 00:33:24,230 So the notes have actually two ways to prove this interlacing. 490 00:33:24,230 --> 00:33:27,830 The standard way that every mathematician would use 491 00:33:27,830 --> 00:33:30,990 would be Weyl's inequality. 492 00:33:30,990 --> 00:33:36,600 But last year, Professor Rao, visiting, 493 00:33:36,600 --> 00:33:42,060 found a nice argument that's also in the notes. 494 00:33:42,060 --> 00:33:43,700 It ends up with a graph. 495 00:33:43,700 --> 00:33:47,540 And on that graph, you can see that this is true. 496 00:33:47,540 --> 00:33:55,530 So for what it's worth, two approaches to this interlacing 497 00:33:55,530 --> 00:33:58,440 and some examples. 498 00:33:58,440 --> 00:34:01,710 But I really don't want to spend our lives 499 00:34:01,710 --> 00:34:05,000 on this eigenvalue topic. 500 00:34:05,000 --> 00:34:08,940 It's a beautiful fact about symmetric matrices 501 00:34:08,940 --> 00:34:12,510 and the corresponding fact is true for singular values 502 00:34:12,510 --> 00:34:18,270 of any matrix, but let's think of leaving it there. 503 00:34:21,150 --> 00:34:28,670 So now, I'm moving on to the new section. 504 00:34:28,670 --> 00:34:30,710 The new section involves something 505 00:34:30,710 --> 00:34:31,897 called compressed sensing. 506 00:34:31,897 --> 00:34:33,605 I don't know if you've heard those words.
507 00:34:45,949 --> 00:34:52,699 So these are all topics in Section 4.4, which you have. 508 00:34:52,699 --> 00:34:55,880 I think we sent it out 10 days ago probably. 509 00:34:58,660 --> 00:35:04,000 OK, so first let me remember what the nuclear norm is 510 00:35:04,000 --> 00:35:06,320 of a matrix. 511 00:35:06,320 --> 00:35:19,635 The nuclear norm of a matrix is the sum of the singular values, 512 00:35:19,635 --> 00:35:22,460 the sum of the singular values. 513 00:35:22,460 --> 00:35:29,170 So it's like the L1 norm for a vector. 514 00:35:29,170 --> 00:35:32,080 That's a right way to think about it. 515 00:35:32,080 --> 00:35:34,030 And do you remember what was special? 516 00:35:34,030 --> 00:35:38,230 We've talked about using the L1 norm. 517 00:35:38,230 --> 00:35:42,610 It has this special property that the ordinary L2 518 00:35:42,610 --> 00:35:45,190 norm absolutely does not have. 519 00:35:45,190 --> 00:35:48,070 What was special about the L1 norm? 520 00:35:48,070 --> 00:35:56,080 If I minimize the L1 norm with some constraint, like Ax equal 521 00:35:56,080 --> 00:36:01,320 b, what's special about the solution, the minimum in the L1 522 00:36:01,320 --> 00:36:02,040 norm? 523 00:36:02,040 --> 00:36:02,910 AUDIENCE: Sparse. 524 00:36:02,910 --> 00:36:04,160 GILBERT STRANG: Sparse, right. 525 00:36:04,160 --> 00:36:06,920 Sparse. 526 00:36:06,920 --> 00:36:10,700 So this is moving us up to matrices. 527 00:36:10,700 --> 00:36:13,670 And that's where compressed sensing comes in. 528 00:36:13,670 --> 00:36:16,230 Matrix completion comes in. 529 00:36:16,230 --> 00:36:20,580 So matrix completion would just be-- 530 00:36:20,580 --> 00:36:23,270 I mentioned-- so this is completion. 531 00:36:26,120 --> 00:36:28,590 And I'll remember the words Netflix, 532 00:36:28,590 --> 00:36:31,910 which made the problem famous.
533 00:36:31,910 --> 00:36:44,140 So I have the matrix A, 3, 2, question mark, question mark, 534 00:36:44,140 --> 00:36:47,310 question mark, 1, 4, 6, question mark-- 535 00:36:53,390 --> 00:36:55,780 missing data. 536 00:36:55,780 --> 00:36:59,650 And so I have to put in something there, 537 00:36:59,650 --> 00:37:03,000 because if I don't put in anything, then the numbers 538 00:37:03,000 --> 00:37:07,930 I do know are useless, because no row or no column 539 00:37:07,930 --> 00:37:10,100 is complete. 540 00:37:10,100 --> 00:37:12,100 So it just would give up. 541 00:37:12,100 --> 00:37:14,710 Somebody that sent me the data, 3 and 2 542 00:37:14,710 --> 00:37:20,050 and didn't tell me a ranking for the third movie, 543 00:37:20,050 --> 00:37:22,780 I'd have to say, well, I can't use it. 544 00:37:22,780 --> 00:37:24,010 That's not possible. 545 00:37:24,010 --> 00:37:28,990 So we need to think about what goes there. 546 00:37:28,990 --> 00:37:36,790 And the idea is that the numbers that minimize the nuclear norm 547 00:37:36,790 --> 00:37:40,180 are a good choice, a good choice. 548 00:37:40,180 --> 00:37:46,540 So that's just a connection here that we will say more about, 549 00:37:46,540 --> 00:37:49,300 but not-- 550 00:37:49,300 --> 00:37:52,240 we could have a whole course in compressed sensing 551 00:37:52,240 --> 00:37:53,470 and nuclear norm. 552 00:37:53,470 --> 00:37:59,470 Professor Parrilo in course 6 is an expert on this. 553 00:38:03,450 --> 00:38:06,980 But you see the point that-- 554 00:38:06,980 --> 00:38:18,900 so you remember the 1 norm of v came from the 0 norm. 555 00:38:21,530 --> 00:38:23,585 And what is the 0 norm of the vector? 556 00:38:26,940 --> 00:38:27,860 Well, it's not a norm. 557 00:38:27,860 --> 00:38:31,760 So you could say, forget it, no answer. 558 00:38:31,760 --> 00:38:34,880 But what do we symbolically mean when 559 00:38:34,880 --> 00:38:38,310 I write the 0 norm of a vector? 560 00:38:38,310 --> 00:38:41,070 I mean the number of....?
561 00:38:41,070 --> 00:38:42,540 Non-zeros. 562 00:38:42,540 --> 00:38:44,520 The number of non-zeros. 563 00:38:44,520 --> 00:38:55,430 This was the number of non-zeros in the vector, in v. 564 00:38:55,430 --> 00:39:03,720 But it's not a norm, because if I take 2 times the vector, 565 00:39:03,720 --> 00:39:07,330 I have the same number of non-zeros, same norm. 566 00:39:07,330 --> 00:39:09,662 I can't have the norm of 2v equal the norm 567 00:39:09,662 --> 00:39:15,890 of v. That would blow away all the properties of norms. 568 00:39:15,890 --> 00:39:17,920 So the 0 norm of v is not a norm. 569 00:39:17,920 --> 00:39:21,500 And then we move it to that sort of appropriate nearest norm. 570 00:39:21,500 --> 00:39:23,260 And we get the 1 norm of v. 571 00:39:23,260 --> 00:39:27,320 We get the L1 norm, which is the sum of-- 572 00:39:27,320 --> 00:39:32,474 everybody remembers that this is the sum of the absolute values of the vi. 573 00:39:32,474 --> 00:39:37,950 And you remember my pictures of diamonds touching 574 00:39:37,950 --> 00:39:41,400 planes at sharp points. 575 00:39:41,400 --> 00:39:44,430 Well, that's what is going on here. 576 00:39:44,430 --> 00:39:48,660 That problem was called basis pursuit. 577 00:39:48,660 --> 00:39:51,600 And it comes back again in this section. 578 00:39:54,120 --> 00:40:01,570 So I minimize this norm subject to the conditions. 579 00:40:01,570 --> 00:40:07,070 Now, I'm just going to take a jump to the matrix case. 580 00:40:07,070 --> 00:40:09,920 What's my idea here? 581 00:40:09,920 --> 00:40:14,420 My idea is that for a matrix, the nuclear norm 582 00:40:14,420 --> 00:40:15,440 comes from what? 583 00:40:18,830 --> 00:40:21,570 What's the norm that we sort of start with, 584 00:40:21,570 --> 00:40:24,220 but it's not a norm? 585 00:40:24,220 --> 00:40:27,280 And when I sort of take the-- 586 00:40:27,280 --> 00:40:34,000 because the requirements for a norm don't fail-- 587 00:40:34,000 --> 00:40:38,170 they fail for what I'm about to write there.
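The point about the vector case can be checked in a few lines. This is a minimal sketch in plain Python; the function names zero_count and l1_norm and the sample vector are chosen here for illustration and are not from the lecture.

```python
def zero_count(v):
    """The so-called 0 norm: the number of non-zero entries of v."""
    return sum(1 for x in v if x != 0)

def l1_norm(v):
    """The L1 norm: the sum of the absolute values of the entries of v."""
    return sum(abs(x) for x in v)

# Hypothetical sample vector with two non-zero entries.
v = [3.0, 0.0, -4.0, 0.0]
doubled = [2 * x for x in v]

# Doubling v leaves the count of non-zeros unchanged, so the count
# cannot satisfy the scaling rule that every norm must satisfy.
print(zero_count(v), zero_count(doubled))

# The L1 norm does double when the vector is doubled.
print(l1_norm(v), l1_norm(doubled))
```

The count stays the same under doubling while the L1 norm scales with the vector, which is the failure of the norm requirements described above.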
588 00:40:38,170 --> 00:40:41,920 I could put A 0, but I don't want 589 00:40:41,920 --> 00:40:44,740 the number of non-zero entries. 590 00:40:44,740 --> 00:40:47,280 That would be a good guess. 591 00:40:47,280 --> 00:40:50,480 And probably in some sense it makes sense. 592 00:40:50,480 --> 00:40:53,960 But it's not the answer I'm looking for. 593 00:40:53,960 --> 00:41:04,030 What do you think is the 0 norm of a matrix that is not a norm, 594 00:41:04,030 --> 00:41:09,400 but when I pump it up to the best, to the nearest good norm, 595 00:41:09,400 --> 00:41:11,710 I get the nuclear norm? 596 00:41:11,710 --> 00:41:14,500 So this is the question: what is A0? 597 00:41:18,698 --> 00:41:22,160 And it's what? 598 00:41:22,160 --> 00:41:23,060 AUDIENCE: The rank. 599 00:41:23,060 --> 00:41:24,290 GILBERT STRANG: The rank. 600 00:41:24,290 --> 00:41:27,890 The rank of the matrix is the equivalent. 601 00:41:32,630 --> 00:41:34,170 So I don't know about the zero. 602 00:41:34,170 --> 00:41:35,720 Nobody else calls it A0. 603 00:41:35,720 --> 00:41:37,460 So I better not. 604 00:41:37,460 --> 00:41:39,180 It's the rank. 605 00:41:39,180 --> 00:41:40,890 So again, the rank is not a norm, 606 00:41:40,890 --> 00:41:43,820 because if I double the matrix, I don't double the rank. 607 00:41:46,600 --> 00:41:48,220 So I have to move to a norm. 608 00:41:48,220 --> 00:41:50,410 And it turns out to be the nuclear norm. 609 00:41:50,410 --> 00:41:52,540 And now, I'll just, with one minute, 610 00:41:52,540 --> 00:41:57,850 say it's the guess of some people who are working hard 611 00:41:57,850 --> 00:42:02,920 to prove it, that the deep learning 612 00:42:02,920 --> 00:42:07,120 algorithm of gradient descent finds 613 00:42:07,120 --> 00:42:12,850 the solution to the minimum problem in the nuclear norm. 614 00:42:12,850 --> 00:42:16,140 And we don't know if that's true or not yet. 615 00:42:16,140 --> 00:42:25,540 For related examples, like this thing, it's proved.
616 00:42:25,540 --> 00:42:31,870 For the exact problem of deep learning, it's a conjecture. 617 00:42:31,870 --> 00:42:35,200 So that's what's in Section 4.4. 618 00:42:35,200 --> 00:42:38,860 But that word lasso, you want to know what that is. 619 00:42:38,860 --> 00:42:41,140 Compressed sensing, I'll say a word about. 620 00:42:41,140 --> 00:42:46,540 So that will be Monday after Alex Townsend's lecture Friday. 621 00:42:46,540 --> 00:42:51,790 So he's coming to speak to computational 622 00:42:51,790 --> 00:42:55,960 science students all over MIT tomorrow afternoon. 623 00:42:55,960 --> 00:43:00,100 I'll certainly go to that, but then he 624 00:43:00,100 --> 00:43:03,850 said he would come in and take this class Friday. 625 00:43:03,850 --> 00:43:05,410 So I'll see you Friday. 626 00:43:05,410 --> 00:43:07,455 And he'll be here too.
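Looking back at the rank discussion in this lecture, here is a small sketch of why the rank fails the scaling rule for norms while the nuclear norm passes it. It is restricted to 2 by 2 matrices as an assumption, so that no library is needed: for a 2 by 2 matrix the two singular values satisfy sigma 1 squared plus sigma 2 squared equal to the sum of the squared entries, and sigma 1 times sigma 2 equal to the absolute value of the determinant, so their sum has a closed form. The names rank2 and nuclear_norm2 and the sample matrix are hypothetical.

```python
import math

def rank2(A, tol=1e-12):
    """Rank of a 2 by 2 matrix, using a crude absolute tolerance."""
    (a, b), (c, d) = A
    if max(abs(a), abs(b), abs(c), abs(d)) <= tol:
        return 0
    return 2 if abs(a * d - b * c) > tol else 1

def nuclear_norm2(A):
    """Sum of the two singular values of a 2 by 2 matrix.

    Uses (s1 + s2)^2 = (s1^2 + s2^2) + 2*s1*s2
                     = (sum of squared entries) + 2*|det A|.
    """
    (a, b), (c, d) = A
    frob_sq = a * a + b * b + c * c + d * d
    return math.sqrt(frob_sq + 2.0 * abs(a * d - b * c))

# Hypothetical rank-one example: the second row is twice the first.
A = [[3.0, 4.0], [6.0, 8.0]]
A2 = [[2 * x for x in row] for row in A]

print(rank2(A), rank2(A2))                  # the rank does not double
print(nuclear_norm2(A), nuclear_norm2(A2))  # the nuclear norm does
```

This parallels the vector case: the rank plays the role of the count of non-zeros, and the nuclear norm plays the role of the L1 norm.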