1 00:00:01,550 --> 00:00:03,920 The following content is provided under a Creative 2 00:00:03,920 --> 00:00:05,310 Commons license. 3 00:00:05,310 --> 00:00:07,520 Your support will help MIT OpenCourseWare 4 00:00:07,520 --> 00:00:11,610 continue to offer high quality educational resources for free. 5 00:00:11,610 --> 00:00:14,180 To make a donation or to view additional materials 6 00:00:14,180 --> 00:00:18,140 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,140 --> 00:00:19,026 at ocw.mit.edu. 8 00:00:23,077 --> 00:00:23,660 PROFESSOR: OK. 9 00:00:23,660 --> 00:00:26,720 So I thought I'd begin today with, 10 00:00:26,720 --> 00:00:29,750 as we're coming to the end of the sort of focus 11 00:00:29,750 --> 00:00:35,340 on linear algebra and moving on to a little probability, 12 00:00:35,340 --> 00:00:42,170 a little more optimization, and a lot of deep learning. 13 00:00:42,170 --> 00:00:44,930 So this was like, by way of review, 14 00:00:44,930 --> 00:00:50,540 to write down the big factorizations of a matrix. 15 00:00:50,540 --> 00:00:55,490 And so my idea, and I kind of enjoyed it, 16 00:00:55,490 --> 00:00:59,780 is checking that the number of free parameters, 17 00:00:59,780 --> 00:01:04,640 say an L and U or a Q and R or every-- each 18 00:01:04,640 --> 00:01:07,940 of those, that the number of free parameters 19 00:01:07,940 --> 00:01:11,690 agrees with the number of parameters in A itself, 20 00:01:11,690 --> 00:01:13,890 like n squared, usually. 21 00:01:13,890 --> 00:01:16,340 So A usually has n squared. 22 00:01:16,340 --> 00:01:21,560 And then can we replace A if-- after we've computed L and U, 23 00:01:21,560 --> 00:01:23,150 can we throw away A? 24 00:01:23,150 --> 00:01:27,680 Yes, because all the information is in L and U. 25 00:01:27,680 --> 00:01:32,520 And it fills that same n by n matrix. 
26 00:01:32,520 --> 00:01:39,200 Well, that's kind of obvious because L is lower triangular, 27 00:01:39,200 --> 00:01:43,520 and the diagonal, all ones, are not free parameters. 28 00:01:43,520 --> 00:01:47,780 And U is triangular, upper triangular. 29 00:01:47,780 --> 00:01:50,180 And its diagonal holds the pivots. 30 00:01:50,180 --> 00:01:52,580 Those are free parameters so that-- 31 00:01:52,580 --> 00:01:55,490 but can I just write down the count? 32 00:01:55,490 --> 00:02:00,080 So I'll go through each of these just quickly 33 00:02:00,080 --> 00:02:05,330 after I've figured out how-- these are sort of the building 34 00:02:05,330 --> 00:02:06,500 blocks. 35 00:02:06,500 --> 00:02:11,480 So how many free parameters are there in these two triangular 36 00:02:11,480 --> 00:02:12,740 matrices? 37 00:02:12,740 --> 00:02:17,660 Well, I think the answer is 1/2 n, n minus 1, 38 00:02:17,660 --> 00:02:22,350 and 1/2 n, n plus 1. 39 00:02:22,350 --> 00:02:24,500 That's a familiar number. 40 00:02:28,280 --> 00:02:32,980 You recognize that as the sum of 1 plus 2, up to n. 41 00:02:32,980 --> 00:02:35,930 And you have one free para-- 42 00:02:35,930 --> 00:02:37,910 in the upper triangular U. You've 43 00:02:37,910 --> 00:02:42,470 got one free parameter up in the corner, two in the next one. 44 00:02:42,470 --> 00:02:44,510 And as you're coming down, you end up 45 00:02:44,510 --> 00:02:46,820 with n on the main diagonal. 46 00:02:46,820 --> 00:02:48,500 And they add up to that. 47 00:02:48,500 --> 00:02:52,310 And you see that those two differ by n, 48 00:02:52,310 --> 00:02:54,620 which is what we want. 49 00:02:54,620 --> 00:02:55,520 OK. 50 00:02:55,520 --> 00:02:56,300 Diagonal. 51 00:02:56,300 --> 00:02:58,340 The answer is obviously n. 52 00:03:00,980 --> 00:03:02,705 How about the eigenvector matrix? 53 00:03:05,910 --> 00:03:08,760 This whole exercise is like something 54 00:03:08,760 --> 00:03:12,390 I've never seen in a textbook.
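The triangular counts can be checked mechanically. A minimal sketch, with hypothetical helper names, counting the free entries of a unit lower triangular L (ones on the diagonal are fixed) and an upper triangular U (pivots included):

```python
# Count free parameters in the triangular factors of an n by n A = LU.
def unit_lower_params(n: int) -> int:
    # Strictly below the diagonal: 1 + 2 + ... + (n - 1); the diagonal ones are not free.
    return sum(range(1, n))

def upper_params(n: int) -> int:
    # On and above the diagonal: 1 + 2 + ... + n, the pivots included.
    return sum(range(1, n + 1))

for n in range(1, 8):
    assert unit_lower_params(n) == n * (n - 1) // 2
    assert upper_params(n) == n * (n + 1) // 2
    # The two counts differ by n and together give n squared.
    assert upper_params(n) - unit_lower_params(n) == n
    assert unit_lower_params(n) + upper_params(n) == n * n

print(unit_lower_params(4), upper_params(4))  # 6 10
```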
55 00:03:12,390 --> 00:03:17,910 But for me it brings back all these key-- 56 00:03:17,910 --> 00:03:21,900 really the condensed course in linear algebra 57 00:03:21,900 --> 00:03:23,580 is on that top line. 58 00:03:23,580 --> 00:03:27,760 So how many free parameters in an eigenvector matrix? 59 00:03:27,760 --> 00:03:28,620 OK. 60 00:03:28,620 --> 00:03:31,400 And of course, if you're sort of thinking, 61 00:03:31,400 --> 00:03:36,980 what's the rule for free parameters? 62 00:03:36,980 --> 00:03:41,010 My answer is going to be, for the number of free parameters, 63 00:03:41,010 --> 00:03:46,360 so this is an n by n matrix with the n eigenvectors in it. 64 00:03:46,360 --> 00:03:50,250 But there's a certain freedom there. 65 00:03:50,250 --> 00:03:51,290 And what is that? 66 00:03:51,290 --> 00:03:54,290 What freedom do we have in choosing the eigenvector 67 00:03:54,290 --> 00:03:56,540 matrix? 68 00:03:56,540 --> 00:04:01,730 Every eigenvector can be multiplied by a scalar. 69 00:04:01,730 --> 00:04:04,250 If x is an eigenvector, so is 2x. 70 00:04:04,250 --> 00:04:05,540 So is 3x. 71 00:04:05,540 --> 00:04:09,740 So we could make a convention that the first component 72 00:04:09,740 --> 00:04:11,570 was always 1. 73 00:04:11,570 --> 00:04:15,080 Maybe that wouldn't be the most intelligent convention 74 00:04:15,080 --> 00:04:16,140 in the world. 75 00:04:16,140 --> 00:04:19,610 But it would show that that top row of ones 76 00:04:19,610 --> 00:04:21,320 was not to be counted. 77 00:04:21,320 --> 00:04:26,070 So I get n squared minus n for that. 78 00:04:26,070 --> 00:04:26,570 Oh, yeah. 79 00:04:26,570 --> 00:04:32,240 Well, having done those two, let me look at this one. 80 00:04:32,240 --> 00:04:35,090 Does that come out a total of n squared? 81 00:04:35,090 --> 00:04:38,960 Yes, because the eigenvector matrix x has n 82 00:04:38,960 --> 00:04:42,890 squared minus n by this reasoning, little hokey 83 00:04:42,890 --> 00:04:45,050 reasoning that I just gave.
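The scaling freedom can be seen on a small example. This is a sketch on an illustrative 2 by 2 matrix that does not come from the lecture. `normalize_first` is a hypothetical helper that applies the "first component equals 1" convention and assumes that component is nonzero.

```python
from fractions import Fraction

# Illustrative symmetric matrix with eigenvalues 3 (eigenvector (1, 1)) and 1 (eigenvector (1, -1)).
A = [[2, 1], [1, 2]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def is_eigenvector(M, v, lam):
    return matvec(M, v) == [lam * vi for vi in v]

def normalize_first(v):
    # Rescale so the first component is 1. Assumes v[0] != 0;
    # otherwise a different component would have to be fixed instead.
    return [Fraction(vi) / v[0] for vi in v]

x = [5, 5]  # 5 times the eigenvector (1, 1): still an eigenvector
assert is_eigenvector(A, x, 3)
y = normalize_first(x)
print(y)  # [Fraction(1, 1), Fraction(1, 1)]
assert is_eigenvector(A, y, 3)

# With the top row of X fixed at ones, n*n - n entries remain free; Lambda adds n.
n = 4
assert (n * n - n) + n == n * n
```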
84 00:04:45,050 --> 00:04:49,490 And then there are n more for the eigenvalue matrix. 85 00:04:49,490 --> 00:04:52,730 And there's nothing left for the eigen-- 86 00:04:52,730 --> 00:04:55,760 the inverse because it's determined by x. 87 00:04:55,760 --> 00:05:00,650 So do you see the count adding up to n squared for those? 88 00:05:00,650 --> 00:05:02,950 Now, I left open the orthogonal one. 89 00:05:02,950 --> 00:05:06,110 I think we kind of talked about that during the-- 90 00:05:06,110 --> 00:05:08,120 when we met it. 91 00:05:08,120 --> 00:05:10,970 And it's a little less obvious. 92 00:05:10,970 --> 00:05:12,140 But do you remember? 93 00:05:12,140 --> 00:05:17,510 So I'm talking about an n by n orthogonal matrix, Q. So 94 00:05:17,510 --> 00:05:20,900 how many free parameters in column one? 95 00:05:20,900 --> 00:05:24,410 That column is what we always call Q1. 96 00:05:24,410 --> 00:05:26,390 Does it have n free parameters? 97 00:05:26,390 --> 00:05:31,280 Or is there a condition that cuts that back? 98 00:05:31,280 --> 00:05:34,020 There is a condition, right? 99 00:05:34,020 --> 00:05:36,440 And what's the condition on the first column 100 00:05:36,440 --> 00:05:40,780 that removes one parameter? 101 00:05:40,780 --> 00:05:42,400 It's normalized. 102 00:05:42,400 --> 00:05:43,900 Its length is 1. 103 00:05:43,900 --> 00:05:49,560 So I only get n minus 1 from the first column. 104 00:05:49,560 --> 00:05:52,330 And now if I move over to the second column, 105 00:05:52,330 --> 00:05:55,030 how many free parameters there? 106 00:05:55,030 --> 00:05:57,350 Again, it's a unit vector. 107 00:05:57,350 --> 00:06:02,290 But also, it is orthogonal to the first. 108 00:06:02,290 --> 00:06:06,760 So two parameters got you-- two rules got imposed. 109 00:06:06,760 --> 00:06:08,860 And two parameters got removed. 110 00:06:08,860 --> 00:06:11,050 So this is n minus 2. 111 00:06:11,050 --> 00:06:12,880 And then finally, whatever. 
112 00:06:12,880 --> 00:06:14,380 So I think that that-- 113 00:06:14,380 --> 00:06:18,910 sum of these guys is exactly the same as we had up here. 114 00:06:18,910 --> 00:06:26,290 I think it's also 1/2 n, n minus 1, or 1/2 of n squared minus n. 115 00:06:26,290 --> 00:06:26,940 Yeah. 116 00:06:26,940 --> 00:06:28,690 Yeah, so not as many as you might 117 00:06:28,690 --> 00:06:32,930 think because the matrix is size n squared. 118 00:06:32,930 --> 00:06:35,380 Now, can I use those? 119 00:06:35,380 --> 00:06:37,090 Because these are the-- 120 00:06:37,090 --> 00:06:38,770 like the building blocks. 121 00:06:38,770 --> 00:06:40,100 Can I just check these? 122 00:06:40,100 --> 00:06:40,870 Let's see. 123 00:06:40,870 --> 00:06:42,240 I'll just go along the list. 124 00:06:42,240 --> 00:06:46,120 L times U. So L had this. 125 00:06:46,120 --> 00:06:47,170 And U had that. 126 00:06:47,170 --> 00:06:50,670 And when I add those, it adds up to n squared. 127 00:06:50,670 --> 00:06:51,170 Right? 128 00:06:51,170 --> 00:06:53,560 The minus cancels the plus. 129 00:06:53,560 --> 00:06:56,750 And the 1/2 n squared twice gives me n squared. 130 00:06:56,750 --> 00:06:58,750 So good for that one. 131 00:06:58,750 --> 00:07:01,750 What about QR? 132 00:07:01,750 --> 00:07:06,580 Well, R is upper triangular like so. 133 00:07:06,580 --> 00:07:09,910 And then Q, we just got it right there. 134 00:07:09,910 --> 00:07:15,790 So for Q times R, it's that plus that again, adding to n 135 00:07:15,790 --> 00:07:17,580 squared. 136 00:07:17,580 --> 00:07:18,630 Good for that one. 137 00:07:18,630 --> 00:07:20,630 n squared for that one. 138 00:07:20,630 --> 00:07:23,810 And this one we just did. 139 00:07:23,810 --> 00:07:26,690 n squared minus n in x. 140 00:07:26,690 --> 00:07:27,980 n on the diagonal. 141 00:07:27,980 --> 00:07:29,540 Total n squared. 142 00:07:29,540 --> 00:07:30,530 What about this guy?
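A sketch of the column-by-column count for an n by n orthogonal Q. Column k is a unit vector orthogonal to the k - 1 earlier columns, so k conditions remove k parameters. The totals just checked for LU, QR and X Lambda X^-1 follow. Function names are illustrative.

```python
def orthogonal_params(n: int) -> int:
    # Column k (1-based): n entries minus k conditions (unit length + k-1 orthogonality).
    return sum(n - k for k in range(1, n + 1))

def unit_lower_params(n: int) -> int:
    return n * (n - 1) // 2

def upper_params(n: int) -> int:
    return n * (n + 1) // 2

for n in range(1, 10):
    assert orthogonal_params(n) == n * (n - 1) // 2
    assert unit_lower_params(n) + upper_params(n) == n * n  # A = LU
    assert orthogonal_params(n) + upper_params(n) == n * n  # A = QR
    assert (n * n - n) + n == n * n                          # A = X Lambda X^-1

print(orthogonal_params(3))  # 3, from 2 + 1 + 0
```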
143 00:07:33,400 --> 00:07:36,870 What about the big, really fundamental one 144 00:07:36,870 --> 00:07:41,400 that I would normally write with the matrix as S instead of A 145 00:07:41,400 --> 00:07:43,260 to remind us that it-- 146 00:07:43,260 --> 00:07:47,400 that the matrix here is symmetric? 147 00:07:47,400 --> 00:07:53,040 So I'm not expecting n squared for a symmetric ma-- 148 00:07:53,040 --> 00:07:54,960 oh, I should've put that on my list. 149 00:07:54,960 --> 00:07:57,150 What's the count for a symmetric matrix? 150 00:08:00,120 --> 00:08:02,760 Because this is an S here. 151 00:08:02,760 --> 00:08:05,880 So I'm not expecting to get n squared. 152 00:08:05,880 --> 00:08:12,230 I'm only expecting to get the number of parameters in a symmetric S. 153 00:08:12,230 --> 00:08:15,230 What's the number of free parameters that I would-- 154 00:08:15,230 --> 00:08:20,240 that I start with that I hope will reappear in Q and lambda? 155 00:08:24,601 --> 00:08:26,575 What's the deal for a symmetric matrix? 156 00:08:29,130 --> 00:08:30,590 Let's see. 157 00:08:30,590 --> 00:08:32,309 I'm free to choose. 158 00:08:32,309 --> 00:08:35,340 Is it the same count as this? 159 00:08:35,340 --> 00:08:38,220 Yeah, because I'm free to choose the upper triangular 160 00:08:38,220 --> 00:08:42,929 part and the diagonal, but I'm not free to choose the lower. 161 00:08:42,929 --> 00:08:48,150 So I'd say that's 1/2 n times n minus 1. 162 00:08:48,150 --> 00:08:49,530 And plus 1. 163 00:08:49,530 --> 00:08:50,205 Sorry. 164 00:08:50,205 --> 00:08:52,720 The diagonal's in there. 165 00:08:52,720 --> 00:08:53,430 OK. 166 00:08:53,430 --> 00:08:58,650 So do I get that total, 1/2 of n squared plus n, 167 00:08:58,650 --> 00:09:02,220 from these guys? 168 00:09:02,220 --> 00:09:04,890 Well, I probably do. 169 00:09:04,890 --> 00:09:07,330 The diagonal guy gives me n. 170 00:09:07,330 --> 00:09:08,970 This gives me n. 171 00:09:08,970 --> 00:09:14,580 And that's a Q, which is my other favorite number there.
172 00:09:14,580 --> 00:09:20,730 And when I add that to that, that becomes a plus sign. 173 00:09:20,730 --> 00:09:22,760 And I'm good. 174 00:09:22,760 --> 00:09:24,000 Yeah. 175 00:09:24,000 --> 00:09:26,510 You see how I enjoy doing this, right? 176 00:09:26,510 --> 00:09:27,630 But I'm near the end. 177 00:09:27,630 --> 00:09:33,370 But the last one is kind of not well known. 178 00:09:33,370 --> 00:09:33,870 OK. 179 00:09:33,870 --> 00:09:37,350 Q times S. Do you remember that factorization? 180 00:09:37,350 --> 00:09:40,470 That's called the polar decomposition. 181 00:09:40,470 --> 00:09:44,730 It's an orthogonal times the symmetric. 182 00:09:44,730 --> 00:09:50,550 And it is often used in engineering as a way 183 00:09:50,550 --> 00:09:56,980 to decompose a displacement, strain matrix. 184 00:09:56,980 --> 00:09:59,470 Anyway, Q times S. And it-- 185 00:09:59,470 --> 00:10:03,370 actually, it's very, very close to the SVD. 186 00:10:03,370 --> 00:10:05,440 And I have friends who say, better 187 00:10:05,440 --> 00:10:09,870 to compute QS than the SVD and then just move along. 188 00:10:09,870 --> 00:10:16,120 Anyway, Q times S. So Q is this guy. 189 00:10:16,120 --> 00:10:19,890 And S. What's S? 190 00:10:19,890 --> 00:10:20,650 Symmetric. 191 00:10:20,650 --> 00:10:21,550 That's this guy. 192 00:10:24,220 --> 00:10:27,640 So that's Q. Let me write that letter Q and S 193 00:10:27,640 --> 00:10:30,280 so I don't lose it. 194 00:10:30,280 --> 00:10:31,690 What do those add up to? 195 00:10:34,650 --> 00:10:35,900 N squared. 196 00:10:35,900 --> 00:10:36,910 Happy. 197 00:10:36,910 --> 00:10:37,720 OK. 198 00:10:37,720 --> 00:10:40,730 So finally, the SVD. 199 00:10:40,730 --> 00:10:43,170 Finally, the SVD. 200 00:10:43,170 --> 00:10:45,560 What's the count? 201 00:10:45,560 --> 00:10:49,900 Now I've got rectangular stuff in there. 202 00:10:49,900 --> 00:10:52,360 I'm ready for this one. 203 00:10:52,360 --> 00:10:53,840 And I have to think a little bit. 
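The symmetric count and the polar decomposition count can be checked together. This is a sketch with hypothetical helper names. The 2 by 2 factors are chosen for illustration and do not come from the lecture. Since Q^T Q = I, the product A = QS satisfies A^T A = S^2, which (taking S positive definite) pins S down from A.

```python
def matmul(X, Y):
    # Plain nested-list matrix product.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def symmetric_params(n: int) -> int:
    # Diagonal plus upper triangle; the lower triangle is forced by symmetry.
    return n * (n + 1) // 2

def orthogonal_params(n: int) -> int:
    return n * (n - 1) // 2

for n in range(1, 10):
    assert orthogonal_params(n) + n == symmetric_params(n)      # S = Q Lambda Q^T
    assert orthogonal_params(n) + symmetric_params(n) == n * n  # A = Q S (polar)

# Illustrative factors: a 90-degree rotation Q and a symmetric positive definite S.
Q = [[0, -1], [1, 0]]
S = [[2, 1], [1, 2]]
A = matmul(Q, S)
print(A)  # [[-1, -2], [2, 1]]
assert matmul(transpose(Q), Q) == [[1, 0], [0, 1]]
# A^T A = S^T Q^T Q S = S S because S is symmetric and Q is orthogonal.
assert matmul(transpose(A), A) == matmul(S, S)
```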
204 00:10:57,450 --> 00:10:58,710 And we may have done this. 205 00:11:01,940 --> 00:11:06,920 Let's suppose that m is less than or equal to n. 206 00:11:06,920 --> 00:11:09,680 Suppose that. 207 00:11:09,680 --> 00:11:10,980 Yeah. 208 00:11:10,980 --> 00:11:14,100 Otherwise, we would just transpose and look at the SVD. 209 00:11:14,100 --> 00:11:17,310 So let's say m less than or equal to n. 210 00:11:17,310 --> 00:11:19,200 So let's say it's got full rank. 211 00:11:22,230 --> 00:11:26,640 And what's the largest rank that the matrix can have? 212 00:11:26,640 --> 00:11:28,110 m, clearly. 213 00:11:28,110 --> 00:11:29,820 Full rank m. 214 00:11:29,820 --> 00:11:35,130 So in the SVD, this will be m by m. 215 00:11:35,130 --> 00:11:40,950 Let's remember the U, the sigma, and the V transpose. 216 00:11:40,950 --> 00:11:43,440 This will be m by n. 217 00:11:43,440 --> 00:11:45,960 And this will be n by n. 218 00:11:45,960 --> 00:11:48,360 For the full scale SVD. 219 00:11:48,360 --> 00:11:55,770 And if the rank is equal to m, then I really expect to get-- 220 00:11:55,770 --> 00:11:58,430 I expect it to add up to the total 221 00:11:58,430 --> 00:12:06,770 for A. For A, the original A has mn, right? 222 00:12:06,770 --> 00:12:09,740 It's an m by n matrix. 223 00:12:09,740 --> 00:12:18,400 The matrix A is m by n with m less than or equal to n, giving me 224 00:12:18,400 --> 00:12:18,970 these things. 225 00:12:18,970 --> 00:12:21,670 So it has mn parameters. 226 00:12:25,570 --> 00:12:29,380 So do we get m times n from this? 227 00:12:29,380 --> 00:12:31,190 I hope we do. 228 00:12:31,190 --> 00:12:33,120 I know how many we get from sigma. 229 00:12:33,120 --> 00:12:33,620 What? 230 00:12:33,620 --> 00:12:36,390 What was the count for sigma? 231 00:12:36,390 --> 00:12:38,450 m. 232 00:12:38,450 --> 00:12:41,670 And what's the count for V? 233 00:12:41,670 --> 00:12:43,410 So that's an n by n. 234 00:12:43,410 --> 00:12:46,370 And what's the count for U? 235 00:12:46,370 --> 00:12:47,090 OK.
236 00:12:47,090 --> 00:12:47,870 Yeah. 237 00:12:47,870 --> 00:12:49,330 They're orthogonal matrices. 238 00:12:49,330 --> 00:12:52,720 So I should be able to go up to that line. 239 00:12:52,720 --> 00:12:55,270 This was an m by n one. 240 00:12:55,270 --> 00:12:57,700 Is that a 1/2n, n minus 1? 241 00:12:57,700 --> 00:13:04,420 Am I copying that correctly out of this circle there? 242 00:13:04,420 --> 00:13:07,130 That's an m by m orthogonal matrix. 243 00:13:07,130 --> 00:13:08,600 Oh, but I have to write m. 244 00:13:08,600 --> 00:13:10,250 That was foolish. 245 00:13:10,250 --> 00:13:12,774 OK. m. 246 00:13:12,774 --> 00:13:14,120 m. 247 00:13:14,120 --> 00:13:17,690 Yeah, because that matrix is of size m. 248 00:13:17,690 --> 00:13:20,270 So that's an m. 249 00:13:20,270 --> 00:13:23,030 And then I have that. 250 00:13:23,030 --> 00:13:28,540 And then I have whatever V transpose n by n. 251 00:13:28,540 --> 00:13:30,328 Oh, what's the deal in there? 252 00:13:30,328 --> 00:13:30,828 Hmm. 253 00:13:33,760 --> 00:13:38,350 Do I want all of the 1/2n, n minus 1? 254 00:13:41,773 --> 00:13:43,730 Oh, God. 255 00:13:43,730 --> 00:13:46,920 I thought I had got this straight. 256 00:13:46,920 --> 00:13:47,510 Let's see. 257 00:13:51,380 --> 00:13:55,730 I could subtract this from this and find out what I should say. 258 00:13:55,730 --> 00:13:57,860 Whoa. 259 00:13:57,860 --> 00:14:01,350 Students have been known to do this too. 260 00:14:01,350 --> 00:14:02,740 Let's see. 261 00:14:02,740 --> 00:14:05,300 Well, let's try to think anyway. 262 00:14:05,300 --> 00:14:07,730 So I have this n by n symmet-- 263 00:14:07,730 --> 00:14:09,935 this n by n orthogonal matrix. 264 00:14:13,100 --> 00:14:17,880 First, it could be any orthogonal matrix. 265 00:14:17,880 --> 00:14:20,210 Yeah. 266 00:14:20,210 --> 00:14:26,860 But is it only the first m columns that I really need? 267 00:14:26,860 --> 00:14:31,120 The rest I could just throw away. 
268 00:14:31,120 --> 00:14:35,620 Let me try to imagine that it's just the first. 269 00:14:35,620 --> 00:14:39,400 Well, then I won't have any n in here. 270 00:14:39,400 --> 00:14:42,610 So maybe I better take a 1/2n-- 271 00:14:42,610 --> 00:14:44,170 no. 272 00:14:44,170 --> 00:14:44,670 Help. 273 00:14:44,670 --> 00:14:47,040 Oh, oh, yes, of course. 274 00:14:47,040 --> 00:14:48,450 Ha. 275 00:14:48,450 --> 00:14:55,330 I've got only m columns that matter, the-- 276 00:14:55,330 --> 00:14:59,825 everybody sort of now understands that SVD. 277 00:14:59,825 --> 00:15:01,090 The rank is m. 278 00:15:01,090 --> 00:15:02,690 Don't forget that. 279 00:15:02,690 --> 00:15:03,370 OK. 280 00:15:03,370 --> 00:15:10,270 Then the first R, the first m columns of V are important. 281 00:15:10,270 --> 00:15:15,070 Those are the singular vectors that go with nonzero singular 282 00:15:15,070 --> 00:15:18,280 values that really matter. 283 00:15:18,280 --> 00:15:21,310 And the rest really don't matter. 284 00:15:21,310 --> 00:15:23,010 So I'm going to just-- 285 00:15:23,010 --> 00:15:25,090 I have to count how many-- 286 00:15:25,090 --> 00:15:26,930 so, sorry. 287 00:15:26,930 --> 00:15:39,930 V, the important part of V, has how many parameters in the m columns? 288 00:15:39,930 --> 00:15:42,590 But it's an n by n matrix. 289 00:15:42,590 --> 00:15:44,330 And those columns are orthogonal. 290 00:15:44,330 --> 00:15:47,600 So the answer is not mn for this guy. 291 00:15:47,600 --> 00:15:51,560 I have to go through this foolish reasoning again. 292 00:15:51,560 --> 00:16:00,560 I have n minus 1, plus n minus 2, and so on, plus n minus m. 293 00:16:05,230 --> 00:16:07,060 There were n minus 1 parameters in 294 00:16:07,060 --> 00:16:09,020 the first orthogonal vector-- 295 00:16:09,020 --> 00:16:12,820 unit vector, n minus 2 in the second one, up to n minus 296 00:16:12,820 --> 00:16:14,020 m in the mth.
297 00:16:14,020 --> 00:16:17,240 And then V has some more columns that 298 00:16:17,240 --> 00:16:24,400 are coming, really, from a null space, that are not important. 299 00:16:24,400 --> 00:16:29,780 I believe this is the right thing to do. 300 00:16:29,780 --> 00:16:33,120 I'm hoping you agree. 301 00:16:33,120 --> 00:16:38,650 And I mean, I'm hoping even more that those add up to m times n. 302 00:16:38,650 --> 00:16:39,150 OK. 303 00:16:39,150 --> 00:16:40,820 I have a 1/2n s-- 304 00:16:40,820 --> 00:16:43,750 oh, I really have to total this thing. 305 00:16:43,750 --> 00:16:44,250 OK. 306 00:16:44,250 --> 00:16:47,640 This had m terms. 307 00:16:47,640 --> 00:16:52,620 So there's m of these n's. 308 00:16:52,620 --> 00:16:57,660 And then I have to subtract off 1 plus 2 plus 3, up to m. 309 00:16:57,660 --> 00:17:00,510 And so what am I subtracting off? 310 00:17:00,510 --> 00:17:01,500 What's that sum? 311 00:17:01,500 --> 00:17:03,630 1 plus 2 plus 3, stopping at m? 312 00:17:07,079 --> 00:17:10,099 It's one of these guys, 1/2-- 313 00:17:10,099 --> 00:17:13,670 is it 1/2m, m plus 1? 314 00:17:13,670 --> 00:17:17,010 Yeah, 1/2m, m plus 1. 315 00:17:17,010 --> 00:17:17,510 Sorry. 316 00:17:17,510 --> 00:17:21,020 1/2m, m plus 1. 317 00:17:21,020 --> 00:17:22,430 I'm supposed to enjoy this. 318 00:17:22,430 --> 00:17:26,089 And now it gets a little nervous. 319 00:17:26,089 --> 00:17:27,290 But OK. 320 00:17:27,290 --> 00:17:30,470 So I believe that that is that. 321 00:17:30,470 --> 00:17:31,280 OK. 322 00:17:31,280 --> 00:17:32,510 Well, we have the mn. 323 00:17:32,510 --> 00:17:36,180 That's a good sign that we're shooting for. 324 00:17:36,180 --> 00:17:39,020 So does the rest of it add to nothing? 325 00:17:39,020 --> 00:17:42,440 Well, I guess, yeah, I guess it does. 326 00:17:42,440 --> 00:17:47,600 When I put these two together, I have 1/2m, m plus 1. 327 00:17:47,600 --> 00:17:49,950 And then I'm subtracting it away again. 
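The full-rank SVD bookkeeping can be summed in a few lines. Here m is at most n, U is m by m, Sigma carries m singular values, and only the first m orthonormal columns of V are counted. `svd_params` is a hypothetical name for this sketch.

```python
def svd_params(m: int, n: int) -> int:
    # Full-rank m by n matrix with m <= n, counted as in the lecture.
    assert m <= n
    u = m * (m - 1) // 2                      # m by m orthogonal U
    sigma = m                                 # m nonzero singular values
    v = sum(n - k for k in range(1, m + 1))   # (n-1) + (n-2) + ... + (n-m)
    return u + sigma + v

for m in range(1, 7):
    for n in range(m, 9):
        # u + sigma = m(m+1)/2 exactly cancels the subtracted 1 + 2 + ... + m in v.
        assert svd_params(m, n) == m * n

print(svd_params(3, 5))  # 15
```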
328 00:17:49,950 --> 00:17:50,980 So I get mn. 329 00:17:50,980 --> 00:17:53,420 Hooray. 330 00:17:53,420 --> 00:17:58,090 Well, it had to happen, or we wouldn't-- 331 00:17:58,090 --> 00:18:01,040 anything-- before I erase that board and consign 332 00:18:01,040 --> 00:18:05,120 that to history, is there-- should I pause a little more? 333 00:18:05,120 --> 00:18:06,210 Minute? 334 00:18:06,210 --> 00:18:09,500 This will be, like, I'm hoping, a one-page appendix 335 00:18:09,500 --> 00:18:11,840 to the notes and the book. 336 00:18:11,840 --> 00:18:12,980 And you'll see it. 337 00:18:12,980 --> 00:18:18,050 But I do have one more count to do. 338 00:18:18,050 --> 00:18:21,750 And then I'm good with this review 339 00:18:21,750 --> 00:18:27,120 and ready to move onward to the topic of saddle points 340 00:18:27,120 --> 00:18:29,220 and ready to move onward after that. 341 00:18:29,220 --> 00:18:33,990 Well, I'll say a little bit about the next lab homework 342 00:18:33,990 --> 00:18:35,880 that I'm creating. 343 00:18:35,880 --> 00:18:42,330 And then our next topic will be, like, covariance matrices, 344 00:18:42,330 --> 00:18:45,890 a little statistics this week. 345 00:18:45,890 --> 00:18:49,560 Then we get a week off we could-- to digest it. 346 00:18:49,560 --> 00:18:55,440 And then come back for gradient descent, and deep learning, 347 00:18:55,440 --> 00:18:57,550 and those things. 348 00:18:57,550 --> 00:18:59,190 OK. 349 00:18:59,190 --> 00:19:01,020 Everybody happy with that? 350 00:19:01,020 --> 00:19:04,260 So what's my final question? 351 00:19:04,260 --> 00:19:20,770 My final question is the SVD for any matrix of rank R. 352 00:19:20,770 --> 00:19:25,550 So it's an m by n matrix. 353 00:19:25,550 --> 00:19:30,950 But the rank is only R. It's a natural question-- 354 00:19:30,950 --> 00:19:36,020 how many parameters are there in a rank R matrix? 355 00:19:39,020 --> 00:19:41,840 We may even have touched on this question. 
356 00:19:41,840 --> 00:19:44,840 And I have two ways to answer it. 357 00:19:44,840 --> 00:19:48,250 And one way is the SVD. 358 00:19:48,250 --> 00:19:52,460 And that will be similar to what I just pushed up there. 359 00:19:52,460 --> 00:19:58,880 So if the rank is R, the SVD of this typical rank R matrix 360 00:19:58,880 --> 00:20:02,510 will be U sigma V transpose. 361 00:20:02,510 --> 00:20:05,590 But U, now this is the-- 362 00:20:05,590 --> 00:20:10,210 like the condensed thing, where I've thrown away 363 00:20:10,210 --> 00:20:14,860 stuff that's automatically zero because if the rank is only R, 364 00:20:14,860 --> 00:20:18,520 like if the rank was 1, suppose the rank was 1, 365 00:20:18,520 --> 00:20:24,790 then I'd have 1 column times 1 sigma times 1 row, right? 366 00:20:24,790 --> 00:20:28,780 And I could do that count for R equal 1. 367 00:20:28,780 --> 00:20:31,340 Now I have R columns. 368 00:20:31,340 --> 00:20:40,195 So this is m by R. Then sigma is diagonal, of course. 369 00:20:40,195 --> 00:20:42,760 So I'm going to get R numbers out of that. 370 00:20:42,760 --> 00:20:45,500 And this one is now R by n. 371 00:20:45,500 --> 00:20:51,760 In other words, maybe I should, like, save this little bit here 372 00:20:51,760 --> 00:20:53,380 that was helpful. 373 00:20:53,380 --> 00:20:57,400 But now I've got m is reduced to R. 374 00:20:57,400 --> 00:21:00,700 So I believe that if I count these three, 375 00:21:00,700 --> 00:21:04,540 I'll get the right number of parameters for a rank R matrix. 376 00:21:04,540 --> 00:21:10,270 And that's not so obvious because the rank R matrices 377 00:21:10,270 --> 00:21:11,230 are not a-- 378 00:21:11,230 --> 00:21:13,990 we don't have a subspace. 379 00:21:13,990 --> 00:21:17,720 If I add a rank R matrix to another rank R matrix, 380 00:21:17,720 --> 00:21:20,680 well, the rank could be as big as 2R and probably will be. 
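A small exact-arithmetic sketch of why the rank-r matrices do not form a subspace. Two rank-1 matrices can add up to rank 2, and a rank-1 matrix plus its negative is the zero matrix, which has rank 0. The `rank` routine is ordinary Gaussian elimination written for illustration, not a library call, and the matrices are arbitrary choices.

```python
from fractions import Fraction

def rank(M):
    # Gaussian elimination in exact rational arithmetic; returns the number of pivots.
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Two rank-1 matrices of size 3 by 4.
B = outer([1, 0, 0], [1, 2, 3, 4])
C = outer([0, 1, 0], [0, 1, 0, 1])
print(rank(B), rank(C), rank(add(B, C)))  # 1 1 2
minus_B = [[-x for x in row] for row in B]
print(rank(add(B, minus_B)))  # 0: the origin is not a rank-1 matrix
```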
381 00:21:25,200 --> 00:21:27,940 You know, it's just a little interesting 382 00:21:27,940 --> 00:21:32,240 to get your hands on matrices of rank R 383 00:21:32,240 --> 00:21:38,500 because they're kind of a thin, like a, well, a math-- 384 00:21:38,500 --> 00:21:42,250 person would call it a manifold, some kind of a surface 385 00:21:42,250 --> 00:21:45,190 within matrix space. 386 00:21:45,190 --> 00:21:47,220 Have you ever thought about matrix space? 387 00:21:47,220 --> 00:21:50,580 So that's a vector space because we can add matrices. 388 00:21:50,580 --> 00:21:52,840 We can multiply them by constants. 389 00:21:52,840 --> 00:21:55,630 We can take linear combinations. 390 00:21:55,630 --> 00:21:58,240 We could call them vectors if we like. 391 00:21:58,240 --> 00:22:02,740 There would be a vector space of m by n matrices. 392 00:22:02,740 --> 00:22:07,030 What would be the dimension of that space? 393 00:22:07,030 --> 00:22:12,280 So the vector space of all 3 by 4 matrices. 394 00:22:12,280 --> 00:22:15,260 That has what dimension? 395 00:22:15,260 --> 00:22:16,400 12. 396 00:22:16,400 --> 00:22:19,400 12, because you've got 12 numbers to choose. 397 00:22:19,400 --> 00:22:22,070 And it is a space because you can add. 398 00:22:22,070 --> 00:22:27,890 Now if I say 3 by 4 matrices of rank 2, 399 00:22:27,890 --> 00:22:30,840 I don't have a space anymore. 400 00:22:30,840 --> 00:22:34,830 That word, space, is seriously reserved 401 00:22:34,830 --> 00:22:38,010 to mean vector space, meaning 402 00:22:38,010 --> 00:22:39,390 I can take combinations. 403 00:22:39,390 --> 00:22:43,110 But if I take a rank 2 matrix plus a rank 2 matrix, I'm not-- 404 00:22:43,110 --> 00:22:49,220 so it's sort of a surface within 12D, the 2-- 405 00:22:49,220 --> 00:22:51,830 the 3 by 4 matrices of rank 2. 406 00:22:51,830 --> 00:22:55,790 And we're about to find the dimension of that surface.
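For the 3 by 4 example, the reduced-SVD count worked out in the rest of this segment can be tabulated directly. U and V each have r orthonormal columns, and there are r singular values. The count should agree with mr + nr - r squared, that is, r(m + n - r). `rank_r_params` is a hypothetical helper for this sketch.

```python
def rank_r_params(m: int, n: int, r: int) -> int:
    # Reduced SVD: U is m by r, Sigma has r entries, V is n by r; columns orthonormal.
    u = sum(m - k for k in range(1, r + 1))  # (m-1) + ... + (m-r)
    v = sum(n - k for k in range(1, r + 1))  # (n-1) + ... + (n-r)
    return u + r + v

m, n = 3, 4
print(m * n)                   # 12: dimension of the space of all 3 by 4 matrices
print(rank_r_params(m, n, 2))  # 10: local dimension of the rank-2 surface
for r in range(0, min(m, n) + 1):
    assert rank_r_params(m, n, r) == r * (m + n - r)
```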
407 00:22:55,790 --> 00:23:01,910 Does your mind sort of visualize a surface in 12 dimensions? 408 00:23:01,910 --> 00:23:05,540 Yeah, well, give it a shot anyway. 409 00:23:05,540 --> 00:23:08,420 But that surface could have-- 410 00:23:08,420 --> 00:23:15,460 be 11 dimensional, so to speak, like, meaning locally, the-- 411 00:23:15,460 --> 00:23:17,210 it wouldn't have to be a pl-- 412 00:23:17,210 --> 00:23:22,370 an 11 dimensional plane going through the origin. 413 00:23:22,370 --> 00:23:24,170 In fact, it wouldn't go through the origin 414 00:23:24,170 --> 00:23:26,910 because the origin won't have rank R. 415 00:23:26,910 --> 00:23:28,910 So it's some kind of a surface. 416 00:23:28,910 --> 00:23:31,610 And maybe it's got some different pieces. 417 00:23:31,610 --> 00:23:34,550 Probably, some smart person knows 418 00:23:34,550 --> 00:23:36,890 what that surface looks like. 419 00:23:36,890 --> 00:23:40,220 But we're just going to find out something 420 00:23:40,220 --> 00:23:44,210 about its number of parameters, its local dimension. 421 00:23:44,210 --> 00:23:50,840 Well, I know that this answer is R because I've got R sigmas. 422 00:23:50,840 --> 00:23:53,450 And this one, I'm pretty good at. 423 00:23:53,450 --> 00:23:55,760 But now it's not-- 424 00:23:55,760 --> 00:23:59,020 it's R by n, so it's-- 425 00:23:59,020 --> 00:24:02,765 instead of-- here R was m. 426 00:24:02,765 --> 00:24:11,435 But now, down here, R is R. So I think it's rn minus 1/2. 427 00:24:14,120 --> 00:24:14,900 What's that? 428 00:24:14,900 --> 00:24:16,790 Is that an m? 429 00:24:16,790 --> 00:24:18,830 So now it's an r. 430 00:24:18,830 --> 00:24:21,120 r plus 1. 431 00:24:21,120 --> 00:24:23,100 I think. 432 00:24:23,100 --> 00:24:25,500 I think. 433 00:24:25,500 --> 00:24:28,770 And what about the U? 434 00:24:28,770 --> 00:24:34,040 So U is going to be similar, except instead of the n here, 435 00:24:34,040 --> 00:24:35,690 we've got an m. 
436 00:24:35,690 --> 00:24:42,960 So I think for U, we'll have m minus 1, plus m minus 2, plus-- 437 00:24:42,960 --> 00:24:45,110 so let me write it here. 438 00:24:45,110 --> 00:24:46,966 m minus 1. 439 00:24:46,966 --> 00:24:51,290 So U, I'm talking about U here, it's got R columns. 440 00:24:51,290 --> 00:24:54,740 The first one has m minus 1 because I throw away 1 441 00:24:54,740 --> 00:24:59,130 because it's a unit vector, up to m minus r. 442 00:24:59,130 --> 00:25:01,190 That's the rth column. 443 00:25:01,190 --> 00:25:02,150 OK. 444 00:25:02,150 --> 00:25:05,990 And now so how-- what does that add up to? 445 00:25:05,990 --> 00:25:08,200 Well, I put all the m's together. 446 00:25:08,200 --> 00:25:14,930 So that's rm, or let me say mr. And then I'm subtracting off 1 447 00:25:14,930 --> 00:25:17,540 plus 2 plus 3, up to r. 448 00:25:17,540 --> 00:25:20,750 Now tell me again what that adds up to. 449 00:25:20,750 --> 00:25:23,900 1 plus 2 plus 3, stop at r. 450 00:25:23,900 --> 00:25:26,640 That's what we had here. 451 00:25:26,640 --> 00:25:30,240 And we've got it for V. And we've got it again here. 452 00:25:30,240 --> 00:25:35,850 Minus 1/2 r, r plus 1. 453 00:25:35,850 --> 00:25:37,890 Are you OK with that? 454 00:25:37,890 --> 00:25:41,370 And now I just want to add them up. 455 00:25:41,370 --> 00:25:45,240 So I have mr. And I have nr. 456 00:25:45,240 --> 00:25:46,980 And then I have two of these. 457 00:25:46,980 --> 00:25:49,440 So let me get it here. 458 00:25:49,440 --> 00:25:53,250 mr and nr. 459 00:25:53,250 --> 00:25:56,340 And now I have to look at-- so mr, check. 460 00:25:56,340 --> 00:25:58,140 nr, check. 461 00:25:58,140 --> 00:25:59,910 Now I have two of these guys. 462 00:25:59,910 --> 00:26:04,170 So they combine into r squared plus r. 463 00:26:04,170 --> 00:26:09,860 And then I-- r squared, yeah, minus r squared plus r. 464 00:26:09,860 --> 00:26:10,490 Sorry. 465 00:26:10,490 --> 00:26:13,580 They combine into minus r squared plus r.
466 00:26:13,580 --> 00:26:16,730 And then here's r coming in with a plus. 467 00:26:16,730 --> 00:26:18,620 I think we have a minus r squared. 468 00:26:21,600 --> 00:26:24,280 And that is the right answer. 469 00:26:24,280 --> 00:26:25,010 Yeah. 470 00:26:25,010 --> 00:26:26,000 OK. 471 00:26:26,000 --> 00:26:30,110 So I took a bit longer than I intended. 472 00:26:30,110 --> 00:26:35,150 But this is a number that's sort of interesting. 473 00:26:35,150 --> 00:26:37,390 I mentioned saddle points sort of, like, separately 474 00:26:37,390 --> 00:26:42,580 from maxima and minima just because they are definitely not 475 00:26:42,580 --> 00:26:44,710 as easy to work with. 476 00:26:44,710 --> 00:26:47,870 You understand what I mean by saddle points? 477 00:26:47,870 --> 00:26:52,280 The matrices involved have-- 478 00:26:52,280 --> 00:26:54,850 are not positive definite. 479 00:26:54,850 --> 00:26:58,510 Those would go with a minimum. 480 00:26:58,510 --> 00:27:02,250 They're not negati-- they're not-- 481 00:27:02,250 --> 00:27:04,750 well, those would go with maxima and minima. 482 00:27:04,750 --> 00:27:06,880 But we're looking in between. 483 00:27:06,880 --> 00:27:08,390 So saddle points. 484 00:27:08,390 --> 00:27:08,910 OK. 485 00:27:08,910 --> 00:27:11,270 Well, I'll get going on those. 486 00:27:11,270 --> 00:27:11,770 OK. 487 00:27:11,770 --> 00:27:16,450 I sort of realized that there are two main sources 488 00:27:16,450 --> 00:27:19,630 of saddle points. 489 00:27:19,630 --> 00:27:24,210 One of them is when I have problems that-- 490 00:27:24,210 --> 00:27:26,870 when I-- let's say I minimize. 491 00:27:26,870 --> 00:27:30,892 So this will be the constraint. 492 00:27:30,892 --> 00:27:33,960 The saddle points come from the constraint. 493 00:27:33,960 --> 00:27:37,740 So Lagrange is going to be responsible 494 00:27:37,740 --> 00:27:39,450 for these saddle points.
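For the unconstrained picture, a symmetric 2 by 2 matrix such as a Hessian is enough to separate minima, maxima and saddle points. This sketch uses the standard determinant-and-sign test. The example functions are chosen for illustration.

```python
def classify(S):
    # S = [[a, b], [b, c]] is a symmetric 2 by 2 matrix, e.g. a Hessian at a critical point.
    (a, b), (_, c) = S
    det = a * c - b * b
    if det > 0 and a > 0:
        return "positive definite: minimum"
    if det > 0 and a < 0:
        return "negative definite: maximum"
    if det < 0:
        return "indefinite: saddle point"
    return "semidefinite: test inconclusive"

print(classify([[2, 0], [0, 3]]))    # Hessian of x^2 + 1.5 y^2
print(classify([[-2, 0], [0, -2]]))  # Hessian of -(x^2 + y^2)
print(classify([[1, 0], [0, -1]]))   # Hessian of (x^2 - y^2) / 2
```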
495 00:27:39,450 --> 00:27:43,530 So we might have some minimum problem like minimize one half x transpose Sx, 496 00:27:43,530 --> 00:27:47,110 some positive definite thing. 497 00:27:47,110 --> 00:27:51,520 And of course, if we don't say any more, the minimum is zero, at x equals 0. 498 00:27:51,520 --> 00:27:52,670 Right? 499 00:27:52,670 --> 00:27:54,860 Everywhere else, it's positive. 500 00:27:54,860 --> 00:27:59,720 But we're going to put on constraints, Ax equal b. 501 00:27:59,720 --> 00:28:06,440 So this is the classical constrained optimization 502 00:28:06,440 --> 00:28:12,310 problem, quadratic cost function, linear constraints. 503 00:28:12,310 --> 00:28:16,000 We could solve this exactly. 504 00:28:16,000 --> 00:28:19,065 But let's just see where saddle points are going to arise. 505 00:28:21,690 --> 00:28:25,440 So this S is positive definite. 506 00:28:25,440 --> 00:28:28,410 But now how do we deal with that problem? 507 00:28:28,410 --> 00:28:30,990 Well, Lagrange said what to do. 508 00:28:30,990 --> 00:28:36,500 Lagrange said, look at the Lagrangian. 509 00:28:36,500 --> 00:28:38,220 Well, OK. 510 00:28:38,220 --> 00:28:39,525 He introduced lambda. 511 00:28:42,420 --> 00:28:45,270 This x is in n dimensions. 512 00:28:45,270 --> 00:28:47,100 S is an n by n matrix. 513 00:28:47,100 --> 00:28:48,810 But I have m constraints. 514 00:28:48,810 --> 00:28:50,970 So the matrix A is m by n. 515 00:28:56,460 --> 00:28:58,010 I have m constraints. 516 00:28:58,010 --> 00:29:01,790 And then I'm going to follow the rules and introduce m 517 00:29:01,790 --> 00:29:03,080 Lagrange multipliers. 518 00:29:06,970 --> 00:29:07,540 Lambda has m components. 519 00:29:10,530 --> 00:29:13,975 And then the neat part of the Lagrangian-- 520 00:29:13,975 --> 00:29:14,475 and what? 521 00:29:14,475 --> 00:29:15,750 What is this?
522 00:29:15,750 --> 00:29:19,650 Well, I take the function, 523 00:29:19,650 --> 00:29:22,530 and then I introduce-- 524 00:29:22,530 --> 00:29:26,100 remember, lambda's a vector now, not just a number. 525 00:29:26,100 --> 00:29:28,550 We had some application where it was just-- 526 00:29:28,550 --> 00:29:29,940 there was just one constraint. 527 00:29:29,940 --> 00:29:32,050 But now I have m constraints. 528 00:29:32,050 --> 00:29:33,150 So I take a lambda. 529 00:29:33,150 --> 00:29:39,690 So I add lambda transpose times Ax minus b. 530 00:29:39,690 --> 00:29:44,220 And the plus or the minus sign here is not important. 531 00:29:44,220 --> 00:29:46,320 I mean, you can choose it, because that choice will 532 00:29:46,320 --> 00:29:48,440 determine the sign of lambda. 533 00:29:48,440 --> 00:29:52,260 But either way, it's correct. 534 00:29:52,260 --> 00:29:53,310 OK. 535 00:29:53,310 --> 00:30:00,690 So we've introduced a function that now depends on x and also 536 00:30:00,690 --> 00:30:02,940 on lambda. 537 00:30:02,940 --> 00:30:04,110 And x and lambda 538 00:30:04,110 --> 00:30:08,010 multiply each other in there. 539 00:30:08,010 --> 00:30:14,400 And my point is that Lagrange says, take the derivatives 540 00:30:14,400 --> 00:30:16,410 with respect to x and lambda. 541 00:30:16,410 --> 00:30:19,200 So that's the cool thing that he's contributed. 542 00:30:19,200 --> 00:30:22,750 He says if you just create my function, 543 00:30:22,750 --> 00:30:26,610 now you can take the x derivative and the lambda derivative. 544 00:30:26,610 --> 00:30:31,080 That will give you n equations for x, 545 00:30:31,080 --> 00:30:33,120 from the x derivative, 546 00:30:33,120 --> 00:30:35,400 and m equations from the lambda derivative. 547 00:30:35,400 --> 00:30:37,260 That's n plus m equations. 548 00:30:37,260 --> 00:30:40,200 They will determine the good x and the lambda. 549 00:30:40,200 --> 00:30:44,190 But I'm saying that's all true and all important.
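The board isn't captured in the transcript. As a sketch, assuming the cost is one half x transpose Sx (the one half is what makes the x-gradient come out as exactly Sx) and using the plus sign chosen a little later, the problem and the Lagrangian read:

```latex
\begin{aligned}
&\min_{x \in \mathbb{R}^n} \ \tfrac{1}{2}\, x^{\mathsf T} S x
  \quad \text{subject to } Ax = b,
  \qquad S \ (n \times n) \text{ positive definite},\ A \ (m \times n), \\
&L(x, \lambda) = \tfrac{1}{2}\, x^{\mathsf T} S x + \lambda^{\mathsf T}(Ax - b),
  \qquad \lambda \in \mathbb{R}^m .
\end{aligned}
```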
550 00:30:44,190 --> 00:30:47,010 But I'm saying that the pair x, 551 00:30:47,010 --> 00:30:50,880 lambda will be a saddle point of this function L. 552 00:30:50,880 --> 00:30:56,590 It's a saddle point, not a maximum or a minimum. 553 00:30:56,590 --> 00:30:57,260 OK. 554 00:30:57,260 --> 00:31:00,210 Let's just take the derivatives and see what we get. 555 00:31:00,210 --> 00:31:05,650 So the derivatives with respect to x, d by dx-- 556 00:31:05,650 --> 00:31:10,330 x is now a vector, so I really should say the gradient 557 00:31:10,330 --> 00:31:12,910 in the x direction. 558 00:31:12,910 --> 00:31:14,500 I get Sx. 559 00:31:17,050 --> 00:31:21,460 And here, the derivative with respect to x 560 00:31:21,460 --> 00:31:27,070 is A transpose lambda, because this is the dot product 561 00:31:27,070 --> 00:31:29,360 of A transpose lambda with x. 562 00:31:29,360 --> 00:31:32,220 You know, I've put parentheses around it 563 00:31:32,220 --> 00:31:34,610 and followed the transpose rule. 564 00:31:34,610 --> 00:31:38,200 So that's the dot product of A transpose lambda with x. 565 00:31:38,200 --> 00:31:39,370 It's linear in x. 566 00:31:39,370 --> 00:31:40,300 So its derivative 567 00:31:40,300 --> 00:31:42,460 is just A transpose lambda. 568 00:31:42,460 --> 00:31:45,950 And Sx plus A transpose lambda equals zero. 569 00:31:45,950 --> 00:31:51,506 And now I take the other one, the lambda derivative. 570 00:31:51,506 --> 00:31:54,305 For the lambda derivative, the first term doesn't depend on lambda. 571 00:31:54,305 --> 00:31:57,080 So the lambda derivative is just Ax minus b. 572 00:31:57,080 --> 00:31:59,360 It brings back the constraints. 573 00:32:03,400 --> 00:32:06,130 So that's pretty simple. 574 00:32:06,130 --> 00:32:08,170 It doesn't even require much thought 575 00:32:08,170 --> 00:32:11,290 because you just know the constraints are coming back.
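Written out, assuming the Lagrangian one half x transpose Sx plus lambda transpose (Ax minus b), the two gradient conditions are below. The cross term lambda transpose Ax equals (A transpose lambda) transpose x, which is why its x-gradient is A transpose lambda.

```latex
\begin{aligned}
\nabla_x L &= Sx + A^{\mathsf T}\lambda = 0 && (n \text{ equations}),\\
\nabla_\lambda L &= Ax - b = 0 && (m \text{ equations}).
\end{aligned}
```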
576 00:32:11,290 --> 00:32:16,240 And of course, b should be put over on this side 577 00:32:16,240 --> 00:32:18,310 because it's a constant. 578 00:32:18,310 --> 00:32:21,910 So there we see a 2 by 2 579 00:32:21,910 --> 00:32:23,560 block system. 580 00:32:23,560 --> 00:32:27,910 We see an important, very important class of problems. 581 00:32:27,910 --> 00:32:29,740 And the matrix we're seeing, we could 582 00:32:29,740 --> 00:32:37,020 write in block matrix form: S, minus A transpose. 583 00:32:37,020 --> 00:32:39,840 Oh, I'm going to change that into a plus 584 00:32:39,840 --> 00:32:42,660 because I'm more of a plus person. 585 00:32:42,660 --> 00:32:43,230 OK. 586 00:32:43,230 --> 00:32:49,050 A transpose and A, yeah, yeah. 587 00:32:49,050 --> 00:32:51,930 When I took the derivative with respect to lambda, 588 00:32:51,930 --> 00:32:54,210 I didn't put the minus sign in here. 589 00:32:54,210 --> 00:32:55,260 And I didn't want to. 590 00:32:55,260 --> 00:32:57,380 So let's make it a plus. 591 00:32:57,380 --> 00:33:03,420 A, and then there's a zero block there. 592 00:33:03,420 --> 00:33:11,740 That multiplies x and lambda, and the right side is 0 and b. 593 00:33:11,740 --> 00:33:16,380 That is the model of a constrained minimum, 594 00:33:16,380 --> 00:33:18,940 a minimum problem with constraints. 595 00:33:18,940 --> 00:33:22,928 It's the model because the function here is quadratic 596 00:33:22,928 --> 00:33:24,220 and the constraints are linear. 597 00:33:26,770 --> 00:33:33,850 In Course 6, it's everywhere, constantly appearing 598 00:33:33,850 --> 00:33:35,780 as the simplest model. 599 00:33:35,780 --> 00:33:36,280 OK. 600 00:33:36,280 --> 00:33:40,950 And my point today is just that the solution 601 00:33:40,950 --> 00:33:50,180 x, lambda, that total solution, the x together with the lambda, 602 00:33:50,180 --> 00:33:58,190 is a saddle point of the Lagrangian function L. 603 00:33:58,190 --> 00:34:00,860 It's a saddle point, not a minimum.
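To make the block system concrete, here is a minimal NumPy sketch that builds the matrix with S and A transpose in the first block row and A and a zero block in the second, then solves for x and lambda at once. The helper name `solve_kkt` and the 2 by 2 example data are hypothetical, and it assumes S is positive definite and A has independent rows so the block matrix is invertible.

```python
import numpy as np


def solve_kkt(S: np.ndarray, A: np.ndarray, b: np.ndarray):
    """Solve [[S, A^T], [A, 0]] [x; lam] = [0; b] for the pair (x, lam).

    Assumes S is n-by-n symmetric positive definite and A is m-by-n with
    independent rows, so the block matrix is invertible.
    """
    n = S.shape[0]
    m = A.shape[0]
    K = np.block([[S, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), b])
    solution = np.linalg.solve(K, rhs)
    return solution[:n], solution[n:]


# Small example: minimize (1/2)(2 x1^2 + x2^2) subject to x1 + x2 = 1.
S = np.array([[2.0, 0.0], [0.0, 1.0]])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, lam = solve_kkt(S, A, b)
print(x, lam)  # x is [1/3, 2/3] and lam is [-2/3], up to rounding
```

Solving one symmetric system for (x, lambda) is the point of Lagrange's construction. Moving along the constraint line from x (direction [1, -1]) only increases the cost, so x is the constrained minimum even though (x, lambda) is a saddle point of L.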
604 00:34:00,860 --> 00:34:05,425 It's sort of a minimum in the x direction 605 00:34:05,425 --> 00:34:10,230 because S is positive definite. 606 00:34:10,230 --> 00:34:14,280 As a function of x, it's going up. 607 00:34:14,280 --> 00:34:19,460 But somehow the appearance of lambda 608 00:34:19,460 --> 00:34:22,940 makes this matrix indefinite. 609 00:34:22,940 --> 00:34:27,469 It starts with S, positive definite, but it has this A transpose and A 610 00:34:27,469 --> 00:34:29,060 and that zero block. 611 00:34:29,060 --> 00:34:31,940 It couldn't be positive definite. Actually, if I look at that matrix, 612 00:34:31,940 --> 00:34:34,199 I see it's not positive definite. 613 00:34:34,199 --> 00:34:35,760 What do I see? 614 00:34:35,760 --> 00:34:37,580 Why do I say that immediately? 615 00:34:37,580 --> 00:34:41,239 When I look at that matrix, it's not a positive definite matrix 616 00:34:41,239 --> 00:34:48,460 because when I see that 0 on the diagonal, 617 00:34:48,460 --> 00:34:50,230 that rules out positive definiteness. 618 00:34:50,230 --> 00:34:50,980 Couldn't be. 619 00:34:50,980 --> 00:34:56,830 Take, as an example, the matrix 3, 1, and 1, 0, so S is 3 and A is 1. 620 00:34:56,830 --> 00:34:58,510 Take that matrix. 621 00:34:58,510 --> 00:35:01,990 Just random. 622 00:35:01,990 --> 00:35:06,250 I made it 2 by 2 instead of size m plus n. 623 00:35:06,250 --> 00:35:06,760 Do you see? 624 00:35:06,760 --> 00:35:11,530 So how do I know that the eigenvalues of that matrix, one 625 00:35:11,530 --> 00:35:14,810 is plus and one is minus? 626 00:35:14,810 --> 00:35:19,990 The determinant is negative. 627 00:35:19,990 --> 00:35:22,810 So that tells me right away that one is plus and one is minus. 628 00:35:22,810 --> 00:35:23,470 Thanks. 629 00:35:23,470 --> 00:35:23,970 Yes. 630 00:35:23,970 --> 00:35:24,470 Yeah, yeah. 631 00:35:24,470 --> 00:35:26,020 The determinant is negative.
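The determinant is the product of the eigenvalues. If it is negative, the two eigenvalues must have opposite signs. Here is a small pure-Python check of the 3, 1, 1, 0 example, using the quadratic formula for the 2 by 2 characteristic polynomial. The variable names are illustrative only.

```python
import math

# The lecture's 2-by-2 stand-in for the KKT matrix: S = 3, A = 1, zero corner.
K = [[3.0, 1.0], [1.0, 0.0]]

trace = K[0][0] + K[1][1]                      # sum of eigenvalues: 3
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]    # product of eigenvalues: -1

# Roots of the characteristic polynomial lambda^2 - trace*lambda + det = 0.
disc = math.sqrt(trace * trace - 4 * det)
eigenvalues = ((trace + disc) / 2, (trace - disc) / 2)
print(det, eigenvalues)  # det is -1.0; eigenvalues are (3 + sqrt(13))/2 and (3 - sqrt(13))/2
```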
632 00:35:26,020 --> 00:35:27,890 And here, the determinant, 633 00:35:27,890 --> 00:35:31,900 a similar calculation, would produce A S inverse A transpose 634 00:35:31,900 --> 00:35:37,000 or something with a minus sign, because I'm going this way. 635 00:35:37,000 --> 00:35:42,010 Well, I could do better than that. 636 00:35:42,010 --> 00:35:45,190 But you saw the point. 637 00:35:45,190 --> 00:35:53,660 That simple example of this has eigenvalues of both signs. 638 00:35:53,660 --> 00:35:58,580 Let me just quickly say, and I'll put it in the notes 639 00:35:58,580 --> 00:36:02,500 or in that chapter, I guess all this is coming-- 640 00:36:02,500 --> 00:36:07,900 it's still in section 3.2. 641 00:36:07,900 --> 00:36:09,970 That was originally 4.2. 642 00:36:09,970 --> 00:36:13,130 And you will see it. 643 00:36:13,130 --> 00:36:14,450 So what do I want to say? 644 00:36:14,450 --> 00:36:17,960 I'd like to say that that example is pretty 645 00:36:17,960 --> 00:36:27,440 convincing to me that these KKT matrices-- if you talk to people 646 00:36:27,440 --> 00:36:37,680 in optimization, that's Karush, Kuhn, and Tucker, 647 00:36:37,680 --> 00:36:42,460 three famous guys, and these are the KKT conditions 648 00:36:42,460 --> 00:36:47,000 that they derived following Lagrange. 649 00:36:47,000 --> 00:36:47,730 Right. 650 00:36:47,730 --> 00:36:49,480 And my point is-- 651 00:36:49,480 --> 00:36:55,650 and this is typical: it's an indefinite matrix. 652 00:36:55,650 --> 00:37:02,800 I believe that if I do elimination, yeah, tell 653 00:37:02,800 --> 00:37:04,590 me this. 654 00:37:04,590 --> 00:37:07,080 This is a good way to look at it. 655 00:37:07,080 --> 00:37:10,830 Suppose I do elimination on this one or on this one. 656 00:37:10,830 --> 00:37:13,200 Well, suppose I do elimination there. 657 00:37:13,200 --> 00:37:16,070 What is the first pivot? 658 00:37:16,070 --> 00:37:16,960 3. 659 00:37:16,960 --> 00:37:18,170 Positive.
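Continuing with the 2 by 2 example, elimination gives the first pivot 3 and then the second pivot 0 minus (1/3)(1), which is minus 1/3. That is one positive pivot and one negative pivot, the same signs as the eigenvalues. The sketch below uses Gaussian elimination without row exchanges and exact fractions. The helper name `pivots` is hypothetical, and it assumes no zero pivot turns up along the way.

```python
from fractions import Fraction


def pivots(matrix):
    """Pivots from Gaussian elimination without row exchanges (exact arithmetic).

    Assumes every pivot met along the way is nonzero.
    """
    rows = [[Fraction(v) for v in row] for row in matrix]
    size = len(rows)
    result = []
    for k in range(size):
        pivot = rows[k][k]
        result.append(pivot)
        for i in range(k + 1, size):
            multiplier = rows[i][k] / pivot
            for j in range(k, size):
                rows[i][j] -= multiplier * rows[k][j]
    return result


print(pivots([[3, 1], [1, 0]]))  # [Fraction(3, 1), Fraction(-1, 3)]
```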
660 00:37:18,170 --> 00:37:20,270 So now let me turn down to here. 661 00:37:20,270 --> 00:37:26,120 What if I do elimination on this block matrix? 662 00:37:26,120 --> 00:37:27,250 Then I start up here. 663 00:37:27,250 --> 00:37:30,420 And that first pivot is? 664 00:37:30,420 --> 00:37:31,890 Positive again, right? 665 00:37:31,890 --> 00:37:34,530 This S is a positive definite matrix. 666 00:37:34,530 --> 00:37:35,520 Don't forget. 667 00:37:35,520 --> 00:37:39,300 In fact, the first n pivots will all be positive 668 00:37:39,300 --> 00:37:41,730 because the first n pivots, you're 669 00:37:41,730 --> 00:37:44,130 working away in this corner. 670 00:37:44,130 --> 00:37:47,280 And if you're only thinking about the first n, 671 00:37:47,280 --> 00:37:51,870 this corner is size n by n, then you don't even 672 00:37:51,870 --> 00:37:55,410 see A. You're doing some subtractions. 673 00:37:55,410 --> 00:37:56,700 And I'll do those. 674 00:37:56,700 --> 00:37:59,230 But the pivots themselves are coming-- 675 00:37:59,230 --> 00:38:01,770 all coming from S. And S is positive definite. 676 00:38:01,770 --> 00:38:05,490 So we know that one of the tests for a positive definite matrix 677 00:38:05,490 --> 00:38:07,830 is all pivots are positive. 678 00:38:07,830 --> 00:38:12,000 So I think all n of the first pivots will be positive. 679 00:38:12,000 --> 00:38:14,790 And when we use them, let's just see 680 00:38:14,790 --> 00:38:16,100 what happens when we use them. 681 00:38:18,990 --> 00:38:25,030 So here is the KKT matrix that I start with. 682 00:38:25,030 --> 00:38:26,380 And what do I end up with? 683 00:38:29,610 --> 00:38:35,430 Well, really, what I'm doing is I'm multiplying that block row 684 00:38:35,430 --> 00:38:37,310 by something to-- 685 00:38:37,310 --> 00:38:42,880 and subtracting to kill that A. So these rows-- 686 00:38:42,880 --> 00:38:44,180 well, near enough. 687 00:38:44,180 --> 00:38:46,740 Let me do block elimination. 
688 00:38:46,740 --> 00:38:48,740 Block elimination is, like, easier. 689 00:38:48,740 --> 00:38:52,830 I don't have to write down all little tiny numbers. 690 00:38:52,830 --> 00:38:57,450 So I just want to multiply this row by something. 691 00:38:57,450 --> 00:38:59,040 Tell me what. 692 00:38:59,040 --> 00:39:02,520 And subtract from this second row. 693 00:39:02,520 --> 00:39:07,110 Suppose they're numbers or letters. 694 00:39:07,110 --> 00:39:09,110 I guess they are letters. 695 00:39:09,110 --> 00:39:12,965 What do I multiply that first row by and subtract? 696 00:39:18,130 --> 00:39:19,576 Let's see. 697 00:39:19,576 --> 00:39:27,240 If these were just little tiny numbers, as like in 3, 1, 1, 0, 698 00:39:27,240 --> 00:39:31,200 what do I multiply that row by and subtract from this? 699 00:39:31,200 --> 00:39:33,870 I multiply by A over S, right? 700 00:39:33,870 --> 00:39:37,400 I do multiply by A over S, which puts an A there. 701 00:39:37,400 --> 00:39:38,730 Then I subtract. 702 00:39:38,730 --> 00:39:42,600 So here I'll multiply by A over S. But these are matrices, 703 00:39:42,600 --> 00:39:48,750 so I multiply by S-- 704 00:39:48,750 --> 00:39:51,210 by AS inverse, right? 705 00:39:51,210 --> 00:39:54,960 When I multiply by AS inverse times this S, I get A. 706 00:39:54,960 --> 00:39:56,220 And then I subtract. 707 00:39:56,220 --> 00:39:58,080 And I get the 0. 708 00:39:58,080 --> 00:40:01,440 And when I multiply by this guy and subtract, 709 00:40:01,440 --> 00:40:06,660 I get minus because I'm subtracting this thing, minus 710 00:40:06,660 --> 00:40:10,680 AS inverse, A transpose. 711 00:40:14,600 --> 00:40:18,705 That was block elimination, which just, in other words, 712 00:40:18,705 --> 00:40:19,205 it's just-- 713 00:40:22,730 --> 00:40:25,910 you've learned about 2 by 2 matrices, 714 00:40:25,910 --> 00:40:29,270 3x plus 4y equals 7 and stuff. 
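The block elimination step can be checked numerically. Multiplying the KKT matrix on the left by the block matrix with rows [I, 0] and [minus A S inverse, I] subtracts A S inverse times the first block row from the second. This clears the A and leaves minus A S inverse A transpose in the corner. The random S and A below are hypothetical test data, with S made positive definite on purpose.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2

# Hypothetical test data: S symmetric positive definite, A with independent rows.
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)
A = rng.standard_normal((m, n))

K = np.block([[S, A.T], [A, np.zeros((m, m))]])

# Block elimination: subtract A S^{-1} times block row 1 from block row 2.
S_inv = np.linalg.inv(S)
E = np.block([[np.eye(n), np.zeros((n, m))], [-A @ S_inv, np.eye(m)]])
reduced = E @ K

schur = -A @ S_inv @ A.T   # the new lower-right block, -A S^{-1} A^T
print(np.round(schur, 4))
```

The lower-right block that appears is usually called the Schur complement.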
715 00:40:29,270 --> 00:40:32,600 Now I'm just doing it with blocks 716 00:40:32,600 --> 00:40:34,280 instead of single numbers. 717 00:40:34,280 --> 00:40:39,200 But you see, this produced those positive pivots. 718 00:40:39,200 --> 00:40:43,080 And what can you tell me about that matrix? 719 00:40:43,080 --> 00:40:45,370 What kind of-- what can you tell me 720 00:40:45,370 --> 00:40:48,820 about the signs or the eigenvalues or whatever 721 00:40:48,820 --> 00:40:49,720 of this matrix? 722 00:40:53,810 --> 00:40:56,430 Suppose S was the identity. 723 00:40:56,430 --> 00:41:01,310 What could you tell me about minus AA transpose? 724 00:41:01,310 --> 00:41:03,080 Minus AA transpose. 725 00:41:03,080 --> 00:41:06,120 And my voice should emphasize that minus. 726 00:41:06,120 --> 00:41:12,060 That matrix there is negative definite, as long as A has independent rows. 727 00:41:12,060 --> 00:41:16,140 So the next set of m pivots, which come from here, 728 00:41:16,140 --> 00:41:17,290 will all be negative. 729 00:41:17,290 --> 00:41:28,045 So I get n positive and m negative 730 00:41:28,045 --> 00:41:28,545 pivots. 731 00:41:31,820 --> 00:41:34,710 And then I remember that the pivots actually 732 00:41:34,710 --> 00:41:37,590 have the same signs as the eigenvalues. 733 00:41:37,590 --> 00:41:39,390 That's just a beautiful fact. 734 00:41:39,390 --> 00:41:43,640 We know that for positive definite ones. 735 00:41:43,640 --> 00:41:45,200 The eigenvalues are all positive. 736 00:41:45,200 --> 00:41:47,220 The pivots are all positive. 737 00:41:47,220 --> 00:41:49,550 But it's even better than that. 738 00:41:49,550 --> 00:41:54,140 If we have some mixture of signs in the pivots, 739 00:41:54,140 --> 00:41:56,990 that tells us the signs of the eigenvalues. 740 00:41:56,990 --> 00:41:59,180 That's a really neat fact. 741 00:41:59,180 --> 00:42:02,120 So I'll just write that down.
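The fact that pivot signs match eigenvalue signs for a symmetric matrix is usually called Sylvester's law of inertia. Here is a minimal NumPy sketch that counts both for a KKT matrix. The first n pivots come from S and are positive. The last m pivots come from minus A S inverse A transpose and are negative. The function name `pivots_no_exchange` and the random data are hypothetical, and A is assumed to have independent rows, which random Gaussian rows have with probability 1.

```python
import numpy as np


def pivots_no_exchange(M):
    """Pivots of Gaussian elimination without row exchanges (floating point)."""
    U = np.array(M, dtype=float)
    size = U.shape[0]
    piv = np.empty(size)
    for k in range(size):
        piv[k] = U[k, k]
        for i in range(k + 1, size):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return piv


rng = np.random.default_rng(1)
n, m = 5, 3
B = rng.standard_normal((n, n))
S = B @ B.T + np.eye(n)          # symmetric positive definite (hypothetical data)
A = rng.standard_normal((m, n))  # m independent rows (with probability 1)
K = np.block([[S, A.T], [A, np.zeros((m, m))]])

piv = pivots_no_exchange(K)
eig = np.linalg.eigvalsh(K)
print("pivots:     ", (piv > 0).sum(), "positive,", (piv < 0).sum(), "negative")
print("eigenvalues:", (eig > 0).sum(), "positive,", (eig < 0).sum(), "negative")
```

Both lines should report n positive and m negative. The pivots and eigenvalues themselves differ, but their sign counts agree.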
742 00:42:02,120 --> 00:42:09,280 Plus and minus signs of pivots give us 743 00:42:09,280 --> 00:42:16,480 the plus and minus signs of the eigenvalues. 744 00:42:16,480 --> 00:42:22,190 So I've sneaked in a nice fact there 745 00:42:22,190 --> 00:42:25,020 for symmetric matrices. 746 00:42:25,020 --> 00:42:26,920 This is for symmetric matrices. 747 00:42:26,920 --> 00:42:27,420 OK. 748 00:42:30,510 --> 00:42:37,100 That's what I wanted to say about constraints and saddle 749 00:42:37,100 --> 00:42:39,390 points coming from there. 750 00:42:39,390 --> 00:42:41,610 And now I want to say something 751 00:42:41,610 --> 00:42:44,310 about constraints and-- 752 00:42:44,310 --> 00:42:45,990 not constraints now. 753 00:42:45,990 --> 00:42:53,350 I'm going to look at a second source of saddle points. 754 00:42:53,350 --> 00:43:04,090 So these will be saddles from this remarkable function 755 00:43:04,090 --> 00:43:10,480 that we know, R of x equals x transpose Sx over x transpose x. 756 00:43:13,360 --> 00:43:18,770 So I now have a symmetric matrix S. It could 757 00:43:18,770 --> 00:43:20,150 even be positive definite. 758 00:43:20,150 --> 00:43:23,020 Usually, it is here. 759 00:43:23,020 --> 00:43:25,390 Do you know what the name for R is? 760 00:43:25,390 --> 00:43:27,910 It's a ratio or a quotient. 761 00:43:27,910 --> 00:43:32,920 It's named after somebody starting with R. Who's that? 762 00:43:32,920 --> 00:43:33,680 Rayleigh. 763 00:43:33,680 --> 00:43:34,210 Right. 764 00:43:34,210 --> 00:43:35,168 It's the Rayleigh quotient. 765 00:43:41,590 --> 00:43:44,503 And what is the largest possible value 766 00:43:44,503 --> 00:43:45,545 of the Rayleigh quotient? 767 00:43:49,670 --> 00:43:52,990 We've seen this idea. 768 00:43:52,990 --> 00:43:55,965 The maximum value of that Rayleigh quotient, 769 00:43:55,965 --> 00:44:01,220 of that ratio, is lambda max. 770 00:44:01,220 --> 00:44:01,720 Right. 771 00:44:01,720 --> 00:44:04,030 Lambda 1, the biggest one.
772 00:44:04,030 --> 00:44:07,900 And the x that does it is the eigenvector. 773 00:44:07,900 --> 00:44:08,710 Right? 774 00:44:08,710 --> 00:44:18,070 So the max is lambda 1, at x equal q1, 775 00:44:18,070 --> 00:44:26,020 because of q1 transpose S q1 over q1 transpose q1. 776 00:44:26,020 --> 00:44:29,680 So I'm plugging in this winner. 777 00:44:29,680 --> 00:44:33,470 And Sq1 is lambda 1 q1. 778 00:44:33,470 --> 00:44:33,970 Right? 779 00:44:33,970 --> 00:44:37,000 It's the first eigenvector. 780 00:44:37,000 --> 00:44:39,080 And so a lambda 1 comes out. 781 00:44:39,080 --> 00:44:39,880 So I get lambda 1. 782 00:44:43,260 --> 00:44:45,820 I know everything about that. 783 00:44:45,820 --> 00:44:51,800 And what I know is if I put in any x, what do I know? 784 00:44:51,800 --> 00:44:56,100 If I put in any x whatever and look at this number, 785 00:44:56,100 --> 00:44:59,570 what do I know about that number? 786 00:44:59,570 --> 00:45:02,890 It's smaller than lambda 1. 787 00:45:02,890 --> 00:45:04,450 Or it might hit lambda 1. 788 00:45:04,450 --> 00:45:05,410 But it's not bigger. 789 00:45:05,410 --> 00:45:07,540 That's why maxima are easy. 790 00:45:07,540 --> 00:45:10,990 You put in any vector, and you know what's happening. 791 00:45:10,990 --> 00:45:12,150 You know, it doesn't-- 792 00:45:12,150 --> 00:45:15,430 it's not above the max, obviously. 793 00:45:15,430 --> 00:45:16,530 And what about the min? 794 00:45:19,900 --> 00:45:21,940 That's equally simple, of course. 795 00:45:21,940 --> 00:45:23,470 It's at the bottom. 796 00:45:23,470 --> 00:45:27,720 So what would be the minimum of that Rayleigh-- 797 00:45:27,720 --> 00:45:29,400 of that quotient? If I look 798 00:45:29,400 --> 00:45:32,190 at the bottom, what eigenvector and eigenvalue will 799 00:45:32,190 --> 00:45:36,870 I find there? 800 00:45:36,870 --> 00:45:41,010 I will find lambda n, the last guy. 801 00:45:41,010 --> 00:45:41,790 Lambda min.
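A quick numerical check of the max and min: the Rayleigh quotient equals lambda max at the top eigenvector and lambda min at the bottom one, and random vectors land in between. This is a sketch with a hypothetical random symmetric S. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the lecture's q1 is the last column here.

```python
import numpy as np


def rayleigh(S, x):
    """Rayleigh quotient R(x) = x^T S x / x^T x for a nonzero vector x."""
    return (x @ S @ x) / (x @ x)


rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
S = (B + B.T) / 2                 # hypothetical symmetric test matrix

lam, Q = np.linalg.eigh(S)        # eigh sorts eigenvalues in ascending order
lam_min, lam_max = lam[0], lam[-1]
q_min, q_max = Q[:, 0], Q[:, -1]

print(rayleigh(S, q_max), lam_max)   # equal up to rounding: the max, at q1
print(rayleigh(S, q_min), lam_min)   # equal up to rounding: the min, at qn

# Any other vector lands between the two (small tolerance for rounding).
samples = [rayleigh(S, rng.standard_normal(4)) for _ in range(1000)]
print(min(samples) >= lam_min - 1e-12, max(samples) <= lam_max + 1e-12)
```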
802 00:45:45,220 --> 00:45:48,310 The winning x will be its eigenvector. 803 00:45:48,310 --> 00:45:52,270 And again, this quotient will equal lambda n. 804 00:45:55,030 --> 00:45:55,900 So that's easy. 805 00:45:55,900 --> 00:46:01,260 I know that if I put in any vector whatever, 806 00:46:01,260 --> 00:46:03,810 just choose any vector in n dimensions 807 00:46:03,810 --> 00:46:08,220 and compute R of x, what do I now also know 808 00:46:08,220 --> 00:46:10,660 about R of that vector? 809 00:46:10,660 --> 00:46:16,730 It's greater than or equal to lambda n. 810 00:46:16,730 --> 00:46:18,740 Below the max, above the min. 811 00:46:22,480 --> 00:46:24,550 Now what about the other lambdas? 812 00:46:24,550 --> 00:46:28,200 Well, the point is that those are saddle points. 813 00:46:28,200 --> 00:46:30,760 The beautiful thing about this Rayleigh quotient 814 00:46:30,760 --> 00:46:36,250 is its derivative equals 0 right at the saddle points, 815 00:46:36,250 --> 00:46:38,790 and in fact at all the eigenvectors. 816 00:46:38,790 --> 00:46:44,290 And its value at the eigenvectors is the eigenvalue. 817 00:46:44,290 --> 00:46:46,030 You see what I'm saying? 818 00:46:46,030 --> 00:46:51,370 I have lambda 1 here, a max, lambda n here, a min. 819 00:46:51,370 --> 00:46:55,270 And in between I have a bunch of other lambdas, 820 00:46:55,270 --> 00:46:57,460 which come from saddle points. 821 00:46:57,460 --> 00:47:04,000 And if I put an x into R of x and look to see what happens, 822 00:47:04,000 --> 00:47:09,100 I have no idea whether I'm on this side, below it, 823 00:47:09,100 --> 00:47:12,040 or this side, above lambda i. 824 00:47:12,040 --> 00:47:15,430 So the saddle points are more difficult 825 00:47:15,430 --> 00:47:20,630 and take a little more patience. 826 00:47:20,630 --> 00:47:22,520 So that's the other source of saddle points. 827 00:47:30,740 --> 00:47:33,650 Let me just emphasize again what I'm saying.
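Here is a sketch of why every eigenvector is a stationary point. The gradient of R(x) is 2(Sx minus R(x) x) divided by x transpose x, and at an eigenvector Sx equals R(x) x. Around a middle eigenvector, R goes up in one direction and down in another, which is a saddle. The diagonal S with eigenvalues 5, 3, 1 is a hypothetical example, chosen so the eigenvectors are the standard basis vectors.

```python
import numpy as np


def rayleigh(S, x):
    return (x @ S @ x) / (x @ x)


def rayleigh_gradient(S, x):
    """Gradient of R(x) = x^T S x / x^T x, namely 2 (S x - R(x) x) / (x^T x)."""
    return 2.0 * (S @ x - rayleigh(S, x) * x) / (x @ x)


# Hypothetical example with distinct eigenvalues 5 > 3 > 1 (a diagonal S,
# so the eigenvectors q1, q2, q3 are the standard basis vectors).
S = np.diag([5.0, 3.0, 1.0])
q1, q2, q3 = np.eye(3)

for q in (q1, q2, q3):
    print(rayleigh(S, q), rayleigh_gradient(S, q))   # value lambda_k, gradient 0

# Around the middle eigenvector q2, R goes up toward q1 and down toward q3.
t = 0.1
print(rayleigh(S, q2 + t * q1) > 3.0)   # True: above lambda_2
print(rayleigh(S, q2 + t * q3) < 3.0)   # True: below lambda_2
```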
828 00:47:33,650 --> 00:47:48,320 At x equal qk, I have some number of-- 829 00:47:48,320 --> 00:47:53,740 the matrix of second derivatives of R of x has some number of positive eigenvalues 830 00:47:53,740 --> 00:47:58,330 and some number of negative ones, from the eigenvalues 831 00:47:58,330 --> 00:48:00,880 above and below lambda k. 832 00:48:00,880 --> 00:48:01,780 OK. 833 00:48:01,780 --> 00:48:04,660 I've run out of time to follow up 834 00:48:04,660 --> 00:48:08,833 on the saddle point part of the-- 835 00:48:08,833 --> 00:48:11,185 on the details of this picture. 836 00:48:13,870 --> 00:48:15,200 That will be in the notes. 837 00:48:15,200 --> 00:48:21,580 And I might come back to it at the very start of next time. 838 00:48:21,580 --> 00:48:30,180 Before that, you will have lab number three. 839 00:48:30,180 --> 00:48:33,330 And then I think we should discuss it because I 840 00:48:33,330 --> 00:48:36,610 haven't done this lab. 841 00:48:36,610 --> 00:48:41,130 It's intended to give you some feeling for overfitting 842 00:48:41,130 --> 00:48:43,650 and also intended to give you a little introduction 843 00:48:43,650 --> 00:48:46,390 to deep learning. 844 00:48:46,390 --> 00:48:52,360 And so I'll get it to you, and we can talk about it Wednesday. 845 00:48:52,360 --> 00:48:56,330 And again, it won't be due until the Wednesday after break. 846 00:48:56,330 --> 00:48:56,830 OK. 847 00:48:56,830 --> 00:48:57,330 Thanks. 848 00:48:57,330 --> 00:48:59,191 So I'll see you Wednesday.
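One way to fill in the picture that was deferred to the notes follows; this derivation is supplied here and is not from the lecture. At a unit eigenvector qk, the gradient of R vanishes, and the Hessian works out to 2(S minus lambda k I). Its eigenvalues are 2(lambda j minus lambda k). They are positive for each lambda above lambda k, negative for each lambda below, and zero along qk itself, because rescaling x does not change R. The sketch checks this with a central-difference Hessian for a hypothetical diagonal S with eigenvalues 7, 5, 3, 1. The helper `numerical_hessian` is hypothetical as well.

```python
import numpy as np


def rayleigh(S, x):
    return (x @ S @ x) / (x @ x)


def numerical_hessian(f, x, h=1e-4):
    """Central-difference Hessian of a scalar function f at the point x."""
    size = x.size
    H = np.empty((size, size))
    E = np.eye(size) * h
    for i in range(size):
        for j in range(size):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return H


# Hypothetical S with eigenvalues 7 > 5 > 3 > 1 on the diagonal.
lams = np.array([7.0, 5.0, 3.0, 1.0])
S = np.diag(lams)


def R(x):
    return rayleigh(S, x)


for k in range(4):                     # k = 0 is q1 (max), k = 3 is q4 (min)
    qk = np.eye(4)[k]
    H = numerical_hessian(R, qk)
    predicted = 2 * (S - lams[k] * np.eye(4))
    ev = np.linalg.eigvalsh((H + H.T) / 2)
    n_up = int((ev > 1e-3).sum())
    n_down = int((ev < -1e-3).sum())
    matches = np.allclose(H, predicted, atol=1e-4)
    print(f"q{k + 1}: {n_up} directions up, {n_down} down, "
          f"Hessian matches 2(S - lambda_k I): {matches}")
```

At q1 every nonzero direction goes down (a max). At q4 every nonzero direction goes up (a min). The middle eigenvectors have both, which is what makes them saddle points.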