1 00:00:01,550 --> 00:00:03,920 The following content is provided under a Creative 2 00:00:03,920 --> 00:00:05,310 Commons license. 3 00:00:05,310 --> 00:00:07,520 Your support will help MIT OpenCourseWare 4 00:00:07,520 --> 00:00:11,610 continue to offer high quality educational resources for free. 5 00:00:11,610 --> 00:00:14,180 To make a donation or to view additional materials 6 00:00:14,180 --> 00:00:18,140 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,140 --> 00:00:19,026 at ocw.mit.edu. 8 00:00:22,463 --> 00:00:23,630 ALAN EDELMAN: Hi, everybody. 9 00:00:23,630 --> 00:00:24,560 I'm Alan Edelman. 10 00:00:24,560 --> 00:00:29,970 And I helped a little bit to teach this class last year. 11 00:00:29,970 --> 00:00:32,990 But I'm happy to see that it's going great this year. 12 00:00:32,990 --> 00:00:37,040 So Professor Strang came to teach 18.06. 13 00:00:37,040 --> 00:00:40,200 Some of you may know the introductory linear algebra 14 00:00:40,200 --> 00:00:40,700 course. 15 00:00:40,700 --> 00:00:44,180 And Professor Strang came by and gave 16 00:00:44,180 --> 00:00:47,032 this great demonstration that the row rank 17 00:00:47,032 --> 00:00:47,990 equals the column rank. 18 00:00:47,990 --> 00:00:50,600 And I'm wondering if you did that in this class at any time. 19 00:00:50,600 --> 00:00:51,830 Or would they have seen that? 20 00:00:51,830 --> 00:00:52,640 AUDIENCE: It's in the notes. 21 00:00:52,640 --> 00:00:53,973 ALAN EDELMAN: It's in the notes. 22 00:00:53,973 --> 00:00:59,960 Well, in any event, just as Professor Strang walked out-- 23 00:00:59,960 --> 00:01:01,600 so here, I'll just grab this. 24 00:01:01,600 --> 00:01:02,333 This is true. 25 00:01:02,333 --> 00:01:04,250 I was actually going to start writing the code 26 00:01:04,250 --> 00:01:06,710 to do the quadrilateral, but I didn't have enough time. 27 00:01:06,710 --> 00:01:08,390 You can see me starting.
28 00:01:08,390 --> 00:01:11,630 But here's the 0.5 for the triangle, which was easy. 29 00:01:11,630 --> 00:01:13,690 So what's the story with Julia in this class? 30 00:01:13,690 --> 00:01:15,440 Have they used it a little, or a lot, or-- 31 00:01:15,440 --> 00:01:16,010 AUDIENCE: In the labs. 32 00:01:16,010 --> 00:01:17,270 ALAN EDELMAN: In the labs you've used Julia. 33 00:01:17,270 --> 00:01:18,500 But that was just MATLAB. 34 00:01:18,500 --> 00:01:21,800 So that was-- but OK. 35 00:01:24,350 --> 00:01:28,540 So Professor Strang showed this proof where he would-- 36 00:01:28,540 --> 00:01:30,370 he put down a 3 by 3 matrix. 37 00:01:30,370 --> 00:01:31,700 It had rank two. 38 00:01:31,700 --> 00:01:35,300 And he took the columns that were-- the first two 39 00:01:35,300 --> 00:01:36,380 columns were independent. 40 00:01:36,380 --> 00:01:41,270 And it was easy to show the row rank equals the column rank. 41 00:01:41,270 --> 00:01:43,430 After Professor Strang went out, I asked, 42 00:01:43,430 --> 00:01:47,210 would that work for the zero matrix? 43 00:01:47,210 --> 00:01:49,550 So here's the zero matrix. 44 00:01:49,550 --> 00:01:51,680 And since I'm not really telling you the proof, 45 00:01:51,680 --> 00:01:57,680 I'll just say, if I were to make a matrix of the linearly 46 00:01:57,680 --> 00:02:02,030 independent columns of this matrix, what would I do? 47 00:02:02,030 --> 00:02:04,613 So zero might seem tricky. 48 00:02:04,613 --> 00:02:05,780 It's not really that tricky. 49 00:02:05,780 --> 00:02:10,490 But this is what I did the moment you walked out. 50 00:02:10,490 --> 00:02:14,480 So yeah, so I've got the 3 by 3 matrix, and I need to make a-- 51 00:02:14,480 --> 00:02:17,300 first step in whatever the proof was, 52 00:02:17,300 --> 00:02:21,020 I needed to take the columns, the linearly 53 00:02:21,020 --> 00:02:23,210 independent columns of this matrix, 54 00:02:23,210 --> 00:02:26,630 and place them in a matrix of their own.
55 00:02:26,630 --> 00:02:28,403 What would I do? 56 00:02:28,403 --> 00:02:29,570 It would be an empty matrix. 57 00:02:29,570 --> 00:02:33,450 What would be the size of this empty matrix? 58 00:02:33,450 --> 00:02:35,190 Not exactly zero by zero. 59 00:02:35,190 --> 00:02:40,700 Because every column is still in R^3, you see. 60 00:02:40,700 --> 00:02:41,790 So the right answer-- 61 00:02:41,790 --> 00:02:43,470 I hope this makes sense-- 62 00:02:43,470 --> 00:02:46,210 is a 3 by 0 empty matrix. 63 00:02:46,210 --> 00:02:49,170 And that's a concept that exists in MATLAB, and in Julia, 64 00:02:49,170 --> 00:02:51,410 and in Python, I think, I'm sure, 65 00:02:51,410 --> 00:02:52,740 and in any computer language. 66 00:02:52,740 --> 00:02:56,160 So if you had a full-rank 3 by 3 matrix, 67 00:02:56,160 --> 00:02:58,200 the linearly independent columns would be 3 by 3. 68 00:02:58,200 --> 00:03:00,660 If you had rank two, it would be 3 by 2. 69 00:03:00,660 --> 00:03:03,330 If I had a rank one matrix, it would be 3 by 1. 70 00:03:03,330 --> 00:03:06,660 So if I had no columns, 3 by 0 makes sense. 71 00:03:06,660 --> 00:03:09,830 And to finish the proof, again, I'm not telling these 72 00:03:09,830 --> 00:03:11,880 students-- it's in the notes apparently-- 73 00:03:11,880 --> 00:03:14,145 the next matrix would be rand(0, 3). 74 00:03:14,145 --> 00:03:16,020 And of course you multiply it and you get a 3 75 00:03:16,020 --> 00:03:17,550 by 3 matrix of zeros. 76 00:03:17,550 --> 00:03:19,410 So it's fun to see that that proof still 77 00:03:19,410 --> 00:03:22,230 works, even for the zero matrix, without any real edge cases. 78 00:03:22,230 --> 00:03:23,655 So that was just today. 79 00:03:27,610 --> 00:03:29,790 The other thing is let me-- 80 00:03:29,790 --> 00:03:34,920 can I say a word or two about recent stuff about Julia? 81 00:03:34,920 --> 00:03:38,010 So I started to put together a talk.
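A minimal Julia sketch of the "columns times rows" argument above. It assumes the factorization A = C R, where C holds the linearly independent columns of A. The rank-2 matrix `A` and the matrix `R` are hypothetical examples chosen for illustration, not the ones used in class. The zero-matrix case shows the 3 by 0 empty matrix at work.

```julia
using LinearAlgebra

# Hypothetical rank-2 example: the third column is the sum of the first two.
A = [1 0 1;
     0 1 1;
     1 1 2]
C = A[:, 1:2]          # the linearly independent columns (3 by 2)
R = [1 0 1;
     0 1 1]            # how to combine them to rebuild every column (2 by 3)
println("rank(A) = ", rank(A), ", C*R == A: ", C * R == A)

# Zero matrix: there are no independent columns, so C is 3 by 0.
Z  = zeros(3, 3)
C0 = zeros(3, 0)       # empty matrix; its (nonexistent) columns would live in R^3
R0 = rand(0, 3)        # empty 0 by 3 matrix
println("size(C0) = ", size(C0), ", C0*R0 == Z: ", C0 * R0 == Z)
```

The product of a 3 by 0 and a 0 by 3 matrix is a 3 by 3 matrix of zeros, because each entry is an empty sum. That is why the proof needs no special case.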
82 00:03:38,010 --> 00:03:41,280 It's not really ready yet, but I'll share it with you anyway. 83 00:03:41,280 --> 00:03:44,460 So Google did the Julia world a big favor last week. 84 00:03:44,460 --> 00:03:45,930 I mean, this is huge. 85 00:03:45,930 --> 00:03:48,930 So you all know machine learning is hot. 86 00:03:48,930 --> 00:03:51,510 That's probably why you're here in this class. 87 00:03:51,510 --> 00:03:54,600 I probably don't have to tell you. 88 00:03:54,600 --> 00:03:58,860 And yet I wouldn't be surprised if a number of you 89 00:03:58,860 --> 00:04:01,710 wished this whole class was in Python or something, or maybe 90 00:04:01,710 --> 00:04:02,640 MATLAB or something. 91 00:04:02,640 --> 00:04:04,380 I don't doubt that some of you might 92 00:04:04,380 --> 00:04:08,130 have wanted that to happen. 93 00:04:08,130 --> 00:04:14,580 And we get sort of bombarded with, you know, why not Python. 94 00:04:14,580 --> 00:04:15,900 Not so much MATLAB anymore. 95 00:04:15,900 --> 00:04:19,170 But you know, why not Python is sort of the issue that comes up 96 00:04:19,170 --> 00:04:20,910 a lot. 97 00:04:20,910 --> 00:04:23,793 And I could talk till the cows come home, 98 00:04:23,793 --> 00:04:24,960 but nobody would believe me. 99 00:04:24,960 --> 00:04:28,405 But Google came out last week and said 100 00:04:28,405 --> 00:04:30,030 that when it comes to machine learning, 101 00:04:30,030 --> 00:04:32,990 there really are two languages that are powerful enough 102 00:04:32,990 --> 00:04:35,610 to do machine learning-- 103 00:04:35,610 --> 00:04:38,350 to do machine learning kinds of things that you want to do. 104 00:04:38,350 --> 00:04:41,310 And in some sense, the rest of today's lecture 105 00:04:41,310 --> 00:04:45,360 that I'm going to give will be maybe illustrations of this. 106 00:04:45,360 --> 00:04:49,170 But if you want, you can go and look at their blog here. 
107 00:04:49,170 --> 00:04:51,717 What they do is they basically sort of start the race 108 00:04:51,717 --> 00:04:53,550 with a whole bunch of programming languages. 109 00:04:53,550 --> 00:04:54,180 There's Python. 110 00:04:54,180 --> 00:04:57,380 There's R, Java, JavaScript. 111 00:04:57,380 --> 00:05:00,288 They sort of look at all of these languages. 112 00:05:00,288 --> 00:05:01,830 And if you read the blog, you'll see. 113 00:05:01,830 --> 00:05:04,590 But they filter them out on technical merits. 114 00:05:04,590 --> 00:05:06,810 And right away, a lot of them disappear, 115 00:05:06,810 --> 00:05:09,300 including Python and Java. 116 00:05:09,300 --> 00:05:11,430 And if you go to the blog, you'll 117 00:05:11,430 --> 00:05:14,255 see they spend a great deal of time on the Python story. 118 00:05:14,255 --> 00:05:15,630 Because they know that people are 119 00:05:15,630 --> 00:05:17,040 going to want to hear that one. 120 00:05:17,040 --> 00:05:18,540 I mean, people want to be convinced. 121 00:05:18,540 --> 00:05:20,940 And so there are actually multiple screens 122 00:05:20,940 --> 00:05:23,250 full on the reasons why Python is just not good 123 00:05:23,250 --> 00:05:25,170 enough for machine learning. 124 00:05:25,170 --> 00:05:28,160 So they're left with four languages-- 125 00:05:28,160 --> 00:05:30,570 Julia, Swift, C++, and Rust. 126 00:05:30,570 --> 00:05:36,120 And then if you go to the next part of the blog, 127 00:05:36,120 --> 00:05:37,560 they filter on usability. 128 00:05:37,560 --> 00:05:40,680 And then two more sort of bite the dust. 129 00:05:40,680 --> 00:05:42,930 So C++ and Rust disappeared. 130 00:05:42,930 --> 00:05:46,290 And then they go on to say that these are the only two 131 00:05:46,290 --> 00:05:49,600 languages they feel are appropriate for machine 132 00:05:49,600 --> 00:05:50,100 learning.
133 00:05:50,100 --> 00:05:54,390 And they put this nice quote that it 134 00:05:54,390 --> 00:05:55,680 shares many common values. 135 00:05:55,680 --> 00:05:58,530 And they actually go on about what machine learning really 136 00:05:58,530 --> 00:05:59,070 needs. 137 00:05:59,070 --> 00:06:01,680 And I'd recommend you look at it. 138 00:06:01,680 --> 00:06:03,270 And then finally, of course, they're 139 00:06:03,270 --> 00:06:07,260 going to push Swift, which they should. 140 00:06:07,260 --> 00:06:09,510 So they had somewhere-- blah, blah-- about more people 141 00:06:09,510 --> 00:06:10,283 are using Swift. 142 00:06:10,283 --> 00:06:10,950 Maybe it's true. 143 00:06:10,950 --> 00:06:13,290 I don't know. 144 00:06:13,290 --> 00:06:17,900 So what they really said is they're more familiar with Swift 145 00:06:17,900 --> 00:06:19,990 than Julia, which, you know, if I were speaking, 146 00:06:19,990 --> 00:06:21,990 I'd say I'm more familiar with Julia than Swift. 147 00:06:21,990 --> 00:06:23,460 So maybe it's fair. 148 00:06:23,460 --> 00:06:26,108 And then I started to put a little cartoon 149 00:06:26,108 --> 00:06:27,900 on the psychology of programming languages, 150 00:06:27,900 --> 00:06:29,850 just because it's sort of something 151 00:06:29,850 --> 00:06:31,860 that I bump into all the time. 152 00:06:31,860 --> 00:06:34,440 People always say all languages are equally good. 153 00:06:34,440 --> 00:06:36,670 It doesn't really matter. 154 00:06:36,670 --> 00:06:39,900 But the truth is if you mention a language that you're not 155 00:06:39,900 --> 00:06:42,900 using yet, you're going to tune it out, 156 00:06:42,900 --> 00:06:44,850 at least until Google comes along. 157 00:06:44,850 --> 00:06:46,320 So that's where we are. 158 00:06:46,320 --> 00:06:48,148 OK, enough about-- 159 00:06:48,148 --> 00:06:49,190 I just put this together. 160 00:06:49,190 --> 00:06:51,930 I was testing it out on you. 161 00:06:51,930 --> 00:06:54,000 All right.
162 00:06:54,000 --> 00:06:57,880 So now let me do two more mathematical things. 163 00:06:57,880 --> 00:07:00,660 So the first thing I want to do is talk to you 164 00:07:00,660 --> 00:07:02,692 about forward mode automatic differentiation. 165 00:07:02,692 --> 00:07:05,275 So have you done any automatic differentiation in this course? 166 00:07:05,275 --> 00:07:05,540 AUDIENCE: Very little. 167 00:07:05,540 --> 00:07:07,540 ALAN EDELMAN: OK, so I think this is pretty fun. 168 00:07:07,540 --> 00:07:09,180 I hope you'll like it. 169 00:07:09,180 --> 00:07:11,527 I have a notebook in Julia on forward mode 170 00:07:11,527 --> 00:07:12,610 automatic differentiation. 171 00:07:12,610 --> 00:07:17,100 And this notebook came together because I 172 00:07:17,100 --> 00:07:20,010 was trying to understand what the big deal was 173 00:07:20,010 --> 00:07:21,290 for a long time. 174 00:07:21,290 --> 00:07:22,560 And I had a little trouble. 175 00:07:22,560 --> 00:07:25,530 I mean, it's the usual story where on a line-by-line level, 176 00:07:25,530 --> 00:07:27,000 it's easy to understand. 177 00:07:27,000 --> 00:07:28,620 But what's the big deal part? 178 00:07:28,620 --> 00:07:31,110 That's sometimes the harder thing to grasp. 179 00:07:31,110 --> 00:07:33,470 And the first notebook I'm going to show you 180 00:07:33,470 --> 00:07:36,560 is sort of the result of my trying 181 00:07:36,560 --> 00:07:40,190 to grasp what was the real picture here. 182 00:07:40,190 --> 00:07:41,480 And the second thing-- 183 00:07:41,480 --> 00:07:42,530 I think I'll just do it on the blackboard. 184 00:07:42,530 --> 00:07:45,170 It's not even really ready yet, but I'll sort of unleash it 185 00:07:45,170 --> 00:07:46,460 on you folks anyway-- 186 00:07:46,460 --> 00:07:50,600 is to show you how to do a particular example of backward 187 00:07:50,600 --> 00:07:52,878 mode automatic differentiation, the kind 188 00:07:52,878 --> 00:07:54,170 that you see in neural nets.
189 00:07:54,170 --> 00:07:56,287 And I guess you have seen some neural nets here? 190 00:07:56,287 --> 00:07:56,870 AUDIENCE: Yep. 191 00:07:56,870 --> 00:07:59,810 ALAN EDELMAN: So I think by now everybody's seen neural nets. 192 00:07:59,810 --> 00:08:02,288 I think two years from now, it'll be in high schools. 193 00:08:02,288 --> 00:08:04,580 And three years from there, it will be in kindergarten. 194 00:08:04,580 --> 00:08:05,990 I don't know. 195 00:08:05,990 --> 00:08:07,700 Neural nets seem to be sort of-- 196 00:08:07,700 --> 00:08:09,430 they're not that hard to understand. 197 00:08:09,430 --> 00:08:13,550 OK, so let me start things off. 198 00:08:13,550 --> 00:08:18,050 And really, the two things that I'd love to convince you of 199 00:08:18,050 --> 00:08:19,100 are-- 200 00:08:19,100 --> 00:08:22,170 let me just find-- here's my auto diff thing. 201 00:08:22,170 --> 00:08:24,260 The two things that I really want to convince you 202 00:08:24,260 --> 00:08:28,190 of-- and maybe you already believe some of this-- 203 00:08:28,190 --> 00:08:32,480 are, one, that-- well, maybe you don't believe this yet-- 204 00:08:32,480 --> 00:08:36,110 but that the language matters in a mathematical sense. 205 00:08:38,614 --> 00:08:41,750 The right computer language can do more for you 206 00:08:41,750 --> 00:08:46,113 than just take some algorithm on a blackboard and implement it. 207 00:08:46,113 --> 00:08:47,030 It could do much more. 208 00:08:47,030 --> 00:08:51,470 And this is something that I hope to give a few examples of. 209 00:08:51,470 --> 00:08:54,470 And the other thing that I bet you all believe now, 210 00:08:54,470 --> 00:08:57,590 because you've been in this class, 211 00:08:57,590 --> 00:09:00,240 is that linear algebra is the basis for everything. 212 00:09:00,240 --> 00:09:02,450 Every course should start with linear algebra.
213 00:09:02,450 --> 00:09:07,220 I mean, to me, it feels like an unfortunate accident of history 214 00:09:07,220 --> 00:09:11,160 that linear algebra came too late for too many reasons. 215 00:09:11,160 --> 00:09:14,630 And so very often, things that would be better done 216 00:09:14,630 --> 00:09:16,490 with linear algebra are not. 217 00:09:16,490 --> 00:09:20,390 And I mean, to me, it feels like doing physics without calculus. 218 00:09:20,390 --> 00:09:21,410 I just don't get it. 219 00:09:21,410 --> 00:09:22,577 I know high schools do that. 220 00:09:22,577 --> 00:09:23,960 But it just seems wrong. 221 00:09:23,960 --> 00:09:26,710 To me, all of engineering, all of-- it 222 00:09:26,710 --> 00:09:27,930 should all be linear algebra. 223 00:09:27,930 --> 00:09:30,230 I mean, I just believe that-- almost all. 224 00:09:30,230 --> 00:09:31,250 Maybe not all. 225 00:09:31,250 --> 00:09:32,250 But quite a lot. 226 00:09:32,250 --> 00:09:35,130 More than most people realize, I would say. 227 00:09:35,130 --> 00:09:40,022 OK, so let me start with automatic differentiation. 228 00:09:40,022 --> 00:09:41,480 So I'm going to start this story 229 00:09:41,480 --> 00:09:44,340 by telling you that I would go to conferences. 230 00:09:44,340 --> 00:09:46,490 I would go to numerical analysis conferences. 231 00:09:46,490 --> 00:09:48,980 I would hear people talk about automatic differentiation. 232 00:09:48,980 --> 00:09:49,700 I'm going to be honest. 233 00:09:49,700 --> 00:09:50,450 I was reading my email. 234 00:09:50,450 --> 00:09:51,117 I was tuned out. 235 00:09:51,117 --> 00:09:52,190 Like, who cares about-- 236 00:09:52,190 --> 00:09:53,690 I know calculus. 237 00:09:53,690 --> 00:09:56,135 You could teach a computer to do it. 238 00:09:56,135 --> 00:09:57,260 It seems pretty easy to me. 239 00:09:57,260 --> 00:09:59,190 I mean, I'm sure there are technical details.
240 00:09:59,190 --> 00:10:02,060 But it didn't seem that interesting to teach a computer 241 00:10:02,060 --> 00:10:03,312 to differentiate. 242 00:10:03,312 --> 00:10:05,270 I sort of figured that it was the same calculus 243 00:10:05,270 --> 00:10:07,860 that I learned when I took a calculus class. 244 00:10:07,860 --> 00:10:09,530 You know, you memorize this table. 245 00:10:09,530 --> 00:10:10,940 You teach it to a computer. 246 00:10:10,940 --> 00:10:13,053 You learn the chain rule, and the product rule, 247 00:10:13,053 --> 00:10:13,970 and the quotient rule. 248 00:10:13,970 --> 00:10:16,430 And boom, the computer is doing just what I 249 00:10:16,430 --> 00:10:18,320 would do with paper and pencil. 250 00:10:18,320 --> 00:10:19,150 So big deal. 251 00:10:19,150 --> 00:10:20,750 I didn't pay attention. 252 00:10:20,750 --> 00:10:23,360 And in any event, there was this little neuron 253 00:10:23,360 --> 00:10:25,760 in the back of my brain that said, hey, maybe I'm wrong. 254 00:10:25,760 --> 00:10:27,260 Maybe it's doing finite differences, 255 00:10:27,260 --> 00:10:29,840 you know, the sort of thing where 256 00:10:29,840 --> 00:10:32,240 you take the dy by the dx. 257 00:10:32,240 --> 00:10:35,240 In some numerical way, you do the finite differences. 258 00:10:35,240 --> 00:10:37,400 And in numerical analysis, they're 259 00:10:37,400 --> 00:10:40,210 supposed to tell you if h is too big, you get truncation error. 260 00:10:40,210 --> 00:10:42,252 If you have h too small, you get roundoff error. 261 00:10:42,252 --> 00:10:44,990 And the truth is nobody ever tells you what's a good h. 262 00:10:44,990 --> 00:10:46,820 But you go to a numerical analysis class 263 00:10:46,820 --> 00:10:48,810 hoping somebody would tell you. 264 00:10:48,810 --> 00:10:50,310 But in any event, so I thought maybe 265 00:10:50,310 --> 00:10:54,200 it was that kind of a numerical finite difference.
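A small Julia sketch of the finite-difference trade-off described above. It uses a forward difference on `sqrt` at x = 1. The helper name `fd` and the list of step sizes are illustrative assumptions. The exact derivative 0.5/sqrt(x) appears only to measure the error, not as part of any differentiation method. In double precision, a common rule of thumb for forward differences is a step somewhere near the square root of machine epsilon, roughly 1e-8.

```julia
# Forward-difference approximation of f'(x) with step h (hypothetical helper).
fd(f, x, h) = (f(x + h) - f(x)) / h

x = 1.0
exact = 0.5 / sqrt(x)   # used only to measure the error of each approximation

# Large h: truncation error dominates. Tiny h: roundoff error dominates.
for h in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15)
    println("h = ", h, "    error = ", abs(fd(sqrt, x, h) - exact))
end
```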
266 00:10:54,200 --> 00:10:56,720 And I think the big surprise for me 267 00:10:56,720 --> 00:10:59,930 was that automatic differentiation was neither 268 00:10:59,930 --> 00:11:02,360 the first nor the second thing, that there's actually 269 00:11:02,360 --> 00:11:04,910 a third thing, something different, that's neither 270 00:11:04,910 --> 00:11:06,020 the first nor the second. 271 00:11:06,020 --> 00:11:08,330 And I found that fascinating. 272 00:11:08,330 --> 00:11:11,960 And maybe I'll even tell you how it hit me in the head 273 00:11:11,960 --> 00:11:12,980 that this was the story. 274 00:11:12,980 --> 00:11:15,500 Because I really wasn't paying attention. 275 00:11:15,500 --> 00:11:19,190 But I love the singular value decomposition. 276 00:11:19,190 --> 00:11:22,820 I'm glad to see that people are drawing parabolas and quarter 277 00:11:22,820 --> 00:11:26,840 circles and figuring out what the smallest singular value is. 278 00:11:26,840 --> 00:11:29,400 The singular value decomposition is just-- 279 00:11:29,400 --> 00:11:31,240 it's just God's gift to mankind. 280 00:11:31,240 --> 00:11:34,880 It's just a good factorization. 281 00:11:34,880 --> 00:11:38,900 One of the things I was playing with in Julia 282 00:11:38,900 --> 00:11:43,460 was to calculate the Jacobian matrix for the SVD. 283 00:11:43,460 --> 00:11:46,630 So you know, all matrix factorizations 284 00:11:46,630 --> 00:11:48,230 are just changes of variables. 285 00:11:48,230 --> 00:11:52,760 So if you have a square matrix, n by n, the SVD-- 286 00:11:52,760 --> 00:11:56,570 I'm sure you know this-- the U matrix is really n times n 287 00:11:56,570 --> 00:11:57,630 minus 1 over 2 variables. 288 00:11:57,630 --> 00:12:00,505 So is the V. And the sigma has got n variables. 289 00:12:00,505 --> 00:12:02,130 Put it all together, you get n squared. 290 00:12:02,130 --> 00:12:03,890 So it's just a change of variables.
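The parameter count above can be written out explicitly. This assumes a generic real n by n matrix A with distinct, nonzero singular values, so that the SVD is locally a smooth change of variables (discrete sign choices are ignored). An orthogonal n by n matrix has n(n-1)/2 free parameters.

$$A = U \Sigma V^{T}, \qquad U, V \text{ orthogonal}, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$$

$$\underbrace{\frac{n(n-1)}{2}}_{U} + \underbrace{n}_{\Sigma} + \underbrace{\frac{n(n-1)}{2}}_{V} = n^{2}$$

For n = 3, this gives 3 + 3 + 3 = 9 = 3^2.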
291 00:12:03,890 --> 00:12:05,720 And every time you change variables, 292 00:12:05,720 --> 00:12:11,020 you can form that big matrix, n squared by n squared of dy dx, 293 00:12:11,020 --> 00:12:13,820 compute its determinant, and get an answer. 294 00:12:13,820 --> 00:12:15,330 And I wanted to know-- 295 00:12:15,330 --> 00:12:17,330 I actually knew the theoretical answer for that. 296 00:12:17,330 --> 00:12:19,670 And I wanted to see a computer confirm 297 00:12:19,670 --> 00:12:21,290 that theoretical answer. 298 00:12:21,290 --> 00:12:24,870 And I spoke to some people who wrote auto diff 299 00:12:24,870 --> 00:12:26,773 in non-Julia languages. 300 00:12:26,773 --> 00:12:28,190 And I was surprised by the answer. 301 00:12:28,190 --> 00:12:29,900 They said, oh, yeah. 302 00:12:29,900 --> 00:12:31,883 We could teach the answer to our system. 303 00:12:31,883 --> 00:12:34,050 I said, what do you mean you could teach the answer? 304 00:12:34,050 --> 00:12:35,240 Why doesn't it just compute the answer? 305 00:12:35,240 --> 00:12:36,970 Why do you have to teach the answer? 306 00:12:36,970 --> 00:12:38,275 I thought that was all wrong. 307 00:12:38,275 --> 00:12:40,150 Because in Julia, we didn't have to teach it. 308 00:12:40,150 --> 00:12:42,837 It would actually calculate it. 309 00:12:42,837 --> 00:12:44,920 And then I started to understand a little bit more 310 00:12:44,920 --> 00:12:47,540 about what auto diff was doing and what Julia was doing. 311 00:12:47,540 --> 00:12:49,760 And so this is how this notebook came to be. 312 00:12:49,760 --> 00:12:50,980 So let me start-- 313 00:12:50,980 --> 00:12:52,150 I'm saying too much. 314 00:12:52,150 --> 00:12:55,870 Let me start with an example that might kind of hit home. 315 00:12:55,870 --> 00:12:57,880 So I'm going to compute the square root 316 00:12:57,880 --> 00:12:59,537 of x, a real simple example. 317 00:12:59,537 --> 00:13:01,120 You know, a square root's pretty easy. 
318 00:13:01,120 --> 00:13:03,287 I'm going to take one of the oldest algorithms known 319 00:13:03,287 --> 00:13:05,920 to mankind, the Babylonian square root algorithm. 320 00:13:05,920 --> 00:13:08,230 It says start with a starting guess t. 321 00:13:08,230 --> 00:13:10,720 Maybe it's a little bit too low for the square root of x. 322 00:13:10,720 --> 00:13:11,590 Get x over t. 323 00:13:11,590 --> 00:13:13,270 So that would be too large. 324 00:13:13,270 --> 00:13:15,080 Go ahead and take the average, and repeat. 325 00:13:15,080 --> 00:13:15,580 OK. 326 00:13:15,580 --> 00:13:17,205 This is equivalent to Newton's method 327 00:13:17,205 --> 00:13:18,980 for taking the square root. 328 00:13:18,980 --> 00:13:21,950 And it's been known for millennia to mankind. 329 00:13:21,950 --> 00:13:26,770 So it's not the latest research, by any means, 330 00:13:26,770 --> 00:13:28,870 for computing square roots. 331 00:13:28,870 --> 00:13:31,030 But it works very effectively. 332 00:13:31,030 --> 00:13:33,610 And here's a little Julia code that 333 00:13:33,610 --> 00:13:36,850 actually will implement it. 334 00:13:36,850 --> 00:13:39,050 It probably looks like code in any language. 335 00:13:39,050 --> 00:13:41,350 So I'm going to start off at 1. 336 00:13:41,350 --> 00:13:43,540 So literally, I'm just going to take 337 00:13:43,540 --> 00:13:47,320 1 plus the input value x and divide by 2. 338 00:13:47,320 --> 00:13:48,830 And then I'm going to repeat. 339 00:13:48,830 --> 00:13:49,330 OK? 340 00:13:49,330 --> 00:13:53,200 And we can check that the algorithm works. 341 00:13:53,200 --> 00:13:54,520 Here, alpha is pi. 342 00:13:54,520 --> 00:13:56,770 And so I'll take the Babylonian algorithm 343 00:13:56,770 --> 00:13:58,240 and compare it to Julia's built-in. 344 00:13:58,240 --> 00:14:01,305 And you see it gives the right answer. 345 00:14:01,305 --> 00:14:02,930 Here it is with the square root of two.
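A Julia sketch of the loop described above. Starting from the guess t = 1, the first iterate is (1 + x)/2. After that, t is repeatedly averaged with x/t. The keyword argument `N` (the number of iterations) and its default of 10 are assumptions, not taken from the lecture. Leaving `x` without a type annotation means the function accepts any number-like argument.

```julia
# Babylonian square root (Newton's method applied to t^2 - x = 0).
function Babylonian(x; N = 10)
    t = (1 + x) / 2          # first iterate, starting from the guess t = 1
    for i = 2:N
        t = (t + x / t) / 2  # average an underestimate and an overestimate
    end
    return t
end

α = π
println(Babylonian(α), "  ", √α)   # compare with Julia's built-in square root
println(Babylonian(2), "  ", √2)
```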
346 00:14:02,930 --> 00:14:05,680 It's always good to check that your code works. 347 00:14:05,680 --> 00:14:08,130 OK? 348 00:14:08,130 --> 00:14:10,180 I like to see things graphically, 349 00:14:10,180 --> 00:14:14,530 so I ran the algorithm for lots of values of x. 350 00:14:14,530 --> 00:14:16,740 And I love doing this. 351 00:14:16,740 --> 00:14:19,940 I kind of wish that in the previous talk-- 352 00:14:19,940 --> 00:14:21,813 if I'd only worked fast enough, I 353 00:14:21,813 --> 00:14:23,230 wanted to build a little GUI where 354 00:14:23,230 --> 00:14:25,352 I could move the points in front of your eyes. 355 00:14:25,352 --> 00:14:26,560 Maybe you have one in MATLAB. 356 00:14:26,560 --> 00:14:27,678 I bet you do. 357 00:14:27,678 --> 00:14:28,720 But I wanted to build it. 358 00:14:28,720 --> 00:14:30,520 But I didn't get there fast enough. 359 00:14:30,520 --> 00:14:32,650 But here this is the sort of thing. 360 00:14:32,650 --> 00:14:34,260 And I like to see the convergence. 361 00:14:34,260 --> 00:14:36,270 And so you could see the digits converging, 362 00:14:36,270 --> 00:14:38,140 the parabola on the bottom. 363 00:14:38,140 --> 00:14:39,880 The black is the square root, of course. 364 00:14:39,880 --> 00:14:41,290 So there it is. 365 00:14:41,290 --> 00:14:43,810 There's the Babylonian algorithm. 366 00:14:43,810 --> 00:14:47,020 I would like to get the derivative of the square root. 367 00:14:47,020 --> 00:14:49,600 But the rules of the game are I'm not going 368 00:14:49,600 --> 00:14:51,460 to use method 1 or method 2. 369 00:14:51,460 --> 00:14:53,650 I'm not going to do-- you'll never see me type 1/2 x 370 00:14:53,650 --> 00:14:54,360 to the minus 1/2. 371 00:14:54,360 --> 00:14:54,860 Right? 372 00:14:54,860 --> 00:14:56,440 You all know that's the derivative. 373 00:14:56,440 --> 00:14:57,500 I will not type that. 374 00:14:57,500 --> 00:14:58,000 I will not. 375 00:14:58,000 --> 00:14:59,910 It's not going to come from anywhere in Julia.
376 00:14:59,910 --> 00:15:00,220 OK. 377 00:15:00,220 --> 00:15:01,762 And the second thing is I'm not going 378 00:15:01,762 --> 00:15:02,860 to do a finite difference. 379 00:15:02,860 --> 00:15:03,370 All right? 380 00:15:03,370 --> 00:15:04,787 I'm going to get the derivative of that square root, 381 00:15:04,787 --> 00:15:07,420 but not by either of the two things 382 00:15:07,420 --> 00:15:09,260 that I'm sure you would think of. 383 00:15:09,260 --> 00:15:10,610 Right? 384 00:15:10,610 --> 00:15:11,860 Here's how I'm going to do it. 385 00:15:11,860 --> 00:15:13,935 And I'm going to do a little bit of Julia code. 386 00:15:13,935 --> 00:15:15,310 There'll be eight lines of Julia. 387 00:15:15,310 --> 00:15:17,518 But I'm not going to completely say how it works yet. 388 00:15:17,518 --> 00:15:20,200 I'll keep you in suspense for maybe about five minutes. 389 00:15:20,200 --> 00:15:22,280 And then I'll tell you how it works. 390 00:15:22,280 --> 00:15:22,780 All right? 391 00:15:22,780 --> 00:15:25,900 So here's eight lines of Julia code that will get me 392 00:15:25,900 --> 00:15:26,900 the derivative of the square root. 393 00:15:26,900 --> 00:15:30,760 So in these three lines, I'm going to create a Julia type. 394 00:15:30,760 --> 00:15:34,660 I'm going to call it a D for a dual number, which 395 00:15:34,660 --> 00:15:39,310 is a name that goes back at least a century, maybe more. 396 00:15:39,310 --> 00:15:41,440 So I'm going to create a D type. 397 00:15:41,440 --> 00:15:44,890 And all it is is a pair of floats. 398 00:15:44,890 --> 00:15:47,590 So it's a tuple with a pair of floats. 399 00:15:47,590 --> 00:15:50,620 It's going to be some sort of numerical function 400 00:15:50,620 --> 00:15:52,210 and derivative pair. 401 00:15:52,210 --> 00:15:56,860 So three of my eight lines are to create a D. In Julia language, 402 00:15:56,860 --> 00:15:58,572 this means it's a subtype of Number, 403 00:15:58,572 --> 00:16:00,280 so we're going to treat it like a number.
404 00:16:00,280 --> 00:16:00,400 Right? 405 00:16:00,400 --> 00:16:01,900 We want to be able to add, multiply, 406 00:16:01,900 --> 00:16:04,450 and divide these ordered pairs. 407 00:16:04,450 --> 00:16:06,130 But it's just a pair of numbers. 408 00:16:06,130 --> 00:16:07,870 Don't let the Julia scare you. 409 00:16:07,870 --> 00:16:10,110 It's just a function derivative numerical pair. 410 00:16:10,110 --> 00:16:10,720 OK? 411 00:16:10,720 --> 00:16:12,310 And what are these other five lines? 412 00:16:12,310 --> 00:16:15,460 Well, I want to teach it the sum rule and the quotient rule. 413 00:16:15,460 --> 00:16:18,400 So you all remember the sum rule. 414 00:16:18,400 --> 00:16:19,660 I guess that's the easy one. 415 00:16:19,660 --> 00:16:21,280 The quotient rule-- 416 00:16:21,280 --> 00:16:23,920 I still have my teacher from high school ringing 417 00:16:23,920 --> 00:16:24,940 in the back of my ear. 418 00:16:24,940 --> 00:16:26,450 The denominator times the derivative of the numerator 419 00:16:26,450 --> 00:16:27,520 minus the numerator times the derivative of the denominator-- you all 420 00:16:27,520 --> 00:16:29,250 have that jingle in your brain, too? 421 00:16:29,250 --> 00:16:30,100 I bet you do. 422 00:16:30,100 --> 00:16:31,860 Divided by the denominator squared. 423 00:16:31,860 --> 00:16:33,235 Can't even get it out of my head. 424 00:16:35,670 --> 00:16:38,090 So there's the quotient rule. 425 00:16:38,090 --> 00:16:40,880 And so what are we doing in these five lines? 426 00:16:40,880 --> 00:16:45,230 Well, first of all, I want to overload plus and divide 427 00:16:45,230 --> 00:16:46,290 and a few other things. 428 00:16:46,290 --> 00:16:49,273 And Julia wants me to say, are you sure? 429 00:16:49,273 --> 00:16:50,690 So the way you say are you sure is 430 00:16:50,690 --> 00:16:52,820 that I'm going to import plus and divide. 431 00:16:52,820 --> 00:16:54,890 Because it would be dangerous to play with plus.
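The two rules described above can be written as operations on (value, derivative) pairs. The notation here is an editorial choice: a' stands for the derivative carried alongside the value a.

$$(a, a') + (b, b') = (a + b,\; a' + b')$$

$$\frac{(a, a')}{(b, b')} = \left(\frac{a}{b},\; \frac{b\,a' - a\,b'}{b^{2}}\right)$$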
432 00:16:54,890 --> 00:16:57,613 So here I'm going to add two dual numbers. 433 00:16:57,613 --> 00:16:59,780 We're going to add the functions and the derivatives. 434 00:16:59,780 --> 00:17:00,885 Divide two dual numbers. 435 00:17:00,885 --> 00:17:03,260 We're going to divide the function values, and then denominator 436 00:17:03,260 --> 00:17:05,359 times the derivative of the numerator, blah, blah, blah, you get it. 437 00:17:05,359 --> 00:17:06,730 OK. 438 00:17:06,730 --> 00:17:09,260 That's six of the eight lines. 439 00:17:09,260 --> 00:17:12,200 The seventh line is, if I have a regular number, 440 00:17:12,200 --> 00:17:13,849 I want to convert it to a dual number. 441 00:17:13,849 --> 00:17:16,950 You know how the reals are embedded in the complexes? 442 00:17:16,950 --> 00:17:19,910 We have to tell Julia to take the regular number 443 00:17:19,910 --> 00:17:21,160 and stick a zero in as its derivative. 444 00:17:21,160 --> 00:17:23,480 And then dual numbers and regular numbers 445 00:17:23,480 --> 00:17:25,490 can play nicely together. 446 00:17:25,490 --> 00:17:27,260 And the eighth line is the thing that 447 00:17:27,260 --> 00:17:30,080 says, if I have a dual number and a number 448 00:17:30,080 --> 00:17:33,230 in an operation, promote them so they'll work as dual numbers-- 449 00:17:33,230 --> 00:17:35,850 so eight lines of code. 450 00:17:35,850 --> 00:17:38,420 So the first thing I'm going to tell you is I'm 451 00:17:38,420 --> 00:17:42,280 going to remind you I never typed 1/2 x to the minus 1/2. 452 00:17:42,280 --> 00:17:43,190 Do you agree? 453 00:17:43,190 --> 00:17:45,045 No one-- I'm not importing any packages. 454 00:17:45,045 --> 00:17:46,670 It's not like it's coming in from the-- 455 00:17:46,670 --> 00:17:48,380 I'm not sneaking it in from the side. 456 00:17:48,380 --> 00:17:50,600 There's no one half x to the minus 1/2. 457 00:17:50,600 --> 00:17:56,120 And there certainly aren't any numerical derivatives, 458 00:17:56,120 --> 00:17:58,183 either, right?
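A sketch of what the eight lines could look like, following the description above. It includes a type `D` wrapping a tuple of two floats, a sum rule, a quotient rule, a `convert` method, and a `promote_rule`. This is a reconstruction from the spoken description, not a verbatim copy of the notebook. The `Babylonian` function (with an assumed iteration count `N`) is repeated unchanged so the block runs on its own.

```julia
# Dual number: a (function value, derivative value) pair of floats.
struct D <: Number
    f::Tuple{Float64, Float64}
end

# Base operators must be imported before new methods can be added to them.
import Base: +, /, convert, promote_rule

# Sum rule: add the values and add the derivatives.
+(x::D, y::D) = D(x.f .+ y.f)
# Quotient rule: (u/v)' = (v*u' - u*v') / v^2.
/(x::D, y::D) = D((x.f[1] / y.f[1], (y.f[1] * x.f[2] - x.f[1] * y.f[2]) / y.f[1]^2))
# A regular real number becomes a dual number whose derivative part is 0.
convert(::Type{D}, x::Real) = D((x, zero(x)))
# Mixed operations between a D and another Number are carried out in D.
promote_rule(::Type{D}, ::Type{<:Number}) = D

# The unchanged Babylonian square root, repeated here for self-containment.
function Babylonian(x; N = 10)
    t = (1 + x) / 2
    for i = 2:N
        t = (t + x / t) / 2
    end
    return t
end

# Seed the derivative part with 1 (dx/dx = 1) and run the ordinary algorithm.
r = Babylonian(D((49, 1)))
println(r.f, "  vs  ", (sqrt(49), 0.5 / sqrt(49)))
```

Only `+` and `/` are taught to `D`, because those are the only operations the Babylonian loop uses. Literals such as `1` and `2` in the loop are promoted to `D` automatically through `convert` and `promote_rule`.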
459 00:17:58,183 --> 00:17:59,600 Arguably, a rule that almost feels 460 00:17:59,600 --> 00:18:01,683 symbolic, the quotient rule and the addition rule. 461 00:18:01,683 --> 00:18:05,350 But no numerical finite differences at all here. 462 00:18:05,350 --> 00:18:06,930 OK. 463 00:18:06,930 --> 00:18:09,860 So first of all, let me show you here 464 00:18:09,860 --> 00:18:16,490 that I'm applying the Babylonian algorithm without rewriting 465 00:18:16,490 --> 00:18:18,470 code to a dual number now. 466 00:18:18,470 --> 00:18:19,890 Before, we applied it to numbers. 467 00:18:19,890 --> 00:18:21,890 But now I'm going to apply it to this dual number 468 00:18:21,890 --> 00:18:22,820 that I just invented. 469 00:18:22,820 --> 00:18:26,320 I'm going to apply it at 49, 1, because I know the answer. 470 00:18:26,320 --> 00:18:28,010 And then I'm going to compare it with-- 471 00:18:28,010 --> 00:18:29,593 I'm taking one half over the square root 472 00:18:29,593 --> 00:18:31,820 of x just for comparison purposes 473 00:18:31,820 --> 00:18:33,530 and not in my own algorithm. 474 00:18:33,530 --> 00:18:36,290 And of course, you see that I'm getting magically 475 00:18:36,290 --> 00:18:38,493 the right answer without ever-- 476 00:18:38,493 --> 00:18:40,160 so you should wonder, how did I do that? 477 00:18:40,160 --> 00:18:41,503 How did I get the derivative? 478 00:18:41,503 --> 00:18:42,920 We could take any number you like. 479 00:18:42,920 --> 00:18:44,510 Here's 100. 480 00:18:44,510 --> 00:18:47,870 If you prefer to see a number like pi, we can do that. 481 00:18:47,870 --> 00:18:50,600 I mean, we can do whatever you like. 482 00:18:50,600 --> 00:18:51,540 It's going to work. 483 00:18:51,540 --> 00:18:53,870 So there you see this is the square root of pi. 484 00:18:53,870 --> 00:18:56,850 And this would be 1/2 over the square root of pi numerically.
485 00:18:56,850 --> 00:19:00,500 So when you see it matches these numbers to enough digits, 486 00:19:00,500 --> 00:19:02,010 in fact, all the digits, actually. 487 00:19:02,010 --> 00:19:02,510 Yeah. 488 00:19:02,510 --> 00:19:04,400 So the thing magically worked. 489 00:19:04,400 --> 00:19:07,870 You should all be wondering, how did that happen? 490 00:19:07,870 --> 00:19:09,117 I didn't rewrite any code. 491 00:19:09,117 --> 00:19:11,450 I actually wrote a code to just compute the square root. 492 00:19:11,450 --> 00:19:15,350 I never wrote a code to compute the derivative of a square root. 493 00:19:15,350 --> 00:19:18,250 And by the way, this is a little bit of the Julia magic 494 00:19:18,250 --> 00:19:20,120 that we're pushing numerically. 495 00:19:20,120 --> 00:19:23,743 That very often in this world, people 496 00:19:23,743 --> 00:19:25,160 will write a code to do something, 497 00:19:25,160 --> 00:19:26,910 and then if you want to do something more, 498 00:19:26,910 --> 00:19:29,540 like get a derivative, somebody writes another code. 499 00:19:29,540 --> 00:19:32,090 With Julia, very often, you can actually 500 00:19:32,090 --> 00:19:33,320 keep to the original code. 501 00:19:33,320 --> 00:19:35,990 And if you just use it properly and intelligently, 502 00:19:35,990 --> 00:19:38,330 you can do magic things without writing new codes. 503 00:19:38,330 --> 00:19:40,400 And you'll see this again in a little bit. 504 00:19:40,400 --> 00:19:43,280 But here's the derivative of-- 505 00:19:43,280 --> 00:19:45,740 this is the plot of 1/2 over the square root of x in black. 506 00:19:45,740 --> 00:19:49,050 And again, you could see the convergence over here. 507 00:19:49,050 --> 00:19:49,550 All right. 508 00:19:49,550 --> 00:19:52,280 Well, I'm still not going to show you why it works just yet. 509 00:19:52,280 --> 00:19:55,770 I promise I will in just probably a few minutes more.
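A sketch of the experiment just described, assuming the Babylonian routine from earlier in the lecture has roughly the shape below (start at (1 + x)/2, then repeatedly average t and x/t; the iteration count `n = 10` is an assumption). The routine is written for plain numbers. Feeding it a hypothetical Python `Dual` seeded with derivative 1 (the analogue of `49, 1`) returns the square root in `f` and an approximation of 1/(2√x) in `df`, with the comparison value computed separately, outside the algorithm, as in the lecture.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    f: float   # function value
    df: float  # derivative value

    @staticmethod
    def lift(x) -> Dual:
        # A plain number is a constant: derivative 0.
        return x if isinstance(x, Dual) else Dual(float(x), 0.0)

    def __add__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f + o.f, self.df + o.df)  # sum rule

    __radd__ = __add__

    def __truediv__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f / o.f, (o.f * self.df - self.f * o.df) / o.f**2)  # quotient rule

    def __rtruediv__(self, other) -> Dual:
        return Dual.lift(other) / self


def babylonian(x, n: int = 10):
    """Square root by repeated averaging; written for numbers, never differentiated by hand."""
    t = (1 + x) / 2
    for _ in range(2, n + 1):
        t = (t + x / t) / 2
    return t


for x in (49.0, 100.0, math.pi):
    result = babylonian(Dual(x, 1.0))  # seed dx/dx = 1
    print(x, result.f, math.sqrt(x), result.df, 0.5 / math.sqrt(x))
```

The same `babylonian` still works on plain floats; only the argument type changes.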
510 00:19:55,770 --> 00:19:59,990 But what I will do first is I'd like to show you something 511 00:19:59,990 --> 00:20:01,490 that most people will never look at. 512 00:20:01,490 --> 00:20:02,910 I never look at it. 513 00:20:02,910 --> 00:20:05,630 I want to show you-- here's the same Babylonian code. 514 00:20:05,630 --> 00:20:10,220 I want to show you the assembler for the computation 515 00:20:10,220 --> 00:20:10,980 of the derivative. 516 00:20:10,980 --> 00:20:14,900 So I'm going to run Babylonian on a dual number. 517 00:20:14,900 --> 00:20:17,360 And we're going to look here. 518 00:20:17,360 --> 00:20:20,210 And I don't know if anybody here reads assembler. 519 00:20:20,210 --> 00:20:22,652 I'm betting there are zero or one of you who actually 520 00:20:22,652 --> 00:20:23,360 read this stuff. 521 00:20:23,360 --> 00:20:25,980 How many of you read assembler? 522 00:20:25,980 --> 00:20:27,370 OK. 523 00:20:27,370 --> 00:20:28,200 It wasn't 0, 1. 524 00:20:28,200 --> 00:20:28,830 We had a half. 525 00:20:28,830 --> 00:20:29,820 Right, there's half. 526 00:20:29,820 --> 00:20:31,030 He's kind of going like this. 527 00:20:31,030 --> 00:20:31,530 Here's zero. 528 00:20:31,530 --> 00:20:32,030 Here's one. 529 00:20:32,030 --> 00:20:33,020 He's like this. 530 00:20:33,020 --> 00:20:33,520 OK. 531 00:20:33,520 --> 00:20:36,300 So I think 0, 1 is like the record. 532 00:20:36,300 --> 00:20:39,900 But I'll bet you'll believe me if I tell you that, when 533 00:20:39,900 --> 00:20:43,020 you have short assembler like this and it's not very long, 534 00:20:43,020 --> 00:20:44,460 then you have efficient code. 535 00:20:44,460 --> 00:20:45,190 It's very tight. 536 00:20:45,190 --> 00:20:46,530 It will run very fast. 537 00:20:46,530 --> 00:20:49,340 So whatever this thing is doing, it's short. 538 00:20:49,340 --> 00:20:52,023 And this you won't get from any other language.
539 00:20:52,023 --> 00:20:53,940 If you did try to do the same thing in Python, 540 00:20:53,940 --> 00:20:55,680 I promise you there would be screens 541 00:20:55,680 --> 00:20:57,630 and screens and screens full of stuff, 542 00:20:57,630 --> 00:21:00,330 even if you could get it. 543 00:21:00,330 --> 00:21:04,470 So here's the Babylonian algorithm on the dual number. 544 00:21:04,470 --> 00:21:07,343 And here it is in assembler, and it's short. 545 00:21:07,343 --> 00:21:08,760 So the other thing that I'm saying 546 00:21:08,760 --> 00:21:12,120 is not only does it work, but Julia also makes it efficient. 547 00:21:12,120 --> 00:21:15,090 So before I finally tell you what's really going on 548 00:21:15,090 --> 00:21:17,760 and why it works, I'm going to grab 549 00:21:17,760 --> 00:21:22,060 a Python symbolic package, which will work nicely with Julia. 550 00:21:22,060 --> 00:21:27,640 And I'm going to run the same code through the Python 551 00:21:27,640 --> 00:21:30,220 symbolic and show you what-- 552 00:21:30,220 --> 00:21:31,990 these are the iterations that you get. 553 00:21:31,990 --> 00:21:34,420 So you actually see the iterations 554 00:21:34,420 --> 00:21:35,790 towards the square root. 555 00:21:35,790 --> 00:21:37,957 And here are the iterations of the derivative that's 556 00:21:37,957 --> 00:21:39,580 actually being calculated. 557 00:21:39,580 --> 00:21:43,360 And the key point here is, of course, this 558 00:21:43,360 --> 00:21:44,560 is a symbolic computation. 559 00:21:44,560 --> 00:21:46,420 We're not doing a symbolic computation. 560 00:21:46,420 --> 00:21:49,460 This is mathematically equivalent to the function 561 00:21:49,460 --> 00:21:52,310 we would get if we were to, like, plot it or something. 562 00:21:52,310 --> 00:21:54,610 But of course, symbolic computation 563 00:21:54,610 --> 00:21:55,570 is very inefficient. 564 00:21:55,570 --> 00:21:57,220 I mean, you get these big coefficients. 
565 00:21:57,220 --> 00:21:58,387 I mean, look at this number. 566 00:21:58,387 --> 00:21:59,140 What is this? 567 00:21:59,140 --> 00:22:01,160 5 million or something? 568 00:22:01,160 --> 00:22:02,950 Anyway, you get these big numbers, 569 00:22:02,950 --> 00:22:04,350 these even bigger numbers here. 570 00:22:04,350 --> 00:22:06,490 Look at these huge numbers, right? 571 00:22:06,490 --> 00:22:09,970 It takes a lot of storage dragging these x's along. 572 00:22:09,970 --> 00:22:11,380 There's a big drag on memory. 573 00:22:11,380 --> 00:22:12,940 I mean, this is not the way that-- 574 00:22:12,940 --> 00:22:15,320 this is why we do numerical computation. 575 00:22:15,320 --> 00:22:18,610 But the Babylonian algorithm, in the absence of any round off, 576 00:22:18,610 --> 00:22:21,350 is equivalent to computing-- 577 00:22:21,350 --> 00:22:23,920 above the line, it's computing the square root here. 578 00:22:23,920 --> 00:22:27,070 And then below here, these are the iterates 579 00:22:27,070 --> 00:22:28,180 towards the derivative. 580 00:22:28,180 --> 00:22:32,440 So it's not actually calculating 1/2 x to the minus 1/2. 581 00:22:32,440 --> 00:22:34,270 It's actually doing something iterative 582 00:22:34,270 --> 00:22:37,700 that is approximating 1/2 x to the minus 1/2. 583 00:22:37,700 --> 00:22:38,200 All right. 584 00:22:38,200 --> 00:22:39,310 Well, let me tell you now. 585 00:22:39,310 --> 00:22:41,990 Let me sort of reveal what's going on, 586 00:22:41,990 --> 00:22:43,823 just so that I can kind of show you 587 00:22:43,823 --> 00:22:44,990 how it's getting the answer. 588 00:22:44,990 --> 00:22:48,792 And like I said, it was the SVD that sort of convinced me 589 00:22:48,792 --> 00:22:49,750 how this was happening. 590 00:22:49,750 --> 00:22:51,708 Because the SVD is also an iterative algorithm, 591 00:22:51,708 --> 00:22:53,250 like this Babylonian square root. 
592 00:22:53,250 --> 00:22:55,625 But it's easier to show you the point with the Babylonian 593 00:22:55,625 --> 00:22:56,330 square root. 594 00:22:56,330 --> 00:22:59,440 So I'm going to do something that I would never want to do, 595 00:22:59,440 --> 00:23:02,320 which is explicitly write a derivative Babylonian 596 00:23:02,320 --> 00:23:03,350 algorithm. 597 00:23:03,350 --> 00:23:05,050 And what I'm doing is I'm going to take 598 00:23:05,050 --> 00:23:07,780 the derivative with respect to x of every line of my code. 599 00:23:07,780 --> 00:23:10,083 So if every even or odd line-- 600 00:23:10,083 --> 00:23:11,750 I never know what's even or odd anymore. 601 00:23:11,750 --> 00:23:14,788 But the original line of code had 1 plus x over 2. 602 00:23:14,788 --> 00:23:16,330 Now I'm going to take the derivative. 603 00:23:16,330 --> 00:23:17,980 I'll get a half. 604 00:23:17,980 --> 00:23:20,140 Here I had this line of code. 605 00:23:20,140 --> 00:23:24,340 If I take the derivative, I'll use the quotient rule, 606 00:23:24,340 --> 00:23:26,530 and this would be the derivative. 607 00:23:26,530 --> 00:23:30,160 If I run this code, what I'm effectively doing 608 00:23:30,160 --> 00:23:33,280 is I'm just using good old plus and times and divide, 609 00:23:33,280 --> 00:23:34,750 nothing fancy. 610 00:23:34,750 --> 00:23:36,460 There's not a square root to be seen. 611 00:23:36,460 --> 00:23:39,460 But what I'm doing is, as I run my algorithm, 612 00:23:39,460 --> 00:23:41,320 I'm also running-- 613 00:23:41,320 --> 00:23:44,508 I'm actually computing the derivative as I go. 614 00:23:44,508 --> 00:23:46,300 So if I have this infinite algorithm that's 615 00:23:46,300 --> 00:23:48,480 going to converge to the square roots, 616 00:23:48,480 --> 00:23:50,980 the derivative algorithm will converge to the derivative 617 00:23:50,980 --> 00:23:52,760 of the square roots.
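The hand-differentiated version described here can be sketched in Python as follows. The name `d_babylonian` and the default `n = 10` are illustrative assumptions. Each original line is paired with its derivative with respect to x; the derivative update must read the previous `t` and `dt`, so it is computed before `t` is overwritten. Only plus, minus, times, and divide appear.

```python
import math


def d_babylonian(x: float, n: int = 10) -> float:
    """Derivative of the Babylonian square root, obtained by differentiating every line by hand."""
    t = (1 + x) / 2   # original line: t = (1 + x)/2
    dt = 0.5          # its derivative with respect to x
    for _ in range(2, n + 1):
        # Original line: t_new = (t + x/t)/2.
        # Sum rule plus quotient rule on x/t gives (t*1 - x*dt)/t^2,
        # evaluated with the OLD t and dt, so update dt first.
        dt = (dt + (t - x * dt) / (t * t)) / 2
        t = (t + x / t) / 2
    return dt


print(d_babylonian(49.0), 1 / 14)
print(d_babylonian(math.pi), 0.5 / math.sqrt(math.pi))
```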
618 00:23:52,760 --> 00:23:56,590 But I'm not using anything other than plus, minus, times, 619 00:23:56,590 --> 00:23:58,880 and divide to make that happen. 620 00:23:58,880 --> 00:24:00,910 So if you rewrite any code at all, 621 00:24:00,910 --> 00:24:02,740 you could have any code-- iterative, 622 00:24:02,740 --> 00:24:04,108 finite, it doesn't matter. 623 00:24:04,108 --> 00:24:05,650 If you just take the derivatives with respect 624 00:24:05,650 --> 00:24:08,180 to your variable of every line of your code, 625 00:24:08,180 --> 00:24:10,360 then you can get a derivative out. 626 00:24:10,360 --> 00:24:12,370 And as I said, it's not a symbolic derivative, 627 00:24:12,370 --> 00:24:14,920 like, you know, all of 18.01, or whatever, 628 00:24:14,920 --> 00:24:16,810 wherever we teach calculus these days. 629 00:24:16,810 --> 00:24:18,310 And it's not a numerical derivative 630 00:24:18,310 --> 00:24:22,650 like in the numerical courses, the 18.3xyz's, whatever. 631 00:24:22,650 --> 00:24:23,690 It's a different beast. 632 00:24:23,690 --> 00:24:29,050 It's using the quotient rule and the addition rule 633 00:24:29,050 --> 00:24:31,315 at every step of the way to get the answer. 634 00:24:34,060 --> 00:24:35,680 Here's this dBabylonian algorithm. 635 00:24:35,680 --> 00:24:36,877 You could see it running. 636 00:24:36,877 --> 00:24:37,960 It gives the right answer. 637 00:24:37,960 --> 00:24:41,140 Oop, I have to execute the code first to get the right answer. 638 00:24:41,140 --> 00:24:43,960 But if you see, it gives the right answer. 639 00:24:43,960 --> 00:24:47,800 Oh, I was just in Istanbul and they challenged me to do sine. 640 00:24:47,800 --> 00:24:48,730 I forgot about that. 641 00:24:48,730 --> 00:24:50,128 It's still in my notebook. 642 00:24:50,128 --> 00:24:51,420 I did it in front of everybody. 643 00:24:51,420 --> 00:24:52,150 It worked. 644 00:24:52,150 --> 00:24:52,980 I got a cosine. 645 00:24:52,980 --> 00:24:53,480 OK.
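The sine notebook cell isn't reproduced in the transcript. As a hedged illustration of the same idea: a sine written with nothing but arithmetic (a truncated Taylor series, chosen here for simplicity; the 20-term cutoff is an assumption), run on a dual number that also knows minus, times, and negation, returns cos x in its derivative slot. `Dual` and `taylor_sin` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    f: float   # function value
    df: float  # derivative value

    @staticmethod
    def lift(x) -> Dual:
        return x if isinstance(x, Dual) else Dual(float(x), 0.0)

    def __add__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f + o.f, self.df + o.df)

    __radd__ = __add__

    def __sub__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f - o.f, self.df - o.df)

    def __rsub__(self, other) -> Dual:
        return Dual.lift(other) - self

    def __neg__(self) -> Dual:
        return Dual(-self.f, -self.df)

    def __mul__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f * o.f, self.df * o.f + self.f * o.df)  # product rule

    __rmul__ = __mul__

    def __truediv__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f / o.f, (o.f * self.df - self.f * o.df) / (o.f * o.f))

    def __rtruediv__(self, other) -> Dual:
        return Dual.lift(other) / self


def taylor_sin(x, terms: int = 20):
    """sin x = x - x^3/3! + x^5/5! - ..., using only arithmetic (no math.sin)."""
    term = x
    total = x
    for k in range(1, terms):
        # next term = previous term * (-x^2) / ((2k)(2k+1))
        term = -term * x * x / ((2 * k) * (2 * k + 1))
        total = total + term
    return total


d = taylor_sin(Dual(1.0, 1.0))
print(d.f, math.sin(1.0))
print(d.df, math.cos(1.0))
```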
646 00:24:53,480 --> 00:24:55,720 But let me skip past all of that. 647 00:24:58,820 --> 00:25:02,907 So let me go back and tell you then how this is all working. 648 00:25:02,907 --> 00:25:04,490 Well, what's happening-- let's go back 649 00:25:04,490 --> 00:25:06,590 to the eight lines of code, and now, maybe, you 650 00:25:06,590 --> 00:25:09,650 can see what's happening. 651 00:25:09,650 --> 00:25:12,318 Where are my eight lines of code from the very beginning? 652 00:25:12,318 --> 00:25:13,610 And I've got to watch the time. 653 00:25:13,610 --> 00:25:15,485 I want to show you this one other thing, too. 654 00:25:15,485 --> 00:25:18,300 So hopefully, I'll have enough time to do that. 655 00:25:18,300 --> 00:25:20,335 But here, let's see. 656 00:25:20,335 --> 00:25:21,710 Where are my eight lines of code? 657 00:25:21,710 --> 00:25:24,470 Where are they? 658 00:25:24,470 --> 00:25:25,180 Here we go. 659 00:25:25,180 --> 00:25:26,640 Here are the eight lines of code. 660 00:25:26,640 --> 00:25:31,070 So what I'm doing is, instead of rewriting all your code 661 00:25:31,070 --> 00:25:34,160 by taking the derivative of every line the human way, 662 00:25:34,160 --> 00:25:36,410 I'm saying that why can't the software just 663 00:25:36,410 --> 00:25:37,820 do this in some automatic way? 664 00:25:37,820 --> 00:25:40,302 And this is where the automatic differentiation comes in. 665 00:25:40,302 --> 00:25:42,260 And in the old, old days, when people-- and all 666 00:25:42,260 --> 00:25:44,010 the numerical code was in Fortran, 667 00:25:44,010 --> 00:25:46,340 there would be the source-to-source translators 668 00:25:46,340 --> 00:25:52,290 that would actually input code and output derivatives of code. 669 00:25:52,290 --> 00:25:54,560 The Julia way, the more modern way, 670 00:25:54,560 --> 00:25:57,230 is to let the JIT compiler kind of do that for you. 671 00:25:57,230 --> 00:25:59,720 So here, I needed plus and divide.
672 00:25:59,720 --> 00:26:02,870 Of course, I would want to add minus and times. 673 00:26:02,870 --> 00:26:07,670 But you just add a couple of things and then boom, 674 00:26:07,670 --> 00:26:10,280 you don't have to rewrite the dBabylonian. 675 00:26:10,280 --> 00:26:12,380 Because the Babylonian, with this type, 676 00:26:12,380 --> 00:26:14,880 will just do the work for you. 677 00:26:14,880 --> 00:26:15,380 OK? 678 00:26:15,380 --> 00:26:17,708 And that's where the magic of a good piece of software 679 00:26:17,708 --> 00:26:18,250 comes in. 680 00:26:18,250 --> 00:26:20,030 So you don't have to write a translator. 681 00:26:20,030 --> 00:26:23,180 You don't have to hand-write it. 682 00:26:23,180 --> 00:26:25,840 You just give the rules and you let the computer do it. 683 00:26:25,840 --> 00:26:26,340 Right? 684 00:26:26,340 --> 00:26:28,610 And that's what computers are supposed to be good at. 685 00:26:28,610 --> 00:26:30,510 So that's what's happening. 686 00:26:30,510 --> 00:26:31,010 All right. 687 00:26:31,010 --> 00:26:35,630 So that's forward mode automatic differentiation. 688 00:26:35,630 --> 00:26:37,410 I've got 10 minutes to go backwards. 689 00:26:37,410 --> 00:26:39,035 But let me see if there's any-- anybody 690 00:26:39,035 --> 00:26:40,798 have any questions about this? 691 00:26:40,798 --> 00:26:41,840 It's really magic, right? 692 00:26:41,840 --> 00:26:43,470 But it's pretty wonderful magic. 693 00:26:43,470 --> 00:26:47,630 And I don't know what you've heard about machine learning, 694 00:26:47,630 --> 00:26:51,710 but to be honest, machine learning these days, 695 00:26:51,710 --> 00:26:56,540 forget about whether humans will be useless, which 696 00:26:56,540 --> 00:26:57,830 I don't believe by the way. 697 00:26:57,830 --> 00:27:00,380 But the big thing about machine learning 698 00:27:00,380 --> 00:27:02,950 is that it's really just a big optimization. 699 00:27:02,950 --> 00:27:03,950 That's all it is, right?
700 00:27:03,950 --> 00:27:07,427 One big minimum maximum problem where you've all 701 00:27:07,427 --> 00:27:09,260 known from calculus that what you need to do 702 00:27:09,260 --> 00:27:10,190 is take derivatives. 703 00:27:10,190 --> 00:27:12,245 You know, set them to zero, right? 704 00:27:12,245 --> 00:27:14,120 In the case of multivariate, it's a gradient. 705 00:27:14,120 --> 00:27:15,290 You set it to zero. 706 00:27:15,290 --> 00:27:18,680 And so really all of this machine learning, 707 00:27:18,680 --> 00:27:21,080 all the big stories and everything in the end 708 00:27:21,080 --> 00:27:23,190 comes down to automatic differentiation. 709 00:27:23,190 --> 00:27:25,510 It's sort of like the workhorse of the whole thing. 710 00:27:25,510 --> 00:27:27,980 And so if we could have a language that gives you 711 00:27:27,980 --> 00:27:31,070 that workhorse in a good way, then machine learning really 712 00:27:31,070 --> 00:27:32,300 sort of benefits from that. 713 00:27:32,300 --> 00:27:36,260 So I hope you all see the big picture of machine learning. 714 00:27:36,260 --> 00:27:38,580 It really does come down to taking derivatives. 715 00:27:38,580 --> 00:27:43,340 That's the end-- that's how you optimize. 716 00:27:43,340 --> 00:27:44,247 Any quick questions? 717 00:27:44,247 --> 00:27:45,830 Otherwise, I'm going to switch topics, 718 00:27:45,830 --> 00:27:47,520 and I'm going to move to the blackboard. 719 00:27:47,520 --> 00:27:48,640 Yeah? 720 00:27:48,640 --> 00:27:50,940 AUDIENCE: Does the same thing happen for second order 721 00:27:50,940 --> 00:27:52,027 derivatives as well? 722 00:27:52,027 --> 00:27:54,110 ALAN EDELMAN: There is a trick that basically lets 723 00:27:54,110 --> 00:27:55,360 you go to higher orders, yeah. 724 00:27:55,360 --> 00:27:58,940 You can basically make it a combo of two first order 725 00:27:58,940 --> 00:28:00,200 derivatives. 726 00:28:00,200 --> 00:28:01,870 So yeah, it can be done. 
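One common way to build second derivatives out of first-order ones is to nest dual numbers: a `Dual` whose components are themselves `Dual`s. The sketch below is an assumption about what "a combo of two first order derivatives" can look like, not a description of any particular Julia package. Seeding x as `Dual(Dual(x0, 1), Dual(1, 0))` puts f''(x0) in `.df.df`. This minimal version has no safeguard against mixing up the two derivative levels ("perturbation confusion"), which production AD tools handle with tags, so it is only safe when the function combines its input with plain numbers, as here. `second_derivative` and `babylonian` are hypothetical names, and `n = 20` is an arbitrary choice.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Dual:
    f: Any   # value: a float, or itself a Dual when nested
    df: Any  # derivative: same kind of object as f

    @staticmethod
    def lift(x) -> Dual:
        # Constants get a zero derivative; x is not cast to float so nesting keeps working.
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f + o.f, self.df + o.df)

    __radd__ = __add__

    def __sub__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f - o.f, self.df - o.df)

    def __rsub__(self, other) -> Dual:
        return Dual.lift(other) - self

    def __mul__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f * o.f, self.df * o.f + self.f * o.df)

    __rmul__ = __mul__

    def __truediv__(self, other) -> Dual:
        o = Dual.lift(other)
        return Dual(self.f / o.f, (o.f * self.df - self.f * o.df) / (o.f * o.f))

    def __rtruediv__(self, other) -> Dual:
        return Dual.lift(other) / self


def second_derivative(fn, x0: float):
    """f''(x0): the inner Dual carries one derivative, the outer Dual the other."""
    return fn(Dual(Dual(x0, 1.0), Dual(1.0, 0.0))).df.df


def babylonian(x, n: int = 20):
    t = (1 + x) / 2
    for _ in range(2, n + 1):
        t = (t + x / t) / 2
    return t


print(second_derivative(lambda x: x * x * x, 2.0))        # 12.0
print(second_derivative(babylonian, 49.0), -1 / 1372)     # sqrt'' = -x^(-3/2)/4
```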
727 00:28:01,870 --> 00:28:02,870 Did you have a question? 728 00:28:02,870 --> 00:28:03,495 AUDIENCE: Yeah. 729 00:28:03,495 --> 00:28:06,258 Is this notation of [INAUDIBLE], and is this 730 00:28:06,258 --> 00:28:09,453 only really used for computing different orders of derivatives 731 00:28:09,453 --> 00:28:10,620 or are there other examples? 732 00:28:10,620 --> 00:28:12,270 ALAN EDELMAN: Well, for using types? 733 00:28:12,270 --> 00:28:14,054 AUDIENCE: Or specifically, I guess, 734 00:28:14,054 --> 00:28:16,262 the way that you did through this whole presentation, 735 00:28:16,262 --> 00:28:19,760 just this generalized other-- 736 00:28:19,760 --> 00:28:23,050 ALAN EDELMAN: So it's the biggest trick in the world. 737 00:28:23,050 --> 00:28:24,950 It's not this little thing. 738 00:28:24,950 --> 00:28:28,340 The idea of making a type to do what you-- 739 00:28:28,340 --> 00:28:31,456 I mean, did you see Kronecker products in this class? 740 00:28:31,456 --> 00:28:32,250 AUDIENCE: No. 741 00:28:32,250 --> 00:28:33,007 ALAN EDELMAN: No? 742 00:28:33,007 --> 00:28:34,140 AUDIENCE: [INAUDIBLE]. 743 00:28:34,140 --> 00:28:35,730 ALAN EDELMAN: OK. 744 00:28:35,730 --> 00:28:36,230 Let me see. 745 00:28:36,230 --> 00:28:39,070 What would you have seen in this? 746 00:28:39,070 --> 00:28:42,980 Did you see tridiagonal matrices, your favorite? 747 00:28:42,980 --> 00:28:43,480 OK. 748 00:28:43,480 --> 00:28:44,260 So here. 749 00:28:44,260 --> 00:28:46,730 So here's a built-in type. 750 00:28:46,730 --> 00:28:48,110 Let's say n is-- 751 00:28:48,110 --> 00:28:50,010 oh, n doesn't have to be 4. 752 00:28:50,010 --> 00:28:52,840 I'm going to create a strang matrix, 753 00:28:52,840 --> 00:28:55,360 if I could spell it right. 754 00:28:55,360 --> 00:28:57,850 And it's going to be a SymTridiagonal, which 755 00:28:57,850 --> 00:29:00,580 is a Julia type.
756 00:29:00,580 --> 00:29:06,670 And we will create two times ones of n and minus 757 00:29:06,670 --> 00:29:09,760 ones of n minus 1. 758 00:29:09,760 --> 00:29:11,118 Here's a type. 759 00:29:11,118 --> 00:29:12,160 I mean, this is built in. 760 00:29:12,160 --> 00:29:15,820 But you could have created it yourself just as easily. 761 00:29:15,820 --> 00:29:18,430 And I don't like calling this-- 762 00:29:18,430 --> 00:29:19,990 it's certainly not a dense matrix. 763 00:29:19,990 --> 00:29:22,000 And I don't like calling it a sparse matrix. 764 00:29:22,000 --> 00:29:24,780 I prefer to call it a structured matrix. 765 00:29:24,780 --> 00:29:27,370 Though the word sparse, it's a little tricky here. 766 00:29:27,370 --> 00:29:30,790 But the reason why I don't like to call this a sparse matrix 767 00:29:30,790 --> 00:29:34,307 is because we're not storing indices in any-- 768 00:29:34,307 --> 00:29:36,640 I mean, there are a lot of fancy schemes for storing indices 769 00:29:36,640 --> 00:29:38,350 for sparse matrices. 770 00:29:38,350 --> 00:29:41,110 Well, all we store is a diagonal vector. 771 00:29:41,110 --> 00:29:43,180 There are the 2s on the diagonal. 772 00:29:43,180 --> 00:29:45,490 There's this 4 vector with four twos. 773 00:29:45,490 --> 00:29:48,537 And here's a three vector for the off-diagonal. 774 00:29:48,537 --> 00:29:50,620 And you know, you don't have it twice, by the way. 775 00:29:50,620 --> 00:29:56,620 Most sparse matrix structures would have the minus-ones vector 776 00:29:56,620 --> 00:29:59,140 twice, the super and the sub. 777 00:29:59,140 --> 00:30:02,050 But really, only the core information that's needed 778 00:30:02,050 --> 00:30:03,400 is stored. 779 00:30:03,400 --> 00:30:09,430 And in a way, one uses types in Julia to basically-- 780 00:30:09,430 --> 00:30:11,740 you only store what you need, not more. 781 00:30:11,740 --> 00:30:14,870 And then you define your operations to work.
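To make "store only what you need, then define your operations" concrete, here is a hypothetical Python analogue of a symmetric tridiagonal type (not Julia's `SymTridiagonal` itself). It keeps one diagonal vector and one off-diagonal vector, 2n - 1 numbers in all, and implements a matrix-vector product plus an O(n) solve. The solver is the classic Thomas algorithm without pivoting, chosen here as an assumption; it is appropriate for symmetric positive definite matrices such as the 2, -1 matrix being built in the lecture. `SymTridiag` and `strang` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SymTridiag:
    """Hypothetical structured type: 2n - 1 stored numbers instead of n*n."""
    diag: List[float]  # n diagonal entries
    off: List[float]   # n - 1 entries, shared by the super- and subdiagonal

    def matvec(self, x: List[float]) -> List[float]:
        n = len(self.diag)
        y = [self.diag[i] * x[i] for i in range(n)]
        for i in range(n - 1):
            y[i] += self.off[i] * x[i + 1]   # superdiagonal entry (i, i+1)
            y[i + 1] += self.off[i] * x[i]   # subdiagonal entry (i+1, i): same stored number
        return y

    def solve(self, b: List[float]) -> List[float]:
        """O(n) Thomas algorithm: forward elimination, then back substitution."""
        n = len(self.diag)
        c = [0.0] * n  # eliminated superdiagonal
        d = [0.0] * n  # eliminated right-hand side
        c[0] = self.off[0] / self.diag[0] if n > 1 else 0.0
        d[0] = b[0] / self.diag[0]
        for i in range(1, n):
            denom = self.diag[i] - self.off[i - 1] * c[i - 1]
            c[i] = self.off[i] / denom if i < n - 1 else 0.0
            d[i] = (b[i] - self.off[i - 1] * d[i - 1]) / denom
        x = [0.0] * n
        x[-1] = d[-1]
        for i in range(n - 2, -1, -1):
            x[i] = d[i] - c[i] * x[i + 1]
        return x


def strang(n: int) -> SymTridiag:
    # 2 on the diagonal, -1 on both off-diagonals (stored once)
    return SymTridiag([2.0] * n, [-1.0] * (n - 1))


A = strang(4)
print(A.matvec([1.0, 2.0, 3.0, 4.0]))   # [0.0, 0.0, 0.0, 5.0]
print(A.solve([0.0, 0.0, 0.0, 5.0]))    # approximately [1.0, 2.0, 3.0, 4.0]
```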
782 00:30:14,870 --> 00:30:20,810 So for example, if I were to take a strang inverse times, 783 00:30:20,810 --> 00:30:24,340 oh, anything, times a random 4. 784 00:30:24,340 --> 00:30:25,980 I'm going to do a linear solve. 785 00:30:25,980 --> 00:30:28,330 You would want to use a special [INAUDIBLE] that 786 00:30:28,330 --> 00:30:31,150 knew that the matrix was a symmetric tridiagonal. 787 00:30:31,150 --> 00:30:35,770 So it's a big story of being able to create types and use 788 00:30:35,770 --> 00:30:40,240 them for your own purposes without any wastage. 789 00:30:40,240 --> 00:30:43,420 And this is the sort of thing that while you can do it 790 00:30:43,420 --> 00:30:46,660 in languages like Python, in MATLAB, if you were 791 00:30:46,660 --> 00:30:49,810 able to see the assembler-- and MATLAB would never let you, Python, 792 00:30:49,810 --> 00:30:52,480 you just would regret it-- but you 793 00:30:52,480 --> 00:30:56,217 would see just how much overhead there is in doing this. 794 00:30:56,217 --> 00:30:57,800 So there would be no performance gain. 795 00:30:57,800 --> 00:31:01,030 But in a way, this is what you want to do. 796 00:31:01,030 --> 00:31:04,270 You want to use these things to match the mathematics. 797 00:31:04,270 --> 00:31:07,150 And so that's really the nice thing to be able to do. 798 00:31:07,150 --> 00:31:07,650 All right. 799 00:31:07,650 --> 00:31:08,500 I only have five minutes. 800 00:31:08,500 --> 00:31:10,292 I don't know if I'm going to pull this off. 801 00:31:10,292 --> 00:31:11,950 But let me see if I could give you 802 00:31:11,950 --> 00:31:15,460 the main idea in five minutes of reverse mode 803 00:31:15,460 --> 00:31:16,450 differentiation. 804 00:31:16,450 --> 00:31:20,140 But here, as long as you are familiar with neural networks, 805 00:31:20,140 --> 00:31:22,690 let me see if I can do this very quickly. 806 00:31:22,690 --> 00:31:24,960 I'm going to start with scalars. 807 00:31:24,960 --> 00:31:25,460 OK?
808 00:31:25,460 --> 00:31:27,460 I'm going to do a neural network of all scalars. 809 00:31:27,460 --> 00:31:29,087 But only for simplicity, for starters, 810 00:31:29,087 --> 00:31:31,420 but I think you're going to see that this can generalize 811 00:31:31,420 --> 00:31:34,790 to vectors and matrices, which are real neural networks. 812 00:31:34,790 --> 00:31:37,660 So what I'm going to do is I want to imagine 813 00:31:37,660 --> 00:31:41,290 that we have our inputs. 814 00:31:41,290 --> 00:31:43,600 We'll have a bunch of scalar weights and biases. 815 00:31:43,600 --> 00:31:49,610 So here's w1, and I'll go up to wn and bn. 816 00:31:49,610 --> 00:31:50,110 All right? 817 00:31:50,110 --> 00:31:53,500 So we have a bunch of weights and biases here. 818 00:31:53,500 --> 00:31:54,100 OK? 819 00:31:54,100 --> 00:32:00,670 And we'll also have an x1, which will sort of start off 820 00:32:00,670 --> 00:32:02,230 our neural network. 821 00:32:02,230 --> 00:32:03,520 And we're going to compute-- 822 00:32:03,520 --> 00:32:05,740 I'll write it in sort of Julia-like or MATLAB-like 823 00:32:05,740 --> 00:32:10,660 notation, for i equals 1 through n. 824 00:32:10,660 --> 00:32:19,150 I will update x by taking some function of my current input, 825 00:32:19,150 --> 00:32:20,870 maybe something like this. 826 00:32:20,870 --> 00:32:22,660 And what function h to use? 827 00:32:22,660 --> 00:32:24,730 I don't really care too much. 828 00:32:24,730 --> 00:32:26,380 In the old days, people used to talk 829 00:32:26,380 --> 00:32:29,110 about the sigmoid function. 830 00:32:29,110 --> 00:32:33,310 Nowadays, it's the maximum of 0 and t 831 00:32:33,310 --> 00:32:35,890 that gets used all the time. 832 00:32:35,890 --> 00:32:38,770 It's got this ridiculous name ReLU, 833 00:32:38,770 --> 00:32:41,140 which I really can't stand. 834 00:32:41,140 --> 00:32:43,840 But anyway, the rectified linear unit.
835 00:32:43,840 --> 00:32:46,100 But in any event, I mean, it's just 836 00:32:46,100 --> 00:32:50,570 the function that's t if t is greater than or equal to 0, 837 00:32:50,570 --> 00:32:51,670 and 0 if not. 838 00:32:56,350 --> 00:32:59,890 But whatever function you like. 839 00:32:59,890 --> 00:33:01,520 And here I'm just going to update. 840 00:33:01,520 --> 00:33:02,020 OK. 841 00:33:02,020 --> 00:33:05,567 And then ultimately, you might also have some data y. 842 00:33:05,567 --> 00:33:07,150 And you would like to, if everything's 843 00:33:07,150 --> 00:33:08,525 a scalar, like I said, this could 844 00:33:08,525 --> 00:33:10,180 be generalized pretty quickly. 845 00:33:10,180 --> 00:33:14,530 But what we can do is we can minimize, say, 846 00:33:14,530 --> 00:33:18,283 1/2 y minus xm squared. 847 00:33:18,283 --> 00:33:20,200 And you're going to want to find the parameters that 848 00:33:20,200 --> 00:33:21,610 would minimize that. 849 00:33:21,610 --> 00:33:24,250 All this generalizes to matrices and vectors, which 850 00:33:24,250 --> 00:33:27,110 is what most neural nets do. 851 00:33:27,110 --> 00:33:27,610 OK? 852 00:33:27,610 --> 00:33:31,240 And since I'm not going to have a lot of time, 853 00:33:31,240 --> 00:33:34,000 maybe I can just sort of cut to the chase. 854 00:33:34,000 --> 00:33:37,140 If I were to differentiate the key line here, 855 00:33:37,140 --> 00:33:39,170 I got a little bit of Julia here. 856 00:33:39,170 --> 00:33:41,540 But if I were to differentiate the key line, what would 857 00:33:41,540 --> 00:33:42,040 I write? 858 00:33:42,040 --> 00:33:44,050 I would write-- well, here, actually, 859 00:33:44,050 --> 00:33:46,310 let me use the usual notation. 860 00:33:46,310 --> 00:33:52,490 Let me have delta i be h prime of wi xi plus bi. 861 00:33:52,490 --> 00:33:52,990 OK? 862 00:33:52,990 --> 00:33:55,890 So that's delta i. 863 00:33:55,890 --> 00:34:01,330 And then you can see that the dxi plus 1 is delta i.
864 00:34:01,330 --> 00:34:09,310 And I'll have dwi xi plus dxi wi plus dbi, which would 865 00:34:09,310 --> 00:34:11,889 be the differential. 866 00:34:11,889 --> 00:34:13,440 This would be how-- 867 00:34:13,440 --> 00:34:15,760 so I'm almost done, that's the good news. 868 00:34:15,760 --> 00:34:17,600 So if I make a little change-- 869 00:34:17,600 --> 00:34:20,320 I like to think of this as, like, 0.001 changes. 870 00:34:20,320 --> 00:34:21,830 I don't like infinitesimals. 871 00:34:21,830 --> 00:34:22,545 I like 0.001. 872 00:34:22,545 --> 00:34:23,670 That's how I think of them. 873 00:34:23,670 --> 00:34:25,480 But you make a little change here, a little change here, 874 00:34:25,480 --> 00:34:26,512 a little change here. 875 00:34:26,512 --> 00:34:27,429 You get a change here. 876 00:34:30,780 --> 00:34:33,580 You'll get this linear function 877 00:34:33,580 --> 00:34:37,730 of the perturbations here that gives you perturbations here. 878 00:34:37,730 --> 00:34:38,230 OK? 879 00:34:38,230 --> 00:34:39,530 Well, I've only got one minute. 880 00:34:39,530 --> 00:34:41,139 So I'm going to write all this out with linear algebra, 881 00:34:41,139 --> 00:34:43,014 because everything is better when written out 882 00:34:43,014 --> 00:34:44,080 with linear algebra. 883 00:34:44,080 --> 00:34:47,920 So I'm going to write down that-- 884 00:34:47,920 --> 00:34:50,860 I'm going to write down that I'm actually 885 00:34:50,860 --> 00:34:53,800 interested in the last element. 886 00:34:53,800 --> 00:34:57,500 But dx dn plus 1 is going to equal 887 00:34:57,500 --> 00:34:59,500 and I'm going to have a couple of matrices here. 888 00:34:59,500 --> 00:35:03,190 Let me just sort of get the structure right. 889 00:35:03,190 --> 00:35:06,370 This will be dx2 through dxn plus 1 again. 890 00:35:06,370 --> 00:35:08,590 Sorry for the squishing. 891 00:35:08,590 --> 00:35:14,860 But here-- in fact, I'd like to use block matrices 892 00:35:14,860 --> 00:35:15,820 a little bit.
893 00:35:15,820 --> 00:35:19,400 So here I'm going to have dw1 db1. 894 00:35:19,400 --> 00:35:22,650 I'm going to put the bias together-- sorry for the mess. 895 00:35:22,650 --> 00:35:24,370 But dwn dbn. 896 00:35:24,370 --> 00:35:27,755 And Julia lets you make block matrices. 897 00:35:27,755 --> 00:35:29,380 And you can actually use them directly. 898 00:35:29,380 --> 00:35:31,680 There'd be a special type right there. 899 00:35:31,680 --> 00:35:32,180 OK? 900 00:35:32,180 --> 00:35:33,730 And then what goes here you could actually 901 00:35:33,730 --> 00:35:34,605 see what it would be. 902 00:35:34,605 --> 00:35:35,160 It would be-- 903 00:35:35,160 --> 00:35:36,243 I hope I'm doing it right. 904 00:35:36,243 --> 00:35:40,110 But there'd be a delta 1 x1 through a delta n xn. 905 00:35:40,110 --> 00:35:42,230 And this would be a diagonal matrix. 906 00:35:42,230 --> 00:35:42,730 OK? 907 00:35:42,730 --> 00:35:44,420 And then what do I have over here? 908 00:35:44,420 --> 00:35:48,990 Here I'd have the delta w's. 909 00:35:48,990 --> 00:35:51,337 And if you check you'll see that this will be-- 910 00:35:51,337 --> 00:35:53,920 I'm not going to get the indices right, and I don't have time. 911 00:35:53,920 --> 00:35:55,600 So I'm just going to write it like this. 912 00:35:55,600 --> 00:35:57,017 And now I'm just going to give you 913 00:35:57,017 --> 00:35:59,350 the end of the story, because I've run out of time. 914 00:35:59,350 --> 00:36:01,090 You could write all this as dx is 915 00:36:01,090 --> 00:36:03,670 equal to a diagonal matrix times the derivative 916 00:36:03,670 --> 00:36:07,660 of the parameters plus a lower triangular matrix 917 00:36:07,660 --> 00:36:09,345 times the dx again. 918 00:36:09,345 --> 00:36:10,720 And so if you want to solve this, 919 00:36:10,720 --> 00:36:13,360 linear algebra just does the propagation. 920 00:36:13,360 --> 00:36:20,650 You have (I minus L) dx equals D dp, or dx 921 00:36:20,650 --> 00:36:25,398 will be (I minus L) inverse D dp.
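Here is a NumPy sketch of the derivation just given, under several assumptions of ours: the parameters are ordered p = (w1, b1, ..., wn, bn); tanh stands in for ReLU so that a finite-difference check is smooth (swap in `np.maximum(t, 0)` with derivative `(t > 0).astype(float)` if you prefer); and `np.linalg.solve` stands in for a dedicated triangular solver. Writing dx = D dp + L dx, with D block diagonal (row i is [delta_i x_i, delta_i]) and L strictly lower triangular (delta_i w_i just below the diagonal), the derivative of the last element is e_n^T (I - L)^{-1} D dp, so the gradient is D^T (I - L)^{-T} e_n: one upper triangular solve, which plays the role of the backward sweep. The function names are hypothetical.

```python
import numpy as np


def forward(w, b, x1, h):
    """Scalar 'network' x_{i+1} = h(w_i x_i + b_i); returns all x's and pre-activations z."""
    n = len(w)
    x = np.empty(n + 1)
    z = np.empty(n)
    x[0] = x1
    for i in range(n):
        z[i] = w[i] * x[i] + b[i]
        x[i + 1] = h(z[i])
    return x, z


def grad_by_linear_solve(w, b, x1, y, h, dh):
    """Gradient of 0.5 * (y - x_{n+1})**2 with respect to p = (w_1, b_1, ..., w_n, b_n)."""
    n = len(w)
    x, z = forward(w, b, x1, h)
    delta = dh(z)                 # delta_i = h'(w_i x_i + b_i)
    D = np.zeros((n, 2 * n))      # block diagonal: row i is [delta_i x_i, delta_i]
    L = np.zeros((n, n))          # strictly lower: dx_{i+1} picks up delta_i w_i dx_i
    for i in range(n):
        D[i, 2 * i] = delta[i] * x[i]
        D[i, 2 * i + 1] = delta[i]
        if i > 0:                 # the input x_1 is fixed, so row 0 has no L entry
            L[i, i - 1] = delta[i] * w[i]
    e_n = np.zeros(n)
    e_n[-1] = 1.0                 # pulls out the last element, dx_{n+1}
    # dx = (I - L)^{-1} D dp  =>  grad of x_{n+1} is D^T (I - L)^{-T} e_n.
    # (I - L)^T is upper triangular: solving it is the "backward" pass.
    g = np.linalg.solve((np.eye(n) - L).T, e_n)
    return -(y - x[-1]) * (D.T @ g)   # chain rule through the loss
```

A quick check against central finite differences is a reasonable way to convince yourself that the triangular solve really does the backpropagation.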
922 00:36:25,398 --> 00:36:27,190 And if I only want the last element-- let's 923 00:36:27,190 --> 00:36:31,060 say eN is the vector that pulls out the last element, 924 00:36:31,060 --> 00:36:34,273 then this is all I'm going to need to get all my derivatives. 925 00:36:34,273 --> 00:36:35,690 And what's the moral of the story? 926 00:36:35,690 --> 00:36:37,390 I apologize for going one minute over. 927 00:36:37,390 --> 00:36:40,930 But the moral of the story is instead of backpropagating 928 00:36:40,930 --> 00:36:44,770 through your own hard work, you probably 929 00:36:44,770 --> 00:36:47,830 know that when you solve a lower triangular system, 930 00:36:47,830 --> 00:36:50,560 people have already written code that back-solves the lower 931 00:36:50,560 --> 00:36:51,640 triangular matrix. 932 00:36:51,640 --> 00:36:54,670 The back, the big back piece, has already 933 00:36:54,670 --> 00:36:55,750 been implemented for you. 934 00:36:55,750 --> 00:37:00,340 Why reinvent the wheel if the back-- if linear algebra 935 00:37:00,340 --> 00:37:02,250 already has the back, you see? 936 00:37:02,250 --> 00:37:04,090 And so if you just do this, and you do it 937 00:37:04,090 --> 00:37:07,240 in a language that lets you get full performance, 938 00:37:07,240 --> 00:37:09,790 you don't need to do your own backpropagation, 939 00:37:09,790 --> 00:37:12,750 because a simple backslash will do it for you. 940 00:37:12,750 --> 00:37:14,000 So I apologize for going over. 941 00:37:14,000 --> 00:37:16,950 I don't know if Professor Strang has some final words. 942 00:37:16,950 --> 00:37:20,370 But anyway, linear algebra is the secret to everything. 943 00:37:20,370 --> 00:37:22,582 That's the big message. 944 00:37:22,582 --> 00:37:24,010 AUDIENCE: OK. 945 00:37:24,010 --> 00:37:26,390 [APPLAUSE] 946 00:37:29,250 --> 00:37:34,770 GILBERT STRANG: Well, since it's our last two minutes, or minus 947 00:37:34,770 --> 00:37:39,330 two minutes of 18.065.
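To make the "backslash is backpropagation" point concrete, here is a minimal NumPy sketch. It assumes a hypothetical toy network that does not appear in the lecture: N scalar layers x_i = tanh(w_i x_{i-1} + b_i). The code builds D and L as described, then gets the gradient of x_N as eN^T (I - L)^{-1} D. It evaluates this as D^T y, where (I - L)^T y = eN. That transposed system is upper triangular, so solving it runs from the last layer back to the first, which is exactly the backward pass. The helper names `forward`, `gradient_by_backslash`, and `finite_difference_gradient` are illustrative, and the result is checked against central differences.

```python
import numpy as np

def forward(w, b, x0):
    """Hypothetical toy net: x_i = tanh(w_i * x_{i-1} + b_i), scalar neurons."""
    xs = [x0]
    for wi, bi in zip(w, b):
        xs.append(np.tanh(wi * xs[-1] + bi))
    return np.array(xs)  # xs[0] = x_0 (input), xs[-1] = x_N (output)

def gradient_by_backslash(w, b, x0):
    """Gradient of x_N w.r.t. p = (w_1, b_1, ..., w_N, b_N) via one triangular solve."""
    N = len(w)
    xs = forward(w, b, x0)
    delta = 1.0 - xs[1:] ** 2      # delta_i = tanh'(w_i x_{i-1} + b_i) = 1 - x_i^2
    D = np.zeros((N, 2 * N))       # block diagonal: local partials w.r.t. (w_i, b_i)
    L = np.zeros((N, N))           # strictly lower triangular: d x_i / d x_{i-1}
    for i in range(N):             # 0-based row i corresponds to layer output x_{i+1}
        D[i, 2 * i] = delta[i] * xs[i]
        D[i, 2 * i + 1] = delta[i]
        if i > 0:
            L[i, i - 1] = delta[i] * w[i]
    e_N = np.zeros(N)
    e_N[-1] = 1.0                  # pulls out the last element x_N
    # e_N^T (I - L)^{-1} D, computed as D^T y with (I - L)^T y = e_N.
    # (I - L)^T is upper triangular, so this solve is the backward pass.
    # np.linalg.solve is used for brevity; it does not exploit triangular structure.
    y = np.linalg.solve((np.eye(N) - L).T, e_N)
    return D.T @ y

def finite_difference_gradient(w, b, x0, h=1e-6):
    """Central differences in the same ordering (w_1, b_1, ..., w_N, b_N), for checking."""
    p = np.column_stack([w, b]).ravel()
    g = np.zeros_like(p)
    for k in range(p.size):
        pp, pm = p.copy(), p.copy()
        pp[k] += h
        pm[k] -= h
        fp = forward(pp[0::2], pp[1::2], x0)[-1]
        fm = forward(pm[0::2], pm[1::2], x0)[-1]
        g[k] = (fp - fm) / (2 * h)
    return g

rng = np.random.default_rng(0)
w, b = rng.normal(size=4), rng.normal(size=4)
g = gradient_by_backslash(w, b, x0=0.5)
print(np.max(np.abs(g - finite_difference_gradient(w, b, x0=0.5))))  # should be tiny
```

With a solver that recognizes triangular matrices, the single solve costs O(N^2) for a dense N by N system, and less if the sparsity of L is exploited. No hand-written backward loop is needed.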
948 00:37:39,330 --> 00:37:40,900 I hope you guys enjoyed it. 949 00:37:40,900 --> 00:37:44,400 I certainly enjoyed it, as you could tell. 950 00:37:44,400 --> 00:37:47,580 Teaching this class, seeing how it would go, 951 00:37:47,580 --> 00:37:50,130 and writing about it. 952 00:37:50,130 --> 00:37:53,250 So I'll let you know about the writing. 953 00:37:53,250 --> 00:37:55,170 And meanwhile, I'll get your writing 954 00:37:55,170 --> 00:37:58,950 on the projects, which I appreciate very much. 955 00:37:58,950 --> 00:38:01,950 And of course, grades are going to come out well. 956 00:38:01,950 --> 00:38:04,860 And I hope you've enjoyed it. 957 00:38:04,860 --> 00:38:05,968 So thank you all. 958 00:38:05,968 --> 00:38:06,510 You're right. 959 00:38:06,510 --> 00:38:07,110 Thanks. 960 00:38:07,110 --> 00:38:08,960 [APPLAUSE]