[The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.]

CHARLES E. LEISERSON: Hey, everybody. Let's get going. Who here has heard of the FFT? That's most of you. So I first met Steve Johnson when he worked with one of my graduate students -- now a former graduate student -- Matteo Frigo. And they came up with a really spectacular piece of performance engineering for the FFT: a system they call FFTW, for the Fastest Fourier Transform in the West. For years and years it has been a staple -- anybody doing signal processing knows FFTW. So anyway, it's a great pleasure to welcome Steve Johnson, who is going to talk about some of the work he's been doing on dynamic languages, such as Julia and Python.

STEVEN JOHNSON: Yeah. Thanks.

CHARLES E. LEISERSON: Is that pretty accurate?

STEVEN JOHNSON: Yeah. Yeah, so I'm going to talk, as I said, about high-level dynamic languages and how you get performance in them. Most of you have probably used Python, or R, or MATLAB. These are really popular for people doing technical computing, statistics, and anything where you want interactive exploration. You'd like a dynamically typed language where you can just type x equals 3, and then three lines later say, oh, x is an array -- because you're doing things interactively, you don't have to be stuck with a particular set of types. And there are a lot of choices for these. But they usually hit a wall when it comes to writing performance-critical code in these languages. So traditionally, people doing serious computing in these languages have a two-language solution: they do high-level exploration, and productivity and so forth, in Python or whatever.
But when they need to write performance-critical code, they drop down to a lower-level language -- Fortran, or C, or Cython, or one of these things -- and use Python as the glue for those low-level kernels. And this is workable. I've done this myself; many of you have probably done this. But when you drop down from Python to C, or even to Cython, there's a huge discontinuous jump in the complexity of the coding. And you usually lose a lot of generality: when you write code in C or something like that, it's specific to a very small set of types, whereas the nice thing about high-level languages is that you can write generic code that works for a lot of different types.

At this point, there's often someone who pops up and says, oh, well, I do performance programming in Python, and everyone knows you just need to vectorize your code, right? What they mean is that you rely on mature external libraries: you pass them a big block of data, they do a huge amount of computation, and they come back. So you never write your own loops. And this is great -- if someone has already written the code that you need, you should leverage that as much as possible. But somebody has to write those libraries, and eventually that person will be you. Because if you do scientific computing, you inevitably run into a problem that you just can't express in terms of existing libraries very easily, or at all.

So this was the state of affairs for a long time. And a few years ago, starting in Alan Edelman's group at MIT, there was a proposal for a new language called Julia, which tries to be as high level and interactive as MATLAB, or Python, and so forth -- it's a dynamically typed, general-purpose language like Python, very productive for technical work, really oriented towards scientific and numerical computing. But you can write a loop, and you can write low-level code in it that's as fast as C. That was the goal. The first release was in 2013.
So it's a pretty young language. The 1.0 release was in August of this year. Before that point, every year there was a new release -- 0.1, 0.2, 0.3 -- and every year it would break all your old code, and you'd have to update everything to keep it working. So now they've said, OK, it's stable. We'll add new features, we'll make it faster, but from this point onwards, at least until 2.0, many years in the future, it will be backwards compatible.

In my experience, this claim pretty much holds up. I haven't found any problem where there was a nice, highly optimized C or Fortran code and I couldn't write equivalently performing code in Julia, given enough time. Obviously, if something is a library with 100,000 lines of code, it takes quite a long time to rewrite that in any other language. There are lots of benchmarks that illustrate this. The stated goal of Julia is usually to stay within a factor of 2 of C; in my experience, it's usually within a few percent if you know what you're doing.

So there's a very simple example that I like to use, which is generating a Vandermonde matrix. Given a vector of values alpha 1, alpha 2, up to alpha n, you want to make an n-by-m matrix whose columns are just those entries raised, element-wise, to the zeroth power, the first power, squared, cubed, and so forth. This kind of matrix shows up in a lot of problems, so most matrix and vector libraries have a built-in function to do this. In Python, in NumPy, there is a function called numpy.vander to do this. It's generating a big matrix, so it could be performance critical -- so they can't implement it in Python. If you look at the NumPy implementation, it's a little Python shim that calls immediately into C. And then if you look at the C code -- I won't scroll through it -- it's several hundred lines of code.
It's quite long and complicated. And all that several hundred lines of code is doing is figuring out what types to work with -- what kernels to dispatch to. At the end of that, it dispatches to a kernel that does the actual work. And that kernel is also C code, but C code that was generated by a special-purpose code generator. So it's quite involved to get good performance for this while still being somewhat type generic. Their goal is to have something that works for basically any NumPy array and any NumPy type -- there's a handful, maybe a dozen, scalar types that it should work with, right? If you're implementing this in C, it's really trivial to write 20 lines of code that implement this, but only for double precision -- a pointer to a double-precision array. The difficulty is being type generic in C.

So in Julia -- here is the implementation in Julia. At first glance it looks roughly like what a C or Fortran implementation would look like, just implemented the most simple way: two nested loops. You loop across, and as you go across, you accumulate powers by multiplying repeatedly by x. That's all it is; it just fills in the array.

The performance graph here shows the time for the NumPy implementation divided by the time for the Julia implementation, as a function of n for an n-by-n matrix. For the first data point, I think there's something funny going on -- it's not really 10,000 times slower. But for a 10-by-10 or 20-by-20 array, the NumPy version is actually 10 times slower, basically because of the overhead imposed by going through all those layers. Once you get to a 100-by-100 matrix, the overhead doesn't matter, and then all that optimized, generated C code is pretty much the same speed as the Julia code. Except the Julia code, as I said, looks much like the C code would, except there are no types.
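A minimal sketch of the two-nested-loop implementation being described (illustrative; not necessarily the exact notebook code):

    # Vandermonde matrix: column j holds the element-wise (j-1)-th powers of x.
    function vander(x::AbstractVector, n = length(x))
        m = length(x)
        V = Matrix{eltype(x)}(undef, m, n)
        for i in 1:m
            V[i, 1] = one(x[i])        # multiplicative identity: the first column
        end
        for j in 2:n, i in 1:m
            V[i, j] = V[i, j-1] * x[i] # accumulate powers by repeated multiplication
        end
        return V
    end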
It's just vander(x). There's no type declaration; x can be anything. And, in fact, this works with any container type, as long as it has an indexing operation, and any numeric type -- it could be real numbers, it could be complex numbers, it could be quaternions, anything that supports the times operation. And there's also a call to one: one(x) returns the multiplicative identity of whatever group you're in -- you need to have a 1, right? That's the first column. And that might be a different kind of 1 for a different object. It might be an array of matrices, for example, and then the 1 is the identity matrix.

In fact, there are even cases where you can get significantly faster than optimized C and Fortran code. I found this when I was implementing special functions -- things like the error function, or the polygamma function, or the inverse of the error function. I've consistently found that I can often get two to three times faster than the optimized C and Fortran libraries out there -- partly because I'm smarter than the people who wrote those libraries, but -- no -- mainly because in Julia I'm using basically the same expansions, the same series and rational functions, that everyone else is using. The difference is that Julia has built-in techniques for what's called metaprogramming, or code generation. Usually these special functions involve lots of polynomial evaluations; that's what they boil down to. And you can write code generation that produces very optimized, inlined evaluation of the specific polynomials for these functions, which would be really awkward to do in Fortran -- you'd either have to write it all by hand, or write a separate program that wrote the Fortran code for you. So high-level languages allow you to do tricks for performance that would be really hard to do in low-level languages.
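A small taste of that kind of code generation, using the standard @evalpoly macro from Julia's Base (the coefficients here are made up for illustration, not from any real special function):

    # @evalpoly expands at parse time into straight-line Horner evaluation of
    # 1.0 - 0.5x + 0.25x^2 - 0.125x^3: no loop and no coefficient array at runtime.
    poly_example(x) = @evalpoly(x, 1.0, -0.5, 0.25, -0.125)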
So mainly what I want to do is give some idea of why Julia can be fast. And to understand this, you also need to understand why Python is slow -- and, in general, what determines the performance of a language like this. What do you need in the language to enable you to compile it to fast code while still being completely generic, like this vander function, which works on any type? Even a user-defined numeric type or a user-defined container type will be just as fast. There's nothing privileged -- in fact, if you look at Julia, almost all of Julia is implemented in Julia. Integer operations and things like that, the really basic types -- most of that is implemented in Julia. Obviously, if you're multiplying two 32-bit integers, at some point it's calling an assembly-language instruction. But even that call out to the assembly is actually in Julia.

So at this point, I want to switch over to a live calculation. This is from a notebook that I developed as part of a short course with Alan Edelman, who's sitting over there, [INAUDIBLE] on performance optimization in high-level languages. And I want to go through just a very simple calculation. Of course, in any language you would usually have a built-in function for this -- it's just a sum function, as written up there. We have a list, an array, of n numbers, and we're just going to add them up. If we can't make this fast, then we have real problems, and we're not going to be able to do anything in this class. This is the simple sort of thing where, if someone doesn't provide it for you, you're going to have to write a loop to do it. And I'm going to look at it not just in Julia but also in Python, in C, in Python with NumPy, and so forth.

So this document that I'm showing you here is a Jupyter notebook. Some of you may have seen this kind of thing.
Jupyter provides this really nice browser-based front end where I can put equations, and text, and code, and results, and graphs all in one mathematical notebook document. And you can plug in different languages. Initially it was for Python, but we plugged in Julia, and now there's R and something like 100 different languages that you can plug into the same front end.

OK, so I'll start with the C implementation. This is a Julia notebook, but I can easily compile and call out to C. I just made a string that holds a 10-line C implementation -- the most obvious function, which takes in a pointer to an array of doubles and its length, loops over them, and sums them up, just what you would do. Then I compile it with gcc -O3, link it into a shared library, load that shared library in Julia, and just call it. There's a function called ccall in Julia where I can call out to a C library with essentially zero overhead. That's nice, because there are lots of existing C libraries out there, and you don't want to lose them. So I just say ccall, and we call this c_sum function in my library: it returns a Float64, and it takes two parameters, a size_t and a pointer to Float64. I pass the length of my array and the array itself -- a Julia array, of course, is just a bunch of numbers, and it will pass a pointer to that under the hood.

And I wrote a little function called relerr that computes the relative error -- the fractional difference -- between x and y. So I'll just check it: I'll generate 10 to the 7 random numbers in [0,1) and compare the result to Julia's built-in function called sum, which sums an array. And it's giving the same answer to 13 decimal places -- not quite machine precision, but there are 10 to the 7 numbers.
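Roughly what the notebook does (a sketch; the Linux-style .so suffix and the exact flags are illustrative, and gcc must be on the path):

    c_code = """
    #include <stddef.h>
    double c_sum(size_t n, double *a) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }
    """
    const Clib = tempname() * ".so"
    # Pipe the source into gcc's stdin and build a shared library.
    open(`gcc -fPIC -O3 -shared -o $Clib -x c -`, "w") do f
        print(f, c_code)
    end
    # Essentially zero-overhead call into the compiled library.
    c_sum(a::Vector{Float64}) =
        ccall(("c_sum", Clib), Float64, (Csize_t, Ptr{Float64}), length(a), a)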
So the error kind of accumulates as you add across. OK, so as I'm calling it, it's giving the right answer. And now I want to benchmark the C implementation and use that as a baseline. This should be pretty fast for an array of floating-point values. There's a Julia package called BenchmarkTools. As you probably know from this class, benchmarking is a little bit tricky, so this will take something, run it lots of times, collect some statistics, and return the minimum time -- or you can also get the variance and other things. @btime is something called a macro in Julia: it takes an expression, rewrites it into something that basically has a loop, times it, and does all that stuff.

OK, so it takes 11 milliseconds to sum 10 to the 7 numbers with a straight C loop compiled with gcc -O3, no special tricks. That's about 1 gigaflop -- a billion operations per second. So it's not hitting the peak rate of the CPU, but of course there are other factors: this array doesn't fit in cache, and so forth.

OK. So now, before I do anything in Julia, let's do some Python. But I'll do a trick: I can call Python from Julia, so I can do everything from one notebook, using a package I wrote called PyCall. PyCall calls directly out to libpython, with virtually no overhead -- it's just like calling Python from within Python. I'm calling directly out to libpython functions, and I can pass any type I want, call any function, and do conversions back and forth.

So I'm going to take that array and convert it to a Python list object, because I don't want to time the overhead of converting my array to a Python list -- I'll convert it ahead of time. And I'll start with Python's built-in function called sum.
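A sketch of those two steps, assuming the BenchmarkTools and PyCall packages are installed:

    using BenchmarkTools, PyCall

    a = rand(10^7)                # 10^7 random numbers in [0, 1)
    @btime c_sum($a)              # $a: interpolate so a isn't benchmarked as a global

    pysum   = pybuiltin("sum")    # Python's built-in sum, as a callable PyObject
    apylist = pycall(pybuiltin("list"), PyObject, a)  # convert to a Python list once
    @btime $pysum($apylist)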
So I get a PyObject for it, I call this pysum on the list, and I make sure it's giving the right answer -- OK, the difference is 10 to the minus 13 again. And now let's benchmark it. Oops. There we go. It takes a few seconds, because it has to run it a few times and gather statistics. OK, so it takes 40 milliseconds. That's not bad -- it's four or five times slower than C, but it's pretty good.

So why is it five times slower than C? The glib answer is, oh, well, Python is slow because it's interpreted. But the sum function is actually written in C. Here's the C implementation of the sum function that I'm calling -- I'm just linking to the code on GitHub. There's a whole bunch of boilerplate that checks the type of the object, and then it has some loops and so forth. And if you look carefully, it turns out it's actually doing really well. The reason it does really well is that it has a fast path: if you have a list where everything is a number type, it has an optimized implementation for that case.

But it's still five times slower than C. And they've spent a lot of work on it -- it used to be 10 times slower than C a couple of years ago, so they've done a lot of work optimizing this. So why aren't they able to get C speed? Since they have a C implementation of sum, are they just dumb? No. It's because the semantics of the data type prevent them from getting anything faster than that. And this is one of the things you learn when you do performance work in high-level languages: you have to think about data types, and you have to think about what the semantics are, because that greatly constrains what any conceivable compiler can do. And if the language doesn't provide you with the ability to express the semantics you want, then you're dead.
And that's one of the basic things that Julia addresses. So what is a Python list? You can have, say, [3, 4, 2]. A Python list is a bunch of Python objects, and those objects can be anything -- any type. It's a completely heterogeneous data structure. Of course, a particular list, like this one, can happen to be homogeneous, but the data structure has to support heterogeneity, because at any point I can assign the third element to a string, and it has to support that.

So think about what that means for how it has to be implemented in memory. This is a list of, in this case, three items. But what are those items? If they can be items of any type, one of them could be another array -- they could be different sizes and so forth, and you don't want an array where every element is a different size. So it has to be an array of pointers, where the first pointer points to the 3, the next one to the 4, the next one to the 2. But each of those can't just be a pointer to one 64-bit value in memory, because the implementation has to know -- has to somehow store -- what type the object is. So there has to be a type tag that says, this is an integer, and another one that says, this is a string. This is sometimes called a box: a type tag plus a value.

And so imagine what even the most optimized C implementation has to do, given this kind of data structure. Here's the first element: it has to chase the pointer and then ask, what type of object is it? Then, depending on the type, it initializes the sum to that. Then it reads the next object.
It has to chase the second pointer, read the type tag, and figure out the type -- all at run time. And then, oh, this is another integer; that tells me I want to use the plus function for two integers. Which plus function it uses depends on the types of the objects -- it's an object-oriented language, so I can define my own type, and if it has its own plus function, it should work with sum. So it's looking up the types of the objects at runtime, and looking up the plus function at runtime. And not only that, but each time it does a loop iteration, it has to add two things and allocate a result. That result, in general, has to be another box, because the type might change as you're summing through: if you start with integers, and then you get a floating-point value, and then you get an array, the type will change. So each loop iteration allocates another box.

Now, the C implementation has a fast path: if they're all integer types, I think it doesn't reallocate the box for the accumulating sum every time, and it caches the plus function it's using, so it's a little bit faster. But still, it has to inspect every type tag and chase all these pointers for every element of the array. Whereas the C implementation of sum -- imagine what that compiles down to. For each loop iteration, what does it do? It increments a pointer to the next element. At compile time, the types are all known to be Float64, so it loads a Float64 value into a register; it has a running sum in another register; it executes one machine instruction to add to that running sum; and then it checks whether it's done -- an if statement -- and goes on. So that's just a few instructions in a very tight loop, whereas each loop iteration over the list takes lots and lots of instructions to chase all the pointers and read the type tags -- and that's in the fast case, where they're all the same type and it's optimized for that.
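You can see the cost of those boxes from the Julia side, too. A rough illustration (the exact sizes are implementation details, not from the lecture):

    a_unboxed = rand(10^6)              # Vector{Float64}: one type tag, contiguous 8-byte values
    a_boxed   = Vector{Any}(a_unboxed)  # Vector{Any}: a pointer plus a boxed value per element

    Base.summarysize(a_unboxed)         # about 8 MB
    Base.summarysize(a_boxed)           # several times larger, from the pointers and boxes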
So where was I? OK -- that's the Python sum function. Now, many of you who have used Python know that there's another array type, in a whole library called NumPy, for working with numerics. So what problem is that addressing? The basic problem is this data structure: as soon as you have a list of items that can each be any type, you're dead. There's no way to make that as fast as a C loop over a double pointer. To make it fast, what you need is a way to say: every element of this array is the same type. Then I don't need to store a type tag for every element; I can store the type tag once, for the whole array.

So there's a tag -- a type, which is, say, float64. There's a length of the array. And then there's just a bunch of values, one after the other: 1.0, 3.7, 8.9, each of them just 8 bytes -- an 8-byte double, in C notation -- packed one after the other. So it reads the type once, reads the length, and then dispatches to code that is basically the equivalent of my C code. Once it knows the type and the length, it runs that, and that can be quite fast. The only problem is that you cannot implement this in Python. Python doesn't provide you a way to express that semantics -- to have a list of objects where you say they all have to be the same type. There's no way to enforce that, or to inform the language of it, in Python.
There's no way to tell Python: these are all the same type, so you can throw away the boxes -- you don't need to store the type tags, you don't need pointers, you don't need reference counting; you can just slam the values into memory one after the other. Every Python object looks like that box, and the language doesn't provide that facility, so there's no way to write a fast Python compiler that will do this. That's why NumPy is implemented in C. Even with PyPy -- some of you are familiar with PyPy, which is an attempt to make a fast tracing JIT for Python -- when they ported NumPy to PyPy, or attempted to, they could implement more of it in Python, but they still had to implement the core in C.

OK, but given that, I can do this: I can import the NumPy module into Julia, get its sum function, and benchmark the NumPy sum function. Again, it takes a few seconds to run. OK: it takes 3.8 milliseconds. The C was 10 milliseconds, so it's actually faster than the C code -- a little over twice as fast, actually. What's going on is that their C code is better than my C code: their C code is using SIMD instructions. At this point, I'm sure you all know about these -- where you can read two or four numbers into one giant register and, with one instruction, add all four numbers at once.
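In the notebook that looks roughly like this (a sketch; NumPy must be installed in the Python that PyCall is using):

    np = pyimport("numpy")      # import numpy from Julia
    numpy_sum = np.sum          # its sum function, as a callable PyObject
    @btime $numpy_sum($a)       # beats the simple C loop here because it uses SIMD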
OK, so what about going in the other direction: we write our own Python sum function, instead of using the built-in one implemented in C. So here's a little mysum function in Python. It only works for floating point -- I initialize s to 0.0, so really it only accumulates floating-point values, but that's OK. And then I just loop: for x in a, s = s + x, return s -- the most obvious thing you would write in Python. I check that it works -- yeah, the error is 10 to the minus 13; it's giving the right answer. And now let's time it.

So remember that C was 10 milliseconds, NumPy was 5 milliseconds, and the built-in Python sum was 50 milliseconds operating on this list. So that was C code operating on this list at 50 milliseconds. And now we have Python code operating on this list, and that's 230 milliseconds. So it's quite a bit slower. And that's because, basically, in pure Python there's no way to implement that fast path that checks -- oh, they're all the same type, so I can cache the plus function and so forth. I don't think it's feasible to implement that. So basically, every loop iteration has to look up the plus function dynamically and allocate a new box for the result, and do that 10 to the 7 times.
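For reference, that hand-written Python function can be defined right from Julia with PyCall's py"..." string macro (a sketch, reusing the apylist from before):

    py"""
    def mysum(a):
        s = 0.0          # only accumulates floating-point values, as noted
        for x in a:
            s = s + x
        return s
    """
    mysum_py = py"mysum"        # the Python function, callable from Julia
    @btime $mysum_py($apylist)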
Now, there's a built-in sum function in Julia, so let's benchmark that. It's actually implemented in Julia, not in C. I won't show you the code for the built-in one, because it's a little messy -- it's actually computing the sum more accurately than the simple loop would. So that's 3.9 milliseconds -- comparable to the NumPy code. It's also using SIMD, so this is also fast.

So why can Julia do that? It has to be that the array type, first of all, has the type attached to it: you can see the type of the array is Array of Float64. There's a type tag attached to the array itself, so somehow that's involved. It looks more like a NumPy array in memory than like a Python list.

You can make the equivalent of a Python list -- it's called an array of Any. If I convert this to an array of Any -- an array whose element types can be any Julia type, so it has to be stored like the Python list, as an array of pointers to boxes -- and benchmark that -- let's see, [INAUDIBLE] there it is -- then it's 355 milliseconds. So it's actually even worse than Python. The Python folks have spent a lot of time optimizing their code paths for things that allocate lots of boxes all the time, whereas in Julia it's usually understood that if you're writing optimized code, you're not going to do it on arrays of pointers to boxes; you're going to write it on homogeneous arrays, or things where the types are known at compile time.

OK, so let's write our own Julia sum function. Here's a Julia mysum function. There's no type declaration; it works on any container type. I initialize s by calling the function zero on the element type of a, which gives the additive identity of that type. So it will work on any container of anything that supports a plus function and has an additive identity. It's completely generic. It looks a lot like the Python code, except there's no zero function in Python. Let's make sure it gives the right answer -- it does -- and let's benchmark it.

This is the code you'd like to be able to write: high-level code that's a straight loop, but, unlike the C code, completely generic. It works on any container -- anything you can loop over, anything that has a plus function -- so an array of quaternions or whatever. And the benchmark: it's 11 milliseconds, the same as the C code I wrote in the beginning. It's not using SIMD -- that's where the additional factor of 2 comes from -- but it's the same as the non-SIMD C code. And, in fact, if I want to use SIMD, there's a little annotation you can put on the loop (sketched below).
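A sketch of that generic function, plus the annotated variant (@simd is the real macro; the rest mirrors what was just described):

    function mysum(a)
        s = zero(eltype(a))      # additive identity of the element type
        for x in a
            s += x
        end
        return s
    end

    function mysum_simd(a)
        s = zero(eltype(a))
        @simd for i in eachindex(a)   # let LLVM reorder and vectorize the reduction
            s += @inbounds a[i]
        end
        return s
    end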
It tells LLVM to try to vectorize the loop. Sometimes it can, sometimes it can't, but something like this is simple enough that it should be able to -- you don't need to hand-code SIMD instructions for a loop this simple. Yeah?

AUDIENCE: Why don't you always put the [INAUDIBLE]??

STEVEN JOHNSON: So, yeah -- why isn't it the default? Because for most code, the compiler cannot autovectorize, so it increases the compilation time and often bloats the code size for no benefit. It's really only relatively simple loops doing simple operations on arrays that benefit from SIMD, so you don't want it everywhere.

Yeah, so now it's 4.3 milliseconds -- about the same as the NumPy version, a little slower, actually. It's interesting: a year ago when I tried this, it was almost exactly the same speed as NumPy. Since then, both NumPy and Julia have gotten better, but NumPy got better more. So there's something going on with how well the compiler can use AVX instructions -- we're still investigating what that is, but it looks like an LLVM limitation.

And it's still completely type generic. So if I make a random array of complex numbers, and then I sum them -- which one am I calling now? mysum is the vectorized one, right? So, complex numbers: each complex number is two floating-point numbers, so you'd naively expect it to take about twice the time -- twice the number of additions for the same number of complex numbers as real numbers. And it does: it takes about 11 milliseconds for 10 to the 7 of them, about twice the 5 milliseconds it took for the same number of real numbers (a quick check of that is sketched below). And the code works for everything. So why?
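That check looks something like this (a sketch):

    ac = rand(ComplexF64, 10^7)   # 10^7 complex numbers = 2 x 10^7 Float64s
    @btime mysum($ac)             # roughly twice the Float64 time, as expected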
OK, so what's going on here? So we saw this mysum function -- I'll just take out the SIMD for now. And it works for any type; it doesn't even have to be an array. For example, there's another container type in Julia called a Set, which is just an unordered collection of unique elements. But you can also loop over it, and if it's a set of numbers, you can also sum it. So, while I'm waiting for the benchmarks to complete, let me allocate a set. Notice there are no type declarations here: in mysum(a), the a doesn't have to be an array of a particular type -- it doesn't even have to be an array. A Set is a different data structure. So here's a set of integers. The elements of a set are unique: if I add something that's already in the set, it won't be added twice. And it supports fast membership checking -- is 2 in the set? is 3 in the set? -- without looking through all the elements, because internally it's a hash table. But I can call my mysum function on it, and it sums up 2 plus 17 plus 6 plus 24, which is hopefully 49, right?
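A sketch of that, with the same numbers as the example:

    s = Set([2, 17, 6, 24])   # unordered, unique elements, hashed internally
    2 in s                    # fast membership test: true
    mysum(s)                  # the same generic loop works on any iterable: 49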
So OK, so what's going on here? There are several things going on to make Julia fast. One key thing is that when you have a function like this mysum -- or an even simpler function, f(x) = x + 1 -- when you call it with a particular type of argument, like an integer, or an array of integers, or whatever, it compiles a specialized version of the function for that type. So here's f(x) = x + 1. It works on any type supporting plus. If I call f(3), I'm passing a 64-bit integer, so it says: OK, x is a 64-bit integer; I'm going to compile a specialized version of f with that knowledge. When I call it with a different type, 3.1, now x is a floating-point number, and it will compile a specialized version for that type. If I call it with another integer, it says: oh, that version was already compiled, [INAUDIBLE] I'll reuse it. So it only compiles the first time you call it with a particular type. If I call it with a string, it'll give an error, because it doesn't know how to apply plus to a string.

OK, so what is going on? We can actually look at the compiled code. There are these macros called @code_llvm and @code_native; they say, when I call f(1), what's the -- do people know what LLVM is? OK, so Julia compiles to LLVM bitcode first, and then that goes to machine code. So you can see the LLVM bitcode -- or byte code, or whatever it's called -- and you can see the native machine code, as sketched below. The LLVM code this compiles to is basically one LLVM instruction, add i64, which turns into one machine instruction: a load-effective-address instruction that actually performs the 64-bit addition.
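Here's roughly what that inspection looks like at the REPL (a sketch; the exact output depends on your Julia and LLVM versions):

    f(x) = x + 1

    @code_llvm f(1)    # LLVM IR: essentially a single `add i64` instruction
    @code_native f(1)  # machine code: on x86-64, an leaq doing the addition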
So let's think about what had to happen there. You have f(x) = x + 1, and now you want to compile it for x an Int64, a 64-bit integer -- or, as we'd write in Julia, x::Int64. The colon colon means "is of this type," OK? So this is the 64-bit integer type. What does the compiler have to do? It first has to figure out which plus function to call. Plus has lots of methods: there's a plus for two matrices, a plus for lots of different things. Depending on the types of the arguments, it decides which plus to call. So it realizes: oh, this is a 64-bit integer, and this is also a 64-bit integer; that means I'm going to call the plus function for two integers. Then it looks into that function and says: oh, that one returns an Int64 -- so that's the return value of my function. And oh, by the way, this function is so simple that I'm going to inline it.

So it's type-specializing. And this process of going from "x is an integer" to figuring out the type of the output is called type inference, OK? In general, type inference is: given the types of the inputs, try to infer the types of the outputs -- and, in fact, of all the intermediate values as well.

Now, what makes it a dynamic language is that this can fail. In some languages, like ML, you don't really declare types either, but the language is designed so that, given the types of the inputs, the compiler can figure out everything -- and if it can't figure out everything, it gives an error, basically. It has to infer everything. Julia is a dynamic language: type inference can fail, and there's a fallback. If it doesn't know a type, it can stick things in a box. But obviously, the fast path is when it succeeds. And one of the key things is that, to make this work, you have to design the language so that -- at least for all the built-in constructs and the standard library, and, in general, in the culture for people designing packages and so forth -- things are designed so that type inference can succeed. I'll give a counterexample to that in a minute, right?

And this works recursively. Suppose I define a function g(x) = f(x) * 2, OK, and then I call g(1). It's going to say: OK, x here is an integer; I'm going to call f with an integer argument. Oh, I should compile f for an integer argument, figure out its return type, use its return type to figure out which times function to call -- and do all of this at compile time, ideally, not at runtime. So we can look at the LLVM code for this. And remember, f(x) adds 1 to x, and then we're multiplying by 2, so the result computes 2x + 2.
And LLVM is smart enough -- f is so simple that it gets inlined, and then the whole thing compiles to one shift instruction to multiply x by 2, plus an add of 2. So it actually combines the times 2 and the plus 1: it does constant folding, OK? And this process cascades: if you look at h(x) = g(x) * 2, that compiles to one shift instruction to multiply x by 4, and then adding 4.

You can even do it for a recursive function. So here's a stupid recursive implementation of the Fibonacci-number calculation, right? It's given n, an integer. If n is less than 3, it returns 1; otherwise, it adds the previous two numbers. I can call it on the first 10 integers, and here are the first 10 Fibonacci numbers. There's also a cute notation in Julia: you can say fib dot. If you write f.(arguments), it calls the function elementwise on a collection and returns a collection, so fib.(1:10) returns the first 10 Fibonacci numbers. And there's a macro called @code_warntype that will show me the output of type inference. n is an Int64, and it goes through -- this is kind of a hard-to-read format; this is after one of the compiler passes, called lowering -- and it has figured out the types of every intermediate call. So here it's invoking Main.fib recursively, and it has figured out that the return type is also Int64. So it knows everything, OK?
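A sketch of that Fibonacci example as described (fib is the name used in the talk):

    fib(n::Integer) = n < 3 ? 1 : fib(n - 1) + fib(n - 2)

    fib.(1:10)       # the "dot call" broadcasts fib elementwise over 1:10,
                     # giving [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
    @code_warntype fib(10)  # inference output: return type inferred as Int64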
So you'll notice that here I declared a type: I've said that n is an Integer, OK? I don't have to do that for type inference. It doesn't help the compiler at all, because the compiler does type inference based on what I actually pass. What this declaration is, is more like a filter: it says this function only accepts integers, and if you pass something else, it should throw an error. Because if I pass 3.7, the function would still run, right? You can check whether 3.7 is less than 3; you can call it recursively. It would just give nonsense. So I want to prevent someone from passing nonsense to this function. That's one reason to do a type declaration.

But another reason is to do something called dispatch. We can define different versions of the function for different argument types. So, for example, a nicer version of this is a factorial function. Here's a stupid recursive implementation of factorial that takes an integer argument and just recursively calls itself on n minus 1. You can call it: 10 factorial, OK? If I want 100 factorial, I need a different type, not 64-bit integers -- I need an arbitrary-precision integer. And since I said the argument is an Integer, if I call it on 3.7, it'll give an error. So that's good. But now I can define a different version of this. There's actually a generalization of factorial to arbitrary real -- in fact, even complex -- numbers, called the gamma function. So I can define a fallback that works for any type of number, which calls a gamma function from someplace else. And then I can pass it a floating-point value. If you take the factorial of minus 1/2, it turns out that's actually the square root of pi, so if I square it, it gives pi, all right? So now I have one function, and I have two methods, all right?
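A sketch of those two methods. I'm assuming the gamma function comes from the SpecialFunctions package (the talk just says "from someplace else"), and I'm calling the function myfactorial to avoid clashing with the built-in factorial:

    using SpecialFunctions: gamma   # assumption: gamma from SpecialFunctions.jl

    myfactorial(n::Integer) = n < 2 ? one(n) : n * myfactorial(n - 1)
    myfactorial(x::Number)  = gamma(x + 1)  # fallback for any other number type

    myfactorial(10)        # 3628800, via the Integer method
    myfactorial(big(100))  # BigInt <: Integer: same method, no overflow
    myfactorial(-0.5)^2    # ~ pi, via the Number fallback and the gamma function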
So these types here -- there's a hierarchy of types. This is what's called an abstract type, which most of you have probably seen. There's a type called Number, and underneath it there's a class of subtypes, like Integer, and underneath that there are, for example, Int64 or Int8, for 64-bit or 8-bit integers. And underneath Number there's also another subtype called Real, and underneath that there are a couple of subtypes, and then there's, say, Float64, or Float32 for a single-precision 32-bit floating-point number, and so forth. So there's a hierarchy of these things. When I declare that something takes an Integer, the type is not there to help the compiler; it's there to provide a filter: this method only works for these types. My second method works for any number type. So I have one method that works for any number, and one method that only works for integers.

So when I call it on 3, which one does it call? Because 3 actually matches both methods. What it does is call the most specific one -- the one farthest down the tree. If I have a method defined for Number and one defined for Integer, and I pass an integer, it'll call the Integer one. If I have one defined for Number, one defined for Integer, and one defined specifically for Int8, and I pass an 8-bit integer, it'll call that version, all right? So it gives you a filter.
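For instance, a toy illustration of the most-specific-method rule (the function name describe is mine, purely for illustration):

    describe(x::Number)  = "some kind of number"
    describe(x::Integer) = "some kind of integer"
    describe(x::Int8)    = "specifically an 8-bit integer"

    describe(3.0)      # "some kind of number"    (Float64 matches only Number)
    describe(3)        # "some kind of integer"   (Int64: Integer beats Number)
    describe(Int8(3))  # "specifically an 8-bit integer" (most specific wins)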
But, in general, you can do this on multiple arguments. And this is like the key abstraction in Julia, something called multiple dispatch. It was not invented by Julia -- I guess it was present in Smalltalk, and Dylan; it's been floating around a bunch of languages for a while. But it's not been in a lot of mainstream languages, not in a high-performance way. And you can think of it as a generalization of object-oriented programming. I'm sure all of you have done object-oriented programming, in Python or C++ or something like this. In object-oriented programming, typically the way you think of it is you have an object, and it's usually spelled object.method(x, y), for example, right? And what that does is: the object's type determines the method, right?

So you can have a method called plus, but it would actually call a different function for a complex number than for a real number, something like that. Or a method called length, which for a Python list calls a different function than for a NumPy array, OK? In Julia, the way you would spell the same thing is method(object, x, y). You wouldn't say object.method -- you don't think of the object as owning the method, all right? In Julia, the object would just be, say, the first argument. In fact, under the hood, that's what Python does: the object is passed as an implicit first argument called self, all right? So it actually is doing this; it's just a different spelling of the same thing. But as soon as you write it this way, you realize what Python and other OOP languages are doing: they're looking at the type of the first argument to determine the method. But why just the first argument? In a multiple-dispatch language, you look at all the types. So what those languages do is sometimes called single dispatch -- determining the method is called dispatch: of all the functions spelled length, figuring out which one you're actually calling is dispatching to the right function. Looking at all the argument types is called multiple dispatch.

And it's clearest if you look at something like a plus function. If you write a plus b, which plus you do really should depend on both a and b, right? It shouldn't depend on just a or just b. And it's actually quite awkward in OOP languages like Python, or especially C++, to overload a plus operation that operates on mixed types. As a consequence, for example, C++ has a built-in complex number type: you can have a complex<float>, or a complex<double> -- complex numbers with different real types. But you can't add a complex<float> to a complex<double>.
You can't add a single-precision complex number to a double-precision complex number, or do any mixed complex operation, because the language can't figure out who owns the method. It doesn't have a way of doing that kind of promotion, all right? In Julia, you can have a method for adding a Float32 to a Float32, but also a method for adding, say, a complex number to a real number, or a real number to a complex number -- you want to specialize things. In fact, we can click on the link here and see the code. Adding a complex number to a real number in Julia looks like the sketch below -- the most obvious thing, implemented in Julia itself: plus of a complex and a real creates a new complex number where you only add to the real part; you can leave the imaginary part alone. And this works for any complex type. OK, there are too many methods to display -- OK, I can shrink that. Let's see.
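That method looks roughly like this (a paraphrase of the definition in Base, not a verbatim quote; inside Base, this line extends the built-in + generic function):

    # adding a real to a complex number: only the real part changes
    +(z::Complex, x::Real) = Complex(real(z) + x, imag(z))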
So there's another type-inference thing -- I'll just mention it briefly. One of the things you have to do to make this type-inference process work is: given the types of the arguments, figure out the type of the return value, OK? That means when you define a function, it should be what's called type stable: the type of the result should depend on the types of the arguments, and not on the values of the arguments -- because the types are known at compile time, while the values are only known at runtime. In C, you have no choice but to obey this. But in a dynamic language like Python or Matlab, if you're not thinking about this, it's really easy to design things so that it's not true. A classic example is the square root function. Suppose I pass an integer to it -- let's do the square root of 5, all right? The result has to be a floating-point number, right? It's 2.23-something. Now, if I do the square root of 4, of course, that square root is an integer. But if I returned an integer for that input, the function wouldn't be type stable anymore: the return type would depend on the value of the input -- whether it's a perfect square or not -- all right? So it returns a floating-point value even if the input is an integer. Yes?

AUDIENCE: If you have a lot of methods defined for a bunch of different types, can that method lookup become really slow?

STEVEN JOHNSON: Well, the lookup happens at compile time, so it's really kind of irrelevant -- at least if type inference succeeds. If type inference fails, then it happens at runtime, and it's slower. But it's like a tree search, so it's not as slow as you might think. Most of the time you don't worry about that, because if you care about performance, you arrange your code so that type inference succeeds. This is something you do in performance optimization: when you're prototyping, you don't care about types -- you say x equals 3, and on the next line you say x equals an array, whatever. But when you're optimizing your code, you tweak it a little to make sure things don't change types willy-nilly, and that the return types of your functions depend on the types of the arguments, not on the values.

So, as I mentioned, square root is what really confuses people at first: if you take the square root of minus 1, you might think you should get a complex value. Instead, it gives you an error, right? And basically, what are the choices here? It could give you an error, or it could give you a complex value.
But if it gave you a complex value, then the return type of square root would depend on the value of the input, not just the type. Matlab, for example, will happily give you a complex number for the square root of minus 1. But as a result -- Matlab has a compiler, right? It has many, many challenges, but one simple one to understand is this: if the Matlab compiler sees a square root anywhere in your function, even if it knows the inputs are real, it doesn't know whether the outputs are complex or real, unless it can prove the inputs were non-negative, right? That means it has to compile two code paths for the output, all right? But then suppose that output goes into another square root, or some other function, right? You quickly get a combinatorial explosion of possible code paths because of the possible types. So at some point you just give up and put things in a box. And as soon as you put things in a box and look up types at runtime, you're dead from a performance perspective.

So in Julia, if you want a complex result from square root, you have to give it a complex input. In Julia, the imaginary unit is spelled im -- they decided i and j are too useful as loop variables, so im is the imaginary unit. And if you take the square root of a complex input, it gives you a complex output. Python actually does the same thing: if you take the square root of a negative value, it gives an error, unless you give it a complex input.
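You can see that type-stability choice directly at the Julia prompt:

    sqrt(5)        # 2.23606797749979, a Float64
    sqrt(4)        # 2.0, still a Float64, even though 4 is a perfect square
    sqrt(-1)       # throws a DomainError instead of silently going complex
    sqrt(-1 + 0im) # 0.0 + 1.0im: complex input, so a complex output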
But Python made other mistakes. For example, in Python, an integer is guaranteed never to overflow. If you add 1 plus 1 plus 1 over and over again in Python, eventually you overflow the size of a 64-bit integer, and Python just switches, under the hood, to an arbitrary-precision integer -- which probably seemed like a nice idea at the time. And the rest of Python is so slow that the cost of this overflow check makes no difference in typical Python code. But it makes Python very difficult to compile: if you have integer inputs and you see x + 1 in Python, the compiler cannot compile that to just one instruction, unless it can somehow prove that x is sufficiently small.

So in Julia, integer arithmetic will overflow. But the default integer type is 64 bits, so in practice it never overflows unless you're doing number theory -- and you usually know if you're doing number theory, and then you use arbitrary-precision integers. It was much worse in the old days. This is something people worried about a lot before you were born, when there were 16-bit machines, right? It's really, really easy to overflow 16 bits, because the biggest signed value is 32,767, right? So you were constantly worrying about overflow. Even at 32 bits, the biggest signed value is about 2 billion, and it's really easy to overflow that just counting bytes, right? You can easily have files bigger than 2 gigabytes nowadays. So people worried about this all the time. But a 64-bit integer will basically never overflow if it's counting objects that exist in the real universe -- bytes, or loop iterations, or something like that. So you just say: either it's 64 bits, or you're doing number theory, and then you should use BigInts.
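The trade-off is easy to see at the prompt:

    typemax(Int64)           # 9223372036854775807
    typemax(Int64) + 1       # -9223372036854775808: wraps around silently
    big(typemax(Int64)) + 1  # 9223372036854775808: BigInt, so no overflow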
So, OK. The final thing I want to talk about -- let's see how much [INAUDIBLE] -- is defining our own types. This is the real test of the language, right? It's easy to make a language where there's a certain built-in set of functions and built-in types, and those things are fast. For Python, for example, there actually is a compiler called Numba that does exactly what Julia does: it looks at the arguments, type-specializes, and then calls LLVM and compiles to fast code. But it only works if your only container type is a NumPy array and your only scalar type is one of the dozen or so scalar types that NumPy supports. If you have your own user-defined number type, or your own user-defined container type, it doesn't work.

For user-defined container types, it's probably easy to understand why they're useful. But user-defined number types are extremely useful as well. For example, there's a package in Julia that provides a number type called dual numbers. Those have the property that if you pass them into a function, they compute the function and its derivative: a dual number basically carries around a function value and a derivative value, and has slightly different plus and times and so forth that apply the product rule and so on -- it just propagates derivatives. And then if you have Julia code, like that Vandermonde function, it will just compute its derivative as well.

OK, so I want to be able to define my own type. A very simple type I might want to add would be points: 2D vectors. Of course, I could use an array of two values, but an array is a really heavyweight object for just two values, right? If I know at compile time that there are two values, I don't need a pointer to them -- I can actually store them in registers, I can unroll loops over them, and everything should be faster. You can gain an order of magnitude in speed by specializing on the number of elements for small arrays, compared to a general array data structure. So let's make a point type, OK? I'm going to go through several iterations, starting with a slow one: I'm going to define a mutable struct.
OK, so this will be a mutable object -- call it Point1 -- that has two fields, x and y, which can be of any type. I'll define a plus function that adds them, and it does the most obvious thing: it adds the x components and adds the y components. I'll define a zero function, the additive identity, that just returns the point (0, 0). And then I can construct an object, Point1(3, 4); I can say Point1(3, 4) plus Point1(5, 6); it works.
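A sketch of this first, deliberately slow iteration (reconstructed from the talk; the name Point1 is my assumption, implied by the numbering of the later versions):

    # iteration 1: mutable, untyped fields -- maximally flexible, but slow
    mutable struct Point1
        x
        y
    end

    Base.:+(p::Point1, q::Point1) = Point1(p.x + q.x, p.y + q.y)
    Base.zero(::Type{Point1}) = Point1(0, 0)

    Point1(3, 4) + Point1(5, 6)   # Point1(8, 10)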
Right now it's very generic -- probably too generic. The x can be a floating-point number here and the y a complex number of two integers, or I can even make a point of a string and an array, which doesn't make sense at all. So I probably should have restricted the types of x and y a little, just to prevent the user from putting in something nonsensical, OK? But as written, these fields can be anything, and this type is not ideal, in several ways.

Let's think about how it has to be stored in memory. Take a Point1(1, 3.7). In memory, there is an x and there is a y. But x and y can be of any type, so they have to be pointers to boxes: a pointer to the integer 1, and a pointer to a Float64, in this case 3.7. So, oh -- already we know this is not going to be good news for performance.

And it's mutable. Mutable struct means that if I take p equal to a point, I can then say p.x = 7; I can change the value. That seems like a harmless thing to do, but it's actually a big problem. Because, for example, if I make an array of the same p three times, and then I say p.y = 8 and look at that array, it has to change the y component there too, OK? In general, if I have a p that refers to that object, and it's an object you can mutate, then if I have another reference, q, pointing at the same object, and I say p.x = 4, then q.x had better also be 4 at that point. To have mutable semantics -- the semantics of something you can change, where other references see that change -- the object has to be stored in memory, on the heap, behind a pointer, so that there can be other pointers to the same object, and when I mutate it, the other references see it. It can't just be stuck in a register or something like that. It has to be something other references can see. So this is bad.

So I can write Point1.(a) -- the dot calls the constructor elementwise. a is this array of 10 to the 7 random numbers I was benchmarking before; that was taking 10 milliseconds, OK? And I can sum the result: I can call the built-in sum function on it, and I can even call my mysum function on it, because the type supports a zero function and a plus. So here I have an array of 10 to the 7 values of type Point1 -- if I just go back up -- and the type Point1 is attached to the array. The array in memory is an Array{Point1, 1} -- the 1 here means it's a one-dimensional array; there's also 2D, 3D, and so forth -- and it looks like a Point1 value, a Point1 value, a Point1 value. But each one of those has to be a pointer to an x and a y, which themselves are pointers to boxes. All right, so summing is going to be really slow, because there's a lot of pointer chasing: at runtime it has to check what's the type of x, what's the type of y. And, in fact, it was slow: instead of 10 milliseconds, it took 500 or 600 milliseconds.

So to do better, we need to do two things.
First, for x and y, we have to be able to see what type they are, OK? They can't be arbitrary things that have to be pointers to boxes. And second, the point object has to be immutable -- because if it's mutable, if I can have p and q referring to the same point and a change through p must be visible through q, those semantics have to be implemented as a pointer to an object someplace, and then you're dead.

So I just say struct. A struct is not mutable -- it doesn't have the mutable keyword. And I give the fields types: I say they're both Float64, both 64-bit floating-point numbers. I define plus the same way, zero the same way, and now I can add them and so forth. But now if I say p.x = 6, it gives an error. It says you can't mutate it -- don't even try -- because we can't support those semantics on this type.
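The second iteration, sketched (again assuming the name, Point2, from the talk's numbering):

    # iteration 2: immutable and concretely typed -- fast, but not generic
    struct Point2
        x::Float64
        y::Float64
    end

    Base.:+(p::Point2, q::Point2) = Point2(p.x + q.x, p.y + q.y)
    Base.zero(::Type{Point2}) = Point2(0.0, 0.0)

    p = Point2(3.0, 4.0)
    # p.x = 6.0  would throw an error: immutable structs cannot be changed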
But that means that if you look at this type in memory, what the compiler is allowed to do -- and does do -- is this: an array of Point2s looks like just the x value, the y value, the x value, the y value, and so forth, where each of these is exactly one 8-byte Float64. And all the types are known at compile time. So if I sum them, it should take about 20 milliseconds, right? Because summing real numbers was 10 milliseconds, and this is twice as many additions: you have to sum the x's and sum the y's. Let's benchmark it and see. Oh, actually, summing the real numbers took 5 milliseconds, so this should take about 10. Let's see if that's still true. Yeah, it took about 10. So the compiler is smart enough. First of all, it stores this inline, as one big consecutive block of memory. And then when you sum them -- remember our sum function; well, this is the built-in sum, but our mysum will work the same way -- LLVM will be smart enough to load x into a register, load y into a register, and run a tight loop with basically one instruction to sum the x's, one instruction to sum the y's, and repeat. So it's about as good as you could get.

But you paid a big price: we've lost all generality, right? These can only be two 64-bit floating-point numbers. I can't have two single-precision numbers. This is like a struct of two doubles in C. If I had to do this to get performance in Julia, it would suck: I'd basically be writing C code in a slightly higher-level syntax, losing the benefit of a high-level language.

So the way you get around this is to define something like this Point2 type -- but not just one type. You want to define a whole family of types: one for two Float64s, one for two Float32s -- in fact, an infinite family of types, two things of any type you want, as long as they're two real types. The way you do that in Julia is a parametrized type. This is called parametric polymorphism, and it's similar to what you see in C++ templates. So now I have a struct -- not mutable -- Point3, with curly braces T: it's parametrized by a type T. I've restricted it slightly here: I've said x and y have to be the same type, T. I didn't have to do that -- I could have had two parameters, one for the type of x and one for the type of y -- but most of the time, for something like this, you'd want them both to be the same type. They could be both Float64s, both Float32s, both integers, whatever. And T can be any type, where the less-than-colon, <:, means "is a subtype of."
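The parametrized version, sketched:

    # iteration 3: a whole family of types, one for each real element type T
    struct Point3{T<:Real}
        x::T
        y::T
    end

    Base.:+(p::Point3, q::Point3) = Point3(p.x + q.x, p.y + q.y)
    Base.zero(::Type{Point3{T}}) where {T} = Point3(zero(T), zero(T))

    Point3(3, 4)          # a Point3{Int64}
    Point3(3.0f0, 4.0f0)  # a Point3{Float32}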
So T is any subtype of Real. It could be Float64; it could be Int64; it could be Int8; it could be BigFloat; it could be a user-defined type -- it doesn't care. So Point3 here is really not one type; it's a whole hierarchy. I'm not defining one type; I'm defining a whole set of types. There's a Point3{Int64}; there's a Point3{Float32}, a Point3{Float64}, and so on -- infinitely many types, as many as you want. Basically, it'll create more types on the fly just by instantiating them.

Otherwise, it looks the same. The plus function is basically the same: I add the x components and the y components. The zero function is the same, except now I make sure to create zeros of type T, whatever that type is. And now if I say Point3(3, 4), I'm instantiating a particular instance of this. Point3 by itself is an abstract type; a particular instance has one of the concrete types -- in this case, a Point3 of two Int64s, two 64-bit integers, OK? And I can add them.

And actually, adding mixed types will already work, because the addition function here works for any two Point3s. I didn't say they had to be Point3s of the same type: they could be one of one type and one of another. It determines the type of the result by the type of the [INAUDIBLE] -- it does type inference. So if you have a Point3 of two Int64s and a Point3 of two Float64s, it says: oh, p.x is an Int64, and q.x is a Float64. Which plus function do I call? There's a plus method for that mix, and it promotes the result to Float64. So that means this sum is a Float64, and the other sum is a Float64 -- oh, so then I'm creating a Point3 of two Float64s.
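Continuing the sketch above, mixed-type addition just falls out:

    p = Point3(3, 4)       # Point3{Int64}
    q = Point3(5.0, 6.0)   # Point3{Float64}
    p + q                  # Point3{Float64}(8.0, 10.0): promoted elementwise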
So this kind of mixed promotion is done automatically -- and you can actually define your own promotion rules in Julia as well. And I can make an array. So now if I have an array of Point3{Float64}, this type is attached to the whole array. It's not an arbitrary Point3; it's a Point3 of two Float64s. So it gets stored, again, as just 10 to the 7 elements of x, y, x, y, where each one is 8 bytes, 8 bytes, 8 bytes, one after the other. The compiler knows all the types, and when you sum it, it knows everything at compile time. It will sum these things by loading one into a register, loading the other into a register, and calling one instruction to add them. So the sum function should be fast. We can call the built-in sum function, or we can call our own mysum -- our own sum function, I didn't put @simd in it here, so it's going to be twice as slow. But yeah.
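The benchmarking flow being run here looks roughly like this (assuming the BenchmarkTools package, which provides the @btime macro he mentions in a moment):

    using BenchmarkTools

    a = [Point3(rand(), rand()) for _ in 1:10^7]  # a Vector{Point3{Float64}}
    @btime sum($a)     # built-in sum: one tight loop over the flat memory
    @btime mysum($a)   # our generic version from earlier in the lecture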
1582 01:13:01,280 --> 01:13:03,880 We could try putting the @simd annotation there
1583 01:13:03,880 --> 01:13:05,560 and try it again.
1584 01:13:05,560 --> 01:13:09,120 I thought it was, but maybe not.
1585 01:13:09,120 --> 01:13:09,620 Let's see.
1586 01:13:09,620 --> 01:13:12,530 Let's put @simd.
1587 01:13:12,530 --> 01:13:14,910 So redefine that.
1588 01:13:14,910 --> 01:13:17,810 And then just rerun this.
1589 01:13:17,810 --> 01:13:20,020 So it'll notice that I've changed the definition.
1590 01:13:20,020 --> 01:13:20,830 It'll recompile it.
1591 01:13:24,040 --> 01:13:27,040 And @btime, since it runs it multiple times--
1592 01:13:27,040 --> 01:13:30,250 the first time it calls it, it's slow because it's compiling it,
1593 01:13:30,250 --> 01:13:32,350 but it takes the minimum over several runs.
1594 01:13:32,350 --> 01:13:38,180 So let's see.
1595 01:13:44,680 --> 01:13:48,910 Yeah, this is the problem in general with vectorizing
1596 01:13:48,910 --> 01:13:51,550 compilers: they're not that smart if you're
1597 01:13:51,550 --> 01:13:54,400 using anything other than just an array of an elementary data
1598 01:13:54,400 --> 01:13:56,600 type.
1599 01:13:56,600 --> 01:13:57,100 Yeah, no.
1600 01:13:57,100 --> 01:13:58,090 It didn't make any difference.
1601 01:13:58,090 --> 01:13:58,840 So I take it back.
1602 01:13:58,840 --> 01:14:01,900 So for more complicated data structures,
1603 01:14:01,900 --> 01:14:04,990 you often have to use SIMD instructions explicitly.
1604 01:14:04,990 --> 01:14:06,650 And there is a way to do that in Julia.
1605 01:14:06,650 --> 01:14:08,775 And there is a higher-level library on top of that.
1606 01:14:08,775 --> 01:14:10,990 You can basically create a tuple and then add things,
1607 01:14:10,990 --> 01:14:14,710 and it will do SIMD acceleration.
1608 01:14:14,710 --> 01:14:16,450 But yeah.
1609 01:14:16,450 --> 01:14:18,530 So anyway, that's the upshot here.
1610 01:14:18,530 --> 01:14:19,840 There's a whole bunch of--
1611 01:14:19,840 --> 01:14:22,730 like, the story of why Julia can be compiled to fast code,
1612 01:14:22,730 --> 01:14:25,210 it's a combination of lots of little things.
1613 01:14:25,210 --> 01:14:27,590 But there are a few big things.
1614 01:14:27,590 --> 01:14:30,937 One is that it specializes things at compile time.
1615 01:14:30,937 --> 01:14:33,020 But, of course, you can't do that in just any language.
1616 01:14:33,020 --> 01:14:34,930 It relies on designing the language so
1617 01:14:34,930 --> 01:14:36,980 that you can do type inference.
1618 01:14:36,980 --> 01:14:41,670 It relies on having these kinds of parametrized types
1619 01:14:41,670 --> 01:14:43,810 and giving you a way to talk about types
1620 01:14:43,810 --> 01:14:45,760 and attach types to other types.
1621 01:14:45,760 --> 01:14:50,543 So the array-- you noticed, probably-- let's see--
1622 01:14:50,543 --> 01:14:52,960 now that you understand what these little braces mean,
1623 01:14:52,960 --> 01:14:56,440 you can see that the array is defined in Julia
1624 01:14:56,440 --> 01:14:57,850 as another parametrized type.
1625 01:14:57,850 --> 01:14:59,800 It's parametrized by the type of the element
1626 01:14:59,800 --> 01:15:02,770 and also by the dimensionality.
1627 01:15:02,770 --> 01:15:06,430 So it uses the same mechanism to attach types to an array.
1628 01:15:06,430 --> 01:15:08,600 And you can have your own-- the array type in Julia
1629 01:15:08,600 --> 01:15:10,215 is implemented mostly in Julia.
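You can see the parametrization of arrays directly; this much is standard Julia:

    A = rand(3, 4)
    typeof(A)        # Array{Float64,2}: parametrized by element type and dimensionality
    eltype(A)        # Float64
    ndims(A)         # 2

    Vector{Int64}    # just an alias for Array{Int64,1}
    Matrix{Float64}  # just an alias for Array{Float64,2}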
1630 01:15:10,215 --> 01:15:11,590 And there are other packages that
1631 01:15:11,590 --> 01:15:13,930 implement their own types of arrays
1632 01:15:13,930 --> 01:15:16,705 that have the same performance.
1633 01:15:16,705 --> 01:15:19,600 One of the goals of Julia is to build in as little as possible
1634 01:15:19,600 --> 01:15:23,643 so that there's not some set of privileged types
1635 01:15:23,643 --> 01:15:25,810 that the compiler knows about and everything else is
1636 01:15:25,810 --> 01:15:26,770 second class.
1637 01:15:26,770 --> 01:15:32,412 User code is just as good as the built-in code.
1638 01:15:32,412 --> 01:15:34,120 And, in fact, the built-in code is mostly
1639 01:15:34,120 --> 01:15:35,200 just implemented in Julia.
1640 01:15:35,200 --> 01:15:37,033 There's a small core that's implemented in C
1641 01:15:37,033 --> 01:15:40,250 for bootstrapping, basically.
1642 01:15:40,250 --> 01:15:40,750 Yeah.
1643 01:15:40,750 --> 01:15:45,760 So: having parametrized types, and-- another technicality--
1644 01:15:45,760 --> 01:15:50,110 having all concrete types be final in Julia.
1645 01:15:52,967 --> 01:15:55,550 A concrete type is something you can actually store in memory.
1646 01:15:55,550 --> 01:15:59,440 So Point3 of Int64 is something you can actually have, right?
1647 01:15:59,440 --> 01:16:01,960 An object of two integers has that type.
1648 01:16:01,960 --> 01:16:04,318 So it is concrete, as opposed to this thing.
1649 01:16:04,318 --> 01:16:05,360 This is an abstract type.
1650 01:16:05,360 --> 01:16:06,970 You can't actually have one of these.
1651 01:16:06,970 --> 01:16:08,553 You can only have one of the instances
1652 01:16:08,553 --> 01:16:09,490 of the concrete types.
1653 01:16:09,490 --> 01:16:10,600 But there are no--
1654 01:16:10,600 --> 01:16:11,580 this is final.
1655 01:16:11,580 --> 01:16:14,650 It's not possible to have a subtype of this,
1656 01:16:14,650 --> 01:16:16,600 because if you could, then you'd be dead,
1657 01:16:16,600 --> 01:16:20,650 because this is an array of these things.
1658 01:16:20,650 --> 01:16:23,950 The compiler has to know it's actually these things and not
1659 01:16:23,950 --> 01:16:28,550 some subtype of this, all right? Whereas in other languages,
1660 01:16:28,550 --> 01:16:31,065 like Python, you can have subtypes of concrete types.
1661 01:16:31,065 --> 01:16:32,440 And so then even if you said this
1662 01:16:32,440 --> 01:16:34,450 is an array of a particular Python type,
1663 01:16:34,450 --> 01:16:36,990 it wouldn't really know it's that type--
1664 01:16:36,990 --> 01:16:38,500 it might be some subtype of that.
1665 01:16:38,500 --> 01:16:40,270 That's one of the reasons why you can't
1666 01:16:40,270 --> 01:16:42,680 implement NumPy in Python.
1667 01:16:42,680 --> 01:16:45,070 There's no way to say, at the language level, this is really
1668 01:16:45,070 --> 01:16:49,020 that type and nothing else.
1669 01:16:49,020 --> 01:16:49,790 Yeah?
1670 01:16:49,790 --> 01:16:52,000 AUDIENCE: How does the compilation in Julia work?
1671 01:16:52,000 --> 01:16:53,042 STEVEN JOHNSON: Oh, yeah.
1672 01:16:53,042 --> 01:16:54,840 So it's calling LLVM.
1673 01:16:54,840 --> 01:16:58,740 So basically, the stages are--
1674 01:16:58,740 --> 01:17:01,130 there's a few passes.
1675 01:17:01,130 --> 01:17:06,100 OK, and one of the fun things is
1676 01:17:06,100 --> 01:17:09,250 you can actually inspect all the passes
1677 01:17:09,250 --> 01:17:12,380 and intercept almost all of them, practically.
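The concrete-versus-abstract distinction described above can be checked in the language itself; a small sketch using the Point3 from before (the commented-out line is the kind of definition Julia rejects):

    isconcretetype(Point3{Int64})   # true: fixed memory layout, can be stored in an array
    isconcretetype(Point3)          # false: abstract over T, can't be instantiated directly

    # Concrete types are final, so a definition like this is an error in Julia:
    # struct MyPoint <: Point3{Int64} end   # error: only abstract types can be subtyped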
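And the passes about to be walked through can each be inspected with standard reflection macros:

    f(x) = 2x + 1

    @code_lowered f(3)   # after parsing and macro expansion, before type inference
    @code_typed f(3)     # after type inference: everything annotated as Int64
    @code_llvm f(3)      # the LLVM IR that gets handed to LLVM
    @code_native f(3)    # the machine code that LLVM produces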
1678 01:17:12,380 --> 01:17:14,000 So, of course, typing code like this,
1679 01:17:14,000 --> 01:17:16,810 first, it gets parsed, OK?
1680 01:17:16,810 --> 01:17:20,350 And macros-- those things
1681 01:17:20,350 --> 01:17:23,290 actually are functions that are called right after parsing.
1682 01:17:23,290 --> 01:17:24,700 They can just take the parsed code
1683 01:17:24,700 --> 01:17:26,660 and rewrite it arbitrarily.
1684 01:17:26,660 --> 01:17:28,910 So they can extend the language that way.
1685 01:17:28,910 --> 01:17:32,380 So it gets parsed, maybe rewritten by a macro.
1686 01:17:32,380 --> 01:17:34,700 And then you get an abstract syntax tree.
1687 01:17:34,700 --> 01:17:38,530 And then when you call it-- let's say f of 3-- it says,
1688 01:17:38,530 --> 01:17:39,860 oh, x is an integer,
1689 01:17:39,860 --> 01:17:42,880 an Int64, and it runs a type inference pass.
1690 01:17:42,880 --> 01:17:47,170 It tries to figure out the type of everything,
1691 01:17:47,170 --> 01:17:49,990 which version of plus to call, and so forth.
1692 01:17:49,990 --> 01:17:53,050 Then it decides whether to inline some things.
1693 01:17:53,050 --> 01:17:56,500 And then once it's done all that,
1694 01:17:56,500 --> 01:17:59,470 it spits out LLVM IR, then calls LLVM,
1695 01:17:59,470 --> 01:18:01,637 and compiles it to machine code.
1696 01:18:01,637 --> 01:18:03,970 And then it caches that someplace, so the next time you
1697 01:18:03,970 --> 01:18:08,390 call it-- you call f of 4, f with another integer--
1698 01:18:08,390 --> 01:18:10,090 it doesn't repeat the same process.
1699 01:18:10,090 --> 01:18:12,670 It notices it's cached.
1700 01:18:12,670 --> 01:18:14,603 So that's-- so yeah.
1701 01:18:14,603 --> 01:18:16,270 At the lowest level, it's just LLVM.
1702 01:18:20,960 --> 01:18:23,240 So then there's tons of things I haven't shown you.
1703 01:18:23,240 --> 01:18:24,290 So I haven't shown you--
1704 01:18:24,290 --> 01:18:25,700 I mentioned metaprogramming.
1705 01:18:25,700 --> 01:18:28,010 So it has this macro facility.
1706 01:18:28,010 --> 01:18:30,080 So you can basically write syntax
1707 01:18:30,080 --> 01:18:32,060 that rewrites other syntax, which is really
1708 01:18:32,060 --> 01:18:33,770 cool for code generation.
1709 01:18:33,770 --> 01:18:37,400 You can also intercept things after the type inference phase.
1710 01:18:37,400 --> 01:18:39,890 You can write something called a generated function.
1711 01:18:39,890 --> 01:18:41,660 Because at parse time,
1712 01:18:41,660 --> 01:18:44,000 it knows how things are spelled.
1713 01:18:44,000 --> 01:18:45,667 And you can rewrite how they're spelled.
1714 01:18:45,667 --> 01:18:47,708 But it doesn't know what anything actually means.
1715 01:18:47,708 --> 01:18:48,880 It just knows x is a symbol.
1716 01:18:48,880 --> 01:18:50,672 It doesn't know x is an integer-- whatever.
1717 01:18:50,672 --> 01:18:51,860 It just knows the spelling.
1718 01:18:51,860 --> 01:18:54,260 So when you actually compile f of x, at that point,
1719 01:18:54,260 --> 01:18:56,530 it knows x is an integer.
1720 01:18:56,530 --> 01:18:59,630 And so you can write something called a generated, or staged,
1721 01:18:59,630 --> 01:19:03,470 function that basically runs at that time and says,
1722 01:19:03,470 --> 01:19:04,980 oh, you told me x is an integer.
1723 01:19:04,980 --> 01:19:07,080 Now I'll rewrite the code based on that.
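A toy staged function to make that concrete (the name twice is hypothetical): inside an @generated function, the body runs at compile time with the argument names bound to their types, and the expression it returns becomes the code that gets compiled:

    @generated function twice(x)
        # Here x is the TYPE of the argument (say, Int64 or String),
        # because this body runs after type inference.
        if x <: AbstractString
            return :(x * x)   # generated code: string concatenation
        else
            return :(2 * x)   # generated code: numeric doubling
        end
    end

    twice(3)      # returns 6
    twice("ab")   # returns "abab"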
1724 01:19:07,080 --> 01:19:09,760 And so this is really useful for--
1725 01:19:09,760 --> 01:19:13,010 there are some cool facilities for multidimensional arrays,
1726 01:19:13,010 --> 01:19:15,140 because the dimensionality of the array
1727 01:19:15,140 --> 01:19:18,050 is actually part of the type.
1728 01:19:18,050 --> 01:19:20,360 So you can say, oh, this is a three-dimensional array.
1729 01:19:20,360 --> 01:19:21,690 I'll write three loops.
1730 01:19:21,690 --> 01:19:23,480 Oh, you have a four-dimensional array.
1731 01:19:23,480 --> 01:19:24,860 I'll write four loops.
1732 01:19:24,860 --> 01:19:29,050 And it can rewrite the code depending on the dimensionality
1733 01:19:29,050 --> 01:19:30,290 with code generation.
1734 01:19:30,290 --> 01:19:33,350 So you can have code that basically generates
1735 01:19:33,350 --> 01:19:35,550 any number of nested loops depending
1736 01:19:35,550 --> 01:19:36,800 on the types of the arguments.
1737 01:19:36,800 --> 01:19:38,780 And all the generation is done at compile time,
1738 01:19:38,780 --> 01:19:39,830 after type inference.
1739 01:19:39,830 --> 01:19:43,670 So it knows the dimensionality of the array.
1740 01:19:43,670 --> 01:19:50,150 And yeah, so there's lots of fun things like that.
1741 01:19:50,150 --> 01:19:53,180 Of course, it has parallel facilities.
1742 01:19:53,180 --> 01:19:55,530 They're not quite as advanced as Cilk at this point,
1743 01:19:55,530 --> 01:19:59,030 but that's the direction they're heading.
1744 01:19:59,030 --> 01:20:02,390 There's no global interpreter lock like in Python.
1745 01:20:02,390 --> 01:20:04,250 There's no interpreter.
1746 01:20:04,250 --> 01:20:07,810 So there's a threading facility.
1747 01:20:07,810 --> 01:20:09,530 And there's a pool of workers.
1748 01:20:09,530 --> 01:20:12,470 And you can thread a loop.
1749 01:20:12,470 --> 01:20:17,270 And the garbage collection is threading-aware.
1750 01:20:17,270 --> 01:20:18,770 So that's safe.
1751 01:20:18,770 --> 01:20:22,160 And they're gradually getting more and more powerful
1752 01:20:22,160 --> 01:20:23,930 runtimes, hopefully eventually hooking
1753 01:20:23,930 --> 01:20:27,890 into Professor Leiserson's advanced threading
1754 01:20:27,890 --> 01:20:31,180 compiler-- the Tapir compiler.
1755 01:20:31,180 --> 01:20:34,100 And there's also-- most of what I
1756 01:20:34,100 --> 01:20:37,190 do in my research is more coarse-grained distributed-memory
1757 01:20:37,190 --> 01:20:40,960 parallelism, so running on supercomputers and stuff
1758 01:20:40,960 --> 01:20:41,460 like that.
1759 01:20:41,460 --> 01:20:43,730 And there's MPI.
1760 01:20:43,730 --> 01:20:45,950 There is a remote procedure call library.
1761 01:20:45,950 --> 01:20:50,010 There are different flavors of that.
1762 01:20:50,010 --> 01:20:51,570 But yeah.
1763 01:20:51,570 --> 01:20:55,740 So any other questions?
1764 01:20:55,740 --> 01:20:56,240 Yeah?
1765 01:20:56,240 --> 01:20:59,200 AUDIENCE: How do you implement the big number type?
1766 01:20:59,200 --> 01:21:01,630 STEVEN JOHNSON: The bignum type in Julia
1767 01:21:01,630 --> 01:21:03,630 is actually calling GMP.
1768 01:21:10,850 --> 01:21:12,570 So that's one of those things.
1769 01:21:12,570 --> 01:21:18,600 Let me just-- let me make a new notebook.
1770 01:21:18,600 --> 01:21:26,670 So if I say, you know, a BigInt-- 3,000, say--
1771 01:21:26,670 --> 01:21:31,200 and then I take, say, the factorial of that.
1772 01:21:31,200 --> 01:21:33,944 I think there's a built-in factorial.
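What's being typed in the notebook is roughly this; big, factorial, setprecision, and BigFloat are standard Julia, and the values are the ones mentioned here and just below:

    x = big(3000)                 # a BigInt, backed by the GMP C library
    factorial(x)                  # exact result with thousands of digits

    setprecision(BigFloat, 1000)  # 1,000 bits of precision for new BigFloats
    BigFloat(pi)                  # pi to roughly 300 decimal digits (BigFloat uses the MPFR library)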
1773 01:21:33,944 --> 01:21:40,590 All right, so this is called a bignum type, right?
1774 01:21:40,590 --> 01:21:43,410 It's something where the number of digits changes at run time.
1775 01:21:43,410 --> 01:21:45,243 So, of course, these are orders of magnitude
1776 01:21:45,243 --> 01:21:47,165 slower than hardware arithmetic.
1777 01:21:47,165 --> 01:21:48,540 Basically, it has to be implemented
1778 01:21:48,540 --> 01:21:51,395 as an array of digits in some base.
1779 01:21:51,395 --> 01:21:52,770 And when you add or multiply, you
1780 01:21:52,770 --> 01:21:54,750 have to loop over those at runtime.
1781 01:21:58,881 --> 01:22:01,710 These bignum libraries are quite large
1782 01:22:01,710 --> 01:22:02,850 and heavily optimized.
1783 01:22:02,850 --> 01:22:06,000 And so nobody has reimplemented one in Julia.
1784 01:22:06,000 --> 01:22:08,700 They're just calling out to a C library, the GNU
1785 01:22:08,700 --> 01:22:10,340 multiple-precision library, GMP.
1786 01:22:10,340 --> 01:22:14,030 And for floating-point values, there
1787 01:22:14,030 --> 01:22:16,302 is something called BigFloat.
1788 01:22:16,302 --> 01:22:21,295 So, BigFloat of pi-- I can actually--
1789 01:22:21,295 --> 01:22:22,960 let's set the precision.
1790 01:22:29,820 --> 01:22:35,310 Set the precision of BigFloat to 1,000.
1791 01:22:35,310 --> 01:22:37,210 That's 1,000 binary digits.
1792 01:22:37,210 --> 01:22:42,450 And then say BigFloat of pi.
1793 01:22:42,450 --> 01:22:43,980 And [INAUDIBLE] more.
1794 01:22:43,980 --> 01:22:48,070 By the way-- so I can have a variable alpha--
1795 01:22:48,070 --> 01:22:53,130 oops-- alpha-hat-sub-2 equals 17.
1796 01:22:53,130 --> 01:22:53,790 That's allowed.
1797 01:22:58,060 --> 01:23:01,570 All that's happening here is that Julia
1798 01:23:01,570 --> 01:23:05,920 allows almost arbitrary unicode characters for identifiers.
1799 01:23:05,920 --> 01:23:06,860 So I can have--
1800 01:23:06,860 --> 01:23:19,090 make it bigger-- so we can have an identifier koala, right?
1801 01:23:21,695 --> 01:23:22,820 So there are two issues here.
1802 01:23:22,820 --> 01:23:25,220 One is just that you have a language that allows
1803 01:23:25,220 --> 01:23:26,500 those things as identifiers.
1804 01:23:26,500 --> 01:23:29,090 So Python 3 also allows unicode identifiers,
1805 01:23:29,090 --> 01:23:32,280 although I think Julia, out of all the existing--
1806 01:23:32,280 --> 01:23:34,550 the common languages-- probably has
1807 01:23:34,550 --> 01:23:36,390 the widest unicode support.
1808 01:23:36,390 --> 01:23:40,760 Most languages only allow a very narrow range
1809 01:23:40,760 --> 01:23:42,840 of unicode characters for identifiers.
1810 01:23:42,840 --> 01:23:46,010 So Python 3 wouldn't allow the koala, and it
1811 01:23:46,010 --> 01:23:51,260 would not allow the alpha-hat-sub-2,
1812 01:23:51,260 --> 01:23:53,760 because it doesn't allow the numeric subscript
1813 01:23:53,760 --> 01:23:56,300 unicode characters.
1814 01:23:56,300 --> 01:23:58,430 The other thing is how you type these things.
1815 01:23:58,430 --> 01:24:00,180 And that's more of an editor thing.
1816 01:24:00,180 --> 01:24:05,520 And so in Julia, we implemented it initially in the REPL
1817 01:24:05,520 --> 01:24:07,160 and in Jupyter.
1818 01:24:07,160 --> 01:24:08,660 And now all the editors support it: you
1819 01:24:08,660 --> 01:24:10,400 can just do tab completion of LaTeX.
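The identifiers being demonstrated are ordinary Julia assignments; the comments show plausible REPL tab-completion sequences (editor behavior, not language syntax, and the exact emoji completion name is an assumption):

    α̂₂ = 17       # typed as \alpha<TAB> \hat<TAB> \_2<TAB>
    🐨 = "koala"   # typed as \:koala:<TAB>
    γ̇⁴ = 2.0      # typed as \gamma<TAB> \dot<TAB> \^4<TAB>, the example typed next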
1820 01:24:10,400 --> 01:24:15,500 So I can type in backslash-gamma, tab, and the tab
1821 01:24:15,500 --> 01:24:17,310 completes to the unicode character.
1822 01:24:17,310 --> 01:24:19,780 I can say backslash-dot,
1823 01:24:19,780 --> 01:24:25,100 and it puts a dot over it, and backslash-superscript-4,
1824 01:24:25,100 --> 01:24:26,290 and it puts a superscript 4.
1825 01:24:26,290 --> 01:24:30,300 And that's allowed.
1826 01:24:30,300 --> 01:24:31,845 So it's quite nice.
1827 01:24:31,845 --> 01:24:34,220 So when I'm typing emails, and I put equations in emails,
1828 01:24:34,220 --> 01:24:37,677 I go to the Julia REPL and tab-complete all my LaTeX
1829 01:24:37,677 --> 01:24:39,260 characters so that I can put equations
1830 01:24:39,260 --> 01:24:41,330 in emails, because it's the easiest way to type
1831 01:24:41,330 --> 01:24:44,070 these unicode math characters.
1832 01:24:44,070 --> 01:24:44,570 But yeah.
1833 01:24:44,570 --> 01:24:47,860 So IPython borrowed this.
1834 01:24:47,860 --> 01:24:53,800 So now you can do the same thing in IPython notebooks as well.
1835 01:24:53,800 --> 01:24:56,750 So it's really fun.
1836 01:24:56,750 --> 01:25:00,290 Because if you read old math codes, especially old Fortran
1837 01:25:00,290 --> 01:25:02,660 codes or things, you see lots of variables
1838 01:25:02,660 --> 01:25:05,330 that are named alphahat or something like that,
1839 01:25:05,330 --> 01:25:07,157 alpha hat underscore 3.
1840 01:25:07,157 --> 01:25:08,990 It's so much nicer to have a variable that's
1841 01:25:08,990 --> 01:25:10,760 actually the alpha-hat-sub-3 character.
1842 01:25:10,760 --> 01:25:11,903 So that's cute.
1843 01:25:11,903 --> 01:25:13,820 CHARLES E. LEISERSON: Steve, thanks very much.
1844 01:25:13,820 --> 01:25:14,510 Thanks.
1845 01:25:14,510 --> 01:25:15,170 This was great.
1846 01:25:15,170 --> 01:25:16,880 [APPLAUSE]
1847 01:25:16,880 --> 01:25:19,580 We are, as Steve mentioned, actually looking
1848 01:25:19,580 --> 01:25:22,670 at a project to merge the Julia technology with the Cilk
1849 01:25:22,670 --> 01:25:23,690 technology.
1850 01:25:23,690 --> 01:25:28,600 And so we're right now in the process of putting together
1851 01:25:28,600 --> 01:25:30,340 the grant proposal.
1852 01:25:30,340 --> 01:25:34,360 And if that gets funded, there may be some UROPs.