1
00:00:01,550 --> 00:00:03,920
The following content is
provided under a Creative

2
00:00:03,920 --> 00:00:05,310
Commons license.

3
00:00:05,310 --> 00:00:07,520
Your support will help
MIT OpenCourseWare

4
00:00:07,520 --> 00:00:11,610
continue to offer high quality
educational resources for free.

5
00:00:11,610 --> 00:00:14,180
To make a donation or to
view additional materials

6
00:00:14,180 --> 00:00:18,140
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,140 --> 00:00:19,026
at ocw.mit.edu.

8
00:00:23,077 --> 00:00:23,660
PROFESSOR: OK.

9
00:00:23,660 --> 00:00:26,720
So I thought I'd
begin today with,

10
00:00:26,720 --> 00:00:29,750
as we're coming to the
end of the sort of focus

11
00:00:29,750 --> 00:00:35,340
on linear algebra and moving
on to a little probability,

12
00:00:35,340 --> 00:00:42,170
a little more optimization,
and a lot of deep learning.

13
00:00:42,170 --> 00:00:44,930
So this was like,
by way of review,

14
00:00:44,930 --> 00:00:50,540
to write down the big
factorizations of a matrix.

15
00:00:50,540 --> 00:00:55,490
And so my idea, and
I kind of enjoyed it,

16
00:00:55,490 --> 00:00:59,780
is checking that the
number of free parameters,

17
00:00:59,780 --> 00:01:04,640
say an L and U or a Q
and R or every-- each

18
00:01:04,640 --> 00:01:07,940
of those, that the
number of free parameters

19
00:01:07,940 --> 00:01:11,690
agrees with the number of
parameters in A itself,

20
00:01:11,690 --> 00:01:13,890
like n squared, usually.

21
00:01:13,890 --> 00:01:16,340
So A usually has n squared.

22
00:01:16,340 --> 00:01:21,560
And then can we replace A if--
after we've computed L and U,

23
00:01:21,560 --> 00:01:23,150
can we throw away A?

24
00:01:23,150 --> 00:01:27,680
Yes, because all the
information is in L and U.

25
00:01:27,680 --> 00:01:32,520
And it fills that
same n by n matrix.

26
00:01:32,520 --> 00:01:39,200
Well, that's kind of obvious
because L is lower triangular,

27
00:01:39,200 --> 00:01:43,520
and the diagonal, all ones,
are not free parameters.

28
00:01:43,520 --> 00:01:47,780
And U is triangular,
upper triangular.

29
00:01:47,780 --> 00:01:50,180
And its diagonal holds the pivots.

30
00:01:50,180 --> 00:01:52,580
Those are free
parameters so that--

31
00:01:52,580 --> 00:01:55,490
but can I just write
down the count?

32
00:01:55,490 --> 00:02:00,080
So I'll go through each
of these just quickly

33
00:02:00,080 --> 00:02:05,330
after I've figured out how--
these are sort of the building

34
00:02:05,330 --> 00:02:06,500
blocks.

35
00:02:06,500 --> 00:02:11,480
So how many free parameters are
there in these two triangular

36
00:02:11,480 --> 00:02:12,740
matrices?

37
00:02:12,740 --> 00:02:17,660
Well, I think the answer
is 1/2 n, n minus 1,

38
00:02:17,660 --> 00:02:22,350
and 1/2 n, n plus 1.

39
00:02:22,350 --> 00:02:24,500
That's a familiar number.

40
00:02:28,280 --> 00:02:32,980
You recognize that as the
sum of 1 plus 2, up to n.

41
00:02:32,980 --> 00:02:35,930
And you have one free para--

42
00:02:35,930 --> 00:02:37,910
in the upper
triangular U. You've

43
00:02:37,910 --> 00:02:42,470
got one free parameter up in
the corner, two in the next one.

44
00:02:42,470 --> 00:02:44,510
And as you're coming
down, you end up

45
00:02:44,510 --> 00:02:46,820
with n on the main diagonal.

46
00:02:46,820 --> 00:02:48,500
And they add up to that.

47
00:02:48,500 --> 00:02:52,310
And you see that those
two differ by n,

48
00:02:52,310 --> 00:02:54,620
which is what we want.

49
00:02:54,620 --> 00:02:55,520
OK.

50
00:02:55,520 --> 00:02:56,300
Diagonal.

51
00:02:56,300 --> 00:02:58,340
The answer is obviously n.

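The triangular and diagonal counts can be checked by brute force. This is a minimal Python sketch, assuming as in the lecture that L has ones on its diagonal; the function names are hypothetical, chosen only for this illustration.

```python
# Count free parameters in the triangular and diagonal building blocks.
# Assumption (as in the lecture): L has ones on its diagonal, so only
# the entries strictly below the diagonal are free.


def count_unit_lower(n: int) -> int:
    # Strictly below the diagonal: 1 + 2 + ... + (n - 1) entries.
    return sum(range(1, n))


def count_upper(n: int) -> int:
    # On or above the diagonal (U keeps the pivots): 1 + 2 + ... + n entries.
    return sum(range(1, n + 1))


def count_diagonal(n: int) -> int:
    return n


for n in range(1, 10):
    assert count_unit_lower(n) == n * (n - 1) // 2
    assert count_upper(n) == n * (n + 1) // 2
    # The two triangular counts differ by n and together give n^2.
    assert count_upper(n) - count_unit_lower(n) == n
    assert count_unit_lower(n) + count_upper(n) == n * n

print(count_unit_lower(4), count_upper(4), count_diagonal(4))  # 6 10 4
```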
52
00:03:00,980 --> 00:03:02,705
How about the
eigenvector matrix?

53
00:03:05,910 --> 00:03:08,760
This whole exercise
is like something

54
00:03:08,760 --> 00:03:12,390
I've never seen in a textbook.

55
00:03:12,390 --> 00:03:17,910
But for me it brings
back all these key--

56
00:03:17,910 --> 00:03:21,900
really the condensed
course in linear algebra

57
00:03:21,900 --> 00:03:23,580
is on that top line.

58
00:03:23,580 --> 00:03:27,760
So how many free parameters
in an eigenvector matrix?

59
00:03:27,760 --> 00:03:28,620
OK.

60
00:03:28,620 --> 00:03:31,400
And of course, if
you're sort of thinking,

61
00:03:31,400 --> 00:03:36,980
what's the rule for
free parameters?

62
00:03:36,980 --> 00:03:41,010
My answer is going to be, for
the number of free parameters,

63
00:03:41,010 --> 00:03:46,360
so this is an n by n matrix
with the n eigenvectors in it.

64
00:03:46,360 --> 00:03:50,250
But there's a certain
freedom there.

65
00:03:50,250 --> 00:03:51,290
And what is that?

66
00:03:51,290 --> 00:03:54,290
What freedom do we have in
choosing the eigenvector

67
00:03:54,290 --> 00:03:56,540
matrix?

68
00:03:56,540 --> 00:04:01,730
Every eigenvector can be
multiplied by a scalar.

69
00:04:01,730 --> 00:04:04,250
If x is an
eigenvector, so is 2x.

70
00:04:04,250 --> 00:04:05,540
So is 3x.

71
00:04:05,540 --> 00:04:09,740
So we could make a convention
that the first component

72
00:04:09,740 --> 00:04:11,570
was always 1.

73
00:04:11,570 --> 00:04:15,080
Maybe that wouldn't be the
most intelligent convention

74
00:04:15,080 --> 00:04:16,140
in the world.

75
00:04:16,140 --> 00:04:19,610
But it would show that
that top row of ones

76
00:04:19,610 --> 00:04:21,320
was not to be counted.

77
00:04:21,320 --> 00:04:26,070
So I get n squared
minus n for that.

78
00:04:26,070 --> 00:04:26,570
Oh, yeah.

79
00:04:26,570 --> 00:04:32,240
Well, having done those two,
let me look at this one.

80
00:04:32,240 --> 00:04:35,090
Does that come out a
total of n squared?

81
00:04:35,090 --> 00:04:38,960
Yes, because the
eigenvector x has n

82
00:04:38,960 --> 00:04:42,890
squared minus n by this
reasoning, little hokey

83
00:04:42,890 --> 00:04:45,050
reasoning that I just gave.

84
00:04:45,050 --> 00:04:49,490
And then there are n more
for the eigenvalue matrix.

85
00:04:49,490 --> 00:04:52,730
And there's nothing
left for the eigen--

86
00:04:52,730 --> 00:04:55,760
the inverse because
it's determined by x.

87
00:04:55,760 --> 00:05:00,650
So do you see the count adding
up to n squared for those?

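The "first component equals 1" convention for the eigenvector matrix can be tried numerically. This is a minimal NumPy sketch, assuming a generic random matrix (diagonalizable, with no eigenvector whose first component is zero); the eigenvalues and eigenvectors may come out complex, which does not affect the count.

```python
import numpy as np

# Assumption: a generic random matrix, so it is diagonalizable and no
# eigenvector has first component exactly zero.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))

lam, X = np.linalg.eig(A)     # columns of X are eigenvectors (possibly complex)
X1 = X / X[0, :]              # rescale each column so its first entry is 1

# Rescaled columns are still eigenvectors, and the top row is all ones.
assert np.allclose(A @ X1, X1 @ np.diag(lam))
assert np.allclose(X1[0, :], 1.0)

# X1^{-1} adds no new parameters: it is computed from X1.
assert np.allclose(X1 @ np.diag(lam) @ np.linalg.inv(X1), A)

free_X = n * n - n            # the row of ones is not counted
free_Lambda = n
assert free_X + free_Lambda == n * n
```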
88
00:05:00,650 --> 00:05:02,950
Now, I left open
the orthogonal one.

89
00:05:02,950 --> 00:05:06,110
I think we kind of talked
about that during the--

90
00:05:06,110 --> 00:05:08,120
when we met it.

91
00:05:08,120 --> 00:05:10,970
And it's a little less obvious.

92
00:05:10,970 --> 00:05:12,140
But do you remember?

93
00:05:12,140 --> 00:05:17,510
So I'm talking about an n by
n orthogonal matrix, Q. So

94
00:05:17,510 --> 00:05:20,900
how many free parameters
in column one?

95
00:05:20,900 --> 00:05:24,410
That column is what
we always call Q1.

96
00:05:24,410 --> 00:05:26,390
Does it have n free parameters?

97
00:05:26,390 --> 00:05:31,280
Or is there a condition
that cuts that back?

98
00:05:31,280 --> 00:05:34,020
There is a condition, right?

99
00:05:34,020 --> 00:05:36,440
And what's the condition
on the first column

100
00:05:36,440 --> 00:05:40,780
that removes one parameter?

101
00:05:40,780 --> 00:05:42,400
It's normalized.

102
00:05:42,400 --> 00:05:43,900
Its length is 1.

103
00:05:43,900 --> 00:05:49,560
So I only get n minus 1
from the first column.

104
00:05:49,560 --> 00:05:52,330
And now if I move over
to the second column,

105
00:05:52,330 --> 00:05:55,030
how many free parameters there?

106
00:05:55,030 --> 00:05:57,350
Again, it's a unit vector.

107
00:05:57,350 --> 00:06:02,290
But also, it is
orthogonal to the first.

108
00:06:02,290 --> 00:06:06,760
So two parameters got you--
two rules got imposed.

109
00:06:06,760 --> 00:06:08,860
And two parameters got removed.

110
00:06:08,860 --> 00:06:11,050
So this is n minus 2.

111
00:06:11,050 --> 00:06:12,880
And then finally, whatever.

112
00:06:12,880 --> 00:06:14,380
So I think that that--

113
00:06:14,380 --> 00:06:18,910
sum of these guys is exactly
the same as we had up here.

114
00:06:18,910 --> 00:06:26,290
I think it's also 1/2 n, n minus
1, or 1/2 of n squared minus n.

115
00:06:26,290 --> 00:06:26,940
Yeah.

116
00:06:26,940 --> 00:06:28,690
Yeah, so not as
many as you might

117
00:06:28,690 --> 00:06:32,930
think because the matrix
is size n squared.

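The column-by-column count for Q (n minus 1, then n minus 2, and so on) can be cross-checked by a different route than the one on the board: linearize the constraint Q^T Q = I at a random orthogonal Q and count how many directions it leaves free. This is a sketch of that swapped-in tangent-space approach; `orthogonal_dimension` is a hypothetical helper.

```python
import numpy as np


def orthogonal_dimension(n: int, seed: int = 0) -> int:
    """Number of free directions at a random n-by-n orthogonal Q."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # a random orthogonal matrix
    cols = []
    for k in range(n * n):
        dQ = np.zeros(n * n)
        dQ[k] = 1.0
        dQ = dQ.reshape(n, n)
        # Linearized constraint: d(Q^T Q) = Q^T dQ + dQ^T Q must vanish.
        cols.append((Q.T @ dQ + dQ.T @ Q).ravel())
    J = np.column_stack(cols)
    # Free directions = n^2 minus the number of independent constraints.
    return n * n - int(np.linalg.matrix_rank(J))


for n in range(1, 7):
    assert orthogonal_dimension(n) == n * (n - 1) // 2
```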
118
00:06:32,930 --> 00:06:35,380
Now, can I use those?

119
00:06:35,380 --> 00:06:37,090
Because these are the--

120
00:06:37,090 --> 00:06:38,770
like the building blocks.

121
00:06:38,770 --> 00:06:40,100
Can I just check these?

122
00:06:40,100 --> 00:06:40,870
Let's see.

123
00:06:40,870 --> 00:06:42,240
I'll just go along the list.

124
00:06:42,240 --> 00:06:46,120
L times U. So L had this.

125
00:06:46,120 --> 00:06:47,170
And U had that.

126
00:06:47,170 --> 00:06:50,670
And when I add those,
it adds up to n squared.

127
00:06:50,670 --> 00:06:51,170
Right?

128
00:06:51,170 --> 00:06:53,560
The minus cancels the plus.

129
00:06:53,560 --> 00:06:56,750
And the 1/2n squared
twice gives me n squared.

130
00:06:56,750 --> 00:06:58,750
So good for that one.

131
00:06:58,750 --> 00:07:01,750
What about QR?

132
00:07:01,750 --> 00:07:06,580
Well, R is upper
triangular like so.

133
00:07:06,580 --> 00:07:09,910
And then Q, we just
got it right there.

134
00:07:09,910 --> 00:07:15,790
So for Q times R, it's that
plus that again, adding to n

135
00:07:15,790 --> 00:07:17,580
squared.

136
00:07:17,580 --> 00:07:18,630
Good for that one.

137
00:07:18,630 --> 00:07:20,630
n squared for that one.

138
00:07:20,630 --> 00:07:23,810
And this one we just did.

139
00:07:23,810 --> 00:07:26,690
n squared minus n in x.

140
00:07:26,690 --> 00:07:27,980
n on the diagonal.

141
00:07:27,980 --> 00:07:29,540
Total n squared.

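The running tally for the square factorizations can be collected in one place. A minimal Python sketch; the helper names are hypothetical, and the counts are the ones derived on the board.

```python
# Free-parameter counts for each building block (n-by-n case).
def lower_unit(n: int) -> int:
    return n * (n - 1) // 2      # L with ones on the diagonal


def upper(n: int) -> int:
    return n * (n + 1) // 2      # U or R, diagonal included


def orthogonal(n: int) -> int:
    return n * (n - 1) // 2      # Q: (n-1) + (n-2) + ... + 0


def diagonal(n: int) -> int:
    return n                     # Lambda


def eigenvectors(n: int) -> int:
    return n * n - n             # X, with its top row scaled to ones


# X^{-1} is determined by X, so it contributes nothing.
factorizations = {
    "A = LU": lambda n: lower_unit(n) + upper(n),
    "A = QR": lambda n: orthogonal(n) + upper(n),
    "A = X Lambda X^-1": lambda n: eigenvectors(n) + diagonal(n),
}

for n in range(1, 10):
    for name, count in factorizations.items():
        assert count(n) == n * n, name
```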
142
00:07:29,540 --> 00:07:30,530
What about this guy?

143
00:07:33,400 --> 00:07:36,870
What about the big,
really fundamental one

144
00:07:36,870 --> 00:07:41,400
that I would normally write
the matrix as S instead of A

145
00:07:41,400 --> 00:07:43,260
to remind us that it--

146
00:07:43,260 --> 00:07:47,400
that the matrix
here is symmetric?

147
00:07:47,400 --> 00:07:53,040
So I'm not expecting n
squared for a symmetric ma--

148
00:07:53,040 --> 00:07:54,960
oh, I should've put
that on my list.

149
00:07:54,960 --> 00:07:57,150
What's the count for
a symmetric matrix?

150
00:08:00,120 --> 00:08:02,760
Because this is an S here.

151
00:08:02,760 --> 00:08:05,880
So I'm not expecting
to get n squared.

152
00:08:05,880 --> 00:08:12,230
I'm only expecting to get
the number for a symmetric S.

153
00:08:12,230 --> 00:08:15,230
What's the number of free
parameters that I would--

154
00:08:15,230 --> 00:08:20,240
that I start with that I hope
will reappear in Q and lambda?

155
00:08:24,601 --> 00:08:26,575
What's the deal for
a symmetric matrix?

156
00:08:29,130 --> 00:08:30,590
Let's see.

157
00:08:30,590 --> 00:08:32,309
I'm free to choose.

158
00:08:32,309 --> 00:08:35,340
Is it the same count as this?

159
00:08:35,340 --> 00:08:38,220
Yeah, because I'm free to
choose the upper triangular

160
00:08:38,220 --> 00:08:42,929
part and the diagonal, but I'm
not free to choose the lower.

161
00:08:42,929 --> 00:08:48,150
So I'd say that's
1/2n times n minus 1.

162
00:08:48,150 --> 00:08:49,530
And plus 1.

163
00:08:49,530 --> 00:08:50,205
Sorry.

164
00:08:50,205 --> 00:08:52,720
The diagonal's in there.

165
00:08:52,720 --> 00:08:53,430
OK.

166
00:08:53,430 --> 00:08:58,650
So do I get that total,
1/2 of n squared plus n,

167
00:08:58,650 --> 00:09:02,220
from these guys?

168
00:09:02,220 --> 00:09:04,890
Well, I probably do.

169
00:09:04,890 --> 00:09:07,330
The diagonal guy gives me n.

170
00:09:07,330 --> 00:09:08,970
This gives me n.

171
00:09:08,970 --> 00:09:14,580
And that's a Q, which is my
other favorite number there.

172
00:09:14,580 --> 00:09:20,730
And when I add that to that,
that becomes a plus sign.

173
00:09:20,730 --> 00:09:22,760
And I'm good.

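A quick numerical and arithmetic check of the symmetric count, as a minimal NumPy sketch: `np.linalg.eigh` returns the eigenvalues and an orthogonal eigenvector matrix of a symmetric matrix, and the counts for Q and Lambda add up to the count for S. The random matrix is only an illustration.

```python
import numpy as np

n = 5
rng = np.random.default_rng(1)
B = rng.standard_normal((n, n))
S = (B + B.T) / 2                      # a random symmetric matrix

lam, Q = np.linalg.eigh(S)             # S = Q diag(lam) Q^T with Q orthogonal
assert np.allclose(Q @ np.diag(lam) @ Q.T, S)
assert np.allclose(Q.T @ Q, np.eye(n))

free_S = n * (n + 1) // 2              # upper triangle plus diagonal
free_Q = n * (n - 1) // 2
free_Lambda = n
assert free_Q + free_Lambda == free_S  # 10 + 5 == 15 for n = 5
```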
174
00:09:22,760 --> 00:09:24,000
Yeah.

175
00:09:24,000 --> 00:09:26,510
You see how I enjoy
doing this, right?

176
00:09:26,510 --> 00:09:27,630
But I'm near the end.

177
00:09:27,630 --> 00:09:33,370
But the last one is
kind of not well known.

178
00:09:33,370 --> 00:09:33,870
OK.

179
00:09:33,870 --> 00:09:37,350
Q times S. Do you remember
that factorization?

180
00:09:37,350 --> 00:09:40,470
That's called the
polar decomposition.

181
00:09:40,470 --> 00:09:44,730
It's an orthogonal
times the symmetric.

182
00:09:44,730 --> 00:09:50,550
And it is often used
in engineering as a way

183
00:09:50,550 --> 00:09:56,980
to decompose a
displacement, strain matrix.

184
00:09:56,980 --> 00:09:59,470
Anyway, Q times S. And it--

185
00:09:59,470 --> 00:10:03,370
actually, it's very,
very close to the SVD.

186
00:10:03,370 --> 00:10:05,440
And I have friends
who say, better

187
00:10:05,440 --> 00:10:09,870
to compute QS than the SVD
and then just move along.

188
00:10:09,870 --> 00:10:16,120
Anyway, Q times S.
So Q is this guy.

189
00:10:16,120 --> 00:10:19,890
And S. What's S?

190
00:10:19,890 --> 00:10:20,650
Symmetric.

191
00:10:20,650 --> 00:10:21,550
That's this guy.

192
00:10:24,220 --> 00:10:27,640
So that's Q. Let me
write that letter Q and S

193
00:10:27,640 --> 00:10:30,280
so I don't lose it.

194
00:10:30,280 --> 00:10:31,690
What do those add up to?

195
00:10:34,650 --> 00:10:35,900
n squared.

196
00:10:35,900 --> 00:10:36,910
Happy.

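Since the polar decomposition is "very, very close to the SVD," one way to compute it is straight from the SVD: if A = U Sigma V^T, then Q = U V^T and S = V Sigma V^T. This is a minimal NumPy sketch for a square A; `polar` is a hypothetical helper name (SciPy also ships `scipy.linalg.polar` for the same job).

```python
import numpy as np


def polar(A: np.ndarray):
    """Polar decomposition A = Q S of a square matrix, computed from the SVD."""
    U, sigma, Vt = np.linalg.svd(A)
    Q = U @ Vt                          # orthogonal factor
    S = Vt.T @ np.diag(sigma) @ Vt      # symmetric positive semidefinite factor
    return Q, S


rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Q, S = polar(A)

assert np.allclose(Q @ S, A)
assert np.allclose(Q.T @ Q, np.eye(n))
assert np.allclose(S, S.T)
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)

# Parameter count: orthogonal Q plus symmetric S gives n^2.
assert n * (n - 1) // 2 + n * (n + 1) // 2 == n * n
```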
197
00:10:36,910 --> 00:10:37,720
OK.

198
00:10:37,720 --> 00:10:40,730
So finally, the SVD.

199
00:10:40,730 --> 00:10:43,170
Finally, the SVD.

200
00:10:43,170 --> 00:10:45,560
What's the count?

201
00:10:45,560 --> 00:10:49,900
Now I've got rectangular
stuff in there.

202
00:10:49,900 --> 00:10:52,360
I'm ready for this one.

203
00:10:52,360 --> 00:10:53,840
And I have to
think a little bit.

204
00:10:57,450 --> 00:10:58,710
And we may have done this.

205
00:11:01,940 --> 00:11:06,920
Let's suppose that m
is less or equal n.

206
00:11:06,920 --> 00:11:09,680
Suppose that.

207
00:11:09,680 --> 00:11:10,980
Yeah.

208
00:11:10,980 --> 00:11:14,100
Otherwise, we would just
transpose and look at the SVD.

209
00:11:14,100 --> 00:11:17,310
So let's say m less or equal n.

210
00:11:17,310 --> 00:11:19,200
So let's say it's got full rank.

211
00:11:22,230 --> 00:11:26,640
And what's the largest rank
that the matrix can have?

212
00:11:26,640 --> 00:11:28,110
m, clearly.

213
00:11:28,110 --> 00:11:29,820
Full rank m.

214
00:11:29,820 --> 00:11:35,130
So U in the SVD will be m by m.

215
00:11:35,130 --> 00:11:40,950
Let's remember the U, the
sigma, and the V transpose.

216
00:11:40,950 --> 00:11:43,440
This will be m by n.

217
00:11:43,440 --> 00:11:45,960
And this will be n by n.

218
00:11:45,960 --> 00:11:48,360
For the full scale SVD.

219
00:11:48,360 --> 00:11:55,770
And if the rank is equal to m,
then I really expect to get--

220
00:11:55,770 --> 00:11:58,430
I expect it to add
up to the total

221
00:11:58,430 --> 00:12:06,770
for A. For A, the
original A has mn, right?

222
00:12:06,770 --> 00:12:09,740
It's an m by n matrix.

223
00:12:09,740 --> 00:12:18,400
The matrix A is m by n with the
m less or equal n, giving me

224
00:12:18,400 --> 00:12:18,970
these things.

225
00:12:18,970 --> 00:12:21,670
So it has mn parameters.

226
00:12:25,570 --> 00:12:29,380
So do we get m
times n from this?

227
00:12:29,380 --> 00:12:31,190
I hope we do.

228
00:12:31,190 --> 00:12:33,120
I know how many
we get from sigma.

229
00:12:33,120 --> 00:12:33,620
What?

230
00:12:33,620 --> 00:12:36,390
How many was the
count for sigma?

231
00:12:36,390 --> 00:12:38,450
m.

232
00:12:38,450 --> 00:12:41,670
And what's the count for V?

233
00:12:41,670 --> 00:12:43,410
So that's an n by n.

234
00:12:43,410 --> 00:12:46,370
And what's the count for U?

235
00:12:46,370 --> 00:12:47,090
OK.

236
00:12:47,090 --> 00:12:47,870
Yeah.

237
00:12:47,870 --> 00:12:49,330
They're orthogonal matrices.

238
00:12:49,330 --> 00:12:52,720
So I should be able
to go up to that line.

239
00:12:52,720 --> 00:12:55,270
This was an m by m one.

240
00:12:55,270 --> 00:12:57,700
Is that a 1/2n, n minus 1?

241
00:12:57,700 --> 00:13:04,420
Am I copying that correctly
out of this circle there?

242
00:13:04,420 --> 00:13:07,130
That's an m by m
orthogonal matrix.

243
00:13:07,130 --> 00:13:08,600
Oh, but I have to write m.

244
00:13:08,600 --> 00:13:10,250
That was foolish.

245
00:13:10,250 --> 00:13:12,774
OK. m.

246
00:13:12,774 --> 00:13:14,120
m.

247
00:13:14,120 --> 00:13:17,690
Yeah, because that
matrix is of size m.

248
00:13:17,690 --> 00:13:20,270
So that's an m.

249
00:13:20,270 --> 00:13:23,030
And then I have that.

250
00:13:23,030 --> 00:13:28,540
And then I have whatever
V transpose n by n.

251
00:13:28,540 --> 00:13:30,328
Oh, what's the deal in there?

252
00:13:30,328 --> 00:13:30,828
Hmm.

253
00:13:33,760 --> 00:13:38,350
Do I want all of
the 1/2n, n minus 1?

254
00:13:41,773 --> 00:13:43,730
Oh, God.

255
00:13:43,730 --> 00:13:46,920
I thought I had
got this straight.

256
00:13:46,920 --> 00:13:47,510
Let's see.

257
00:13:51,380 --> 00:13:55,730
I could subtract this from this
and find out what I should say.

258
00:13:55,730 --> 00:13:57,860
Whoa.

259
00:13:57,860 --> 00:14:01,350
Students have been
known to do this too.

260
00:14:01,350 --> 00:14:02,740
Let's see.

261
00:14:02,740 --> 00:14:05,300
Well, let's try to think anyway.

262
00:14:05,300 --> 00:14:07,730
So I have this n by n symmet--

263
00:14:07,730 --> 00:14:09,935
this n by n orthogonal matrix.

264
00:14:13,100 --> 00:14:17,880
First, it could be
any orthogonal matrix.

265
00:14:17,880 --> 00:14:20,210
Yeah.

266
00:14:20,210 --> 00:14:26,860
But is it only the first m
columns that I really need?

267
00:14:26,860 --> 00:14:31,120
The rest I could
just throw away.

268
00:14:31,120 --> 00:14:35,620
Let me try to imagine
that it's just the first.

269
00:14:35,620 --> 00:14:39,400
Well, then I won't
have any n in here.

270
00:14:39,400 --> 00:14:42,610
So maybe I better take a 1/2n--

271
00:14:42,610 --> 00:14:44,170
no.

272
00:14:44,170 --> 00:14:44,670
Help.

273
00:14:44,670 --> 00:14:47,040
Oh, oh, yes, of course.

274
00:14:47,040 --> 00:14:48,450
Ha.

275
00:14:48,450 --> 00:14:55,330
I've got only m columns
that matter, the--

276
00:14:55,330 --> 00:14:59,825
everybody sort of now
understands that SVD.

277
00:14:59,825 --> 00:15:01,090
The rank is m.

278
00:15:01,090 --> 00:15:02,690
Don't forget that.

279
00:15:02,690 --> 00:15:03,370
OK.

280
00:15:03,370 --> 00:15:10,270
Then the first r, the first
m columns of V are important.

281
00:15:10,270 --> 00:15:15,070
Those are the singular vectors
that go with nonzero singular

282
00:15:15,070 --> 00:15:18,280
values that really matter.

283
00:15:18,280 --> 00:15:21,310
And the rest really
don't matter.

284
00:15:21,310 --> 00:15:23,010
So I'm going to just--

285
00:15:23,010 --> 00:15:25,090
I have to count how many--

286
00:15:25,090 --> 00:15:26,930
so, sorry.

287
00:15:26,930 --> 00:15:39,930
V, the important part of V,
has how many parameters in the m columns?

288
00:15:39,930 --> 00:15:42,590
But it's an n by n matrix.

289
00:15:42,590 --> 00:15:44,330
And those columns
are orthogonal.

290
00:15:44,330 --> 00:15:47,600
So the answer is
not mn for this guy.

291
00:15:47,600 --> 00:15:51,560
I have to go through this
foolish reasoning again.

292
00:15:51,560 --> 00:16:00,560
I have n minus 1, plus n minus
2, and so on, up to n minus m.

293
00:16:05,230 --> 00:16:07,060
There were n minus
1 parameters in

294
00:16:07,060 --> 00:16:09,020
the first orthogonal vector--

295
00:16:09,020 --> 00:16:12,820
unit vector, n minus 2 in
the second one, up to n minus

296
00:16:12,820 --> 00:16:14,020
m in the m-th.

297
00:16:14,020 --> 00:16:17,240
And then V has some
more columns that

298
00:16:17,240 --> 00:16:24,400
are coming, really, from a null
space, that are not important.

299
00:16:24,400 --> 00:16:29,780
I believe this is the
right thing to do.

300
00:16:29,780 --> 00:16:33,120
I'm hoping you agree.

301
00:16:33,120 --> 00:16:38,650
And I mean, I'm hoping even more
that those add up to m times n.

302
00:16:38,650 --> 00:16:39,150
OK.

303
00:16:39,150 --> 00:16:40,820
I have a 1/2n s--

304
00:16:40,820 --> 00:16:43,750
oh, I really have
to total this thing.

305
00:16:43,750 --> 00:16:44,250
OK.

306
00:16:44,250 --> 00:16:47,640
This had m terms.

307
00:16:47,640 --> 00:16:52,620
So there's m of these n's.

308
00:16:52,620 --> 00:16:57,660
And then I have to subtract
off 1 plus 2 plus 3, up to m.

309
00:16:57,660 --> 00:17:00,510
And so what am I
subtracting off?

310
00:17:00,510 --> 00:17:01,500
What's that sum?

311
00:17:01,500 --> 00:17:03,630
1 plus 2 plus 3, stopping at m?

312
00:17:07,079 --> 00:17:10,099
It's one of these guys, 1/2--

313
00:17:10,099 --> 00:17:13,670
is it 1/2m, m plus 1?

314
00:17:13,670 --> 00:17:17,010
Yeah, 1/2m, m plus 1.

315
00:17:17,010 --> 00:17:17,510
Sorry.

316
00:17:17,510 --> 00:17:21,020
1/2m, m plus 1.

317
00:17:21,020 --> 00:17:22,430
I'm supposed to enjoy this.

318
00:17:22,430 --> 00:17:26,089
And now I get
a little nervous.

319
00:17:26,089 --> 00:17:27,290
But OK.

320
00:17:27,290 --> 00:17:30,470
So I believe that that is that.

321
00:17:30,470 --> 00:17:31,280
OK.

322
00:17:31,280 --> 00:17:32,510
Well, we have the mn.

323
00:17:32,510 --> 00:17:36,180
That's a good sign; that's
what we're shooting for.

324
00:17:36,180 --> 00:17:39,020
So does the rest of
it add to nothing?

325
00:17:39,020 --> 00:17:42,440
Well, I guess, yeah,
I guess it does.

326
00:17:42,440 --> 00:17:47,600
When I put these two together,
I have 1/2m, m plus 1.

327
00:17:47,600 --> 00:17:49,950
And then I'm subtracting
it away again.

328
00:17:49,950 --> 00:17:50,980
So I get mn.

329
00:17:50,980 --> 00:17:53,420
Hooray.

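The full-rank SVD tally (U, then Sigma, then only the first m columns of V) can be checked for many sizes at once. A minimal Python sketch, assuming m is at most n as on the board; `svd_count_full_rank` is a hypothetical name.

```python
def svd_count_full_rank(m: int, n: int) -> int:
    """Free parameters in A = U Sigma V^T when A is m-by-n with rank m <= n."""
    if m > n:
        raise ValueError("assumes m <= n; otherwise transpose A first")
    count_U = m * (m - 1) // 2                       # m-by-m orthogonal U
    count_Sigma = m                                  # m singular values
    # Only the first m columns of V matter: (n-1) + (n-2) + ... + (n-m).
    count_V = sum(n - k for k in range(1, m + 1))
    assert count_V == m * n - m * (m + 1) // 2       # mn minus (1 + 2 + ... + m)
    return count_U + count_Sigma + count_V


for n in range(1, 9):
    for m in range(1, n + 1):
        assert svd_count_full_rank(m, n) == m * n
```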
330
00:17:53,420 --> 00:17:58,090
Well, it had to happen,
or we wouldn't--

331
00:17:58,090 --> 00:18:01,040
anything-- before I erase
that board and consign

332
00:18:01,040 --> 00:18:05,120
that to history, is there--
should I pause a little more?

333
00:18:05,120 --> 00:18:06,210
Minute?

334
00:18:06,210 --> 00:18:09,500
This will be, like, I'm
hoping, a one-page appendix

335
00:18:09,500 --> 00:18:11,840
to the notes and the book.

336
00:18:11,840 --> 00:18:12,980
And you'll see it.

337
00:18:12,980 --> 00:18:18,050
But I do have one
more count to do.

338
00:18:18,050 --> 00:18:21,750
And then I'm good
with this review

339
00:18:21,750 --> 00:18:27,120
and ready to move onward to
the topic of saddle points

340
00:18:27,120 --> 00:18:29,220
and ready to move
onward after that.

341
00:18:29,220 --> 00:18:33,990
Well, I'll say a little bit
about the next lab homework

342
00:18:33,990 --> 00:18:35,880
that I'm creating.

343
00:18:35,880 --> 00:18:42,330
And then our next topic will
be, like, covariance matrices,

344
00:18:42,330 --> 00:18:45,890
a little statistics this week.

345
00:18:45,890 --> 00:18:49,560
Then we get a week off
we could-- to digest it.

346
00:18:49,560 --> 00:18:55,440
And then come back for gradient
descent, and deep learning,

347
00:18:55,440 --> 00:18:57,550
and those things.

348
00:18:57,550 --> 00:18:59,190
OK.

349
00:18:59,190 --> 00:19:01,020
Everybody happy with that?

350
00:19:01,020 --> 00:19:04,260
So what's my final question?

351
00:19:04,260 --> 00:19:20,770
My final question is the SVD
for any matrix of rank r.

352
00:19:20,770 --> 00:19:25,550
So it's an m by n matrix.

353
00:19:25,550 --> 00:19:30,950
But the rank is only r.
It's a natural question--

354
00:19:30,950 --> 00:19:36,020
how many parameters are
there in a rank r matrix?

355
00:19:39,020 --> 00:19:41,840
We may even have touched
on this question.

356
00:19:41,840 --> 00:19:44,840
And I have two
ways to answer it.

357
00:19:44,840 --> 00:19:48,250
And one way is the SVD.

358
00:19:48,250 --> 00:19:52,460
And that will be similar to
what I just pushed up there.

359
00:19:52,460 --> 00:19:58,880
So if the rank is r, the SVD
of this typical rank r matrix

360
00:19:58,880 --> 00:20:02,510
will be U sigma V transpose.

361
00:20:02,510 --> 00:20:05,590
But U, now this is the--

362
00:20:05,590 --> 00:20:10,210
like the condensed thing,
where I've thrown away

363
00:20:10,210 --> 00:20:14,860
stuff that's automatically zero
because if the rank is only r,

364
00:20:14,860 --> 00:20:18,520
like if the rank was 1,
suppose the rank was 1,

365
00:20:18,520 --> 00:20:24,790
then I'd have 1 column times
1 sigma times 1 row, right?

366
00:20:24,790 --> 00:20:28,780
And I could do that
count for r equal 1.

367
00:20:28,780 --> 00:20:31,340
Now I have r columns.

368
00:20:31,340 --> 00:20:40,195
So this is m by r. Then
sigma is diagonal, of course.

369
00:20:40,195 --> 00:20:42,760
So I'm going to get r
numbers out of that.

370
00:20:42,760 --> 00:20:45,500
And this one is now r by n.

371
00:20:45,500 --> 00:20:51,760
In other words, maybe I should,
like, save this little bit here

372
00:20:51,760 --> 00:20:53,380
that was helpful.

373
00:20:53,380 --> 00:20:57,400
But now I've got
m is reduced to r.

374
00:20:57,400 --> 00:21:00,700
So I believe that if
I count these three,

375
00:21:00,700 --> 00:21:04,540
I'll get the right number of
parameters for a rank r matrix.

376
00:21:04,540 --> 00:21:10,270
And that's not so obvious
because the rank r matrices

377
00:21:10,270 --> 00:21:11,230
are not a--

378
00:21:11,230 --> 00:21:13,990
we don't have a subspace.

379
00:21:13,990 --> 00:21:17,720
If I add a rank r matrix
to another rank r matrix,

380
00:21:17,720 --> 00:21:20,680
well, the rank could be as big
as 2r and probably will be.

381
00:21:25,200 --> 00:21:27,940
You know, it's just
a little interesting

382
00:21:27,940 --> 00:21:32,240
to get your hands on
matrices of rank r

383
00:21:32,240 --> 00:21:38,500
because they're kind of a
thin, like a, well, a math

384
00:21:38,500 --> 00:21:42,250
person would call it a
manifold, some kind of a surface

385
00:21:42,250 --> 00:21:45,190
within matrix space.

386
00:21:45,190 --> 00:21:47,220
Have you ever thought
about matrix space?

387
00:21:47,220 --> 00:21:50,580
So that's a vector space
because we can add matrices.

388
00:21:50,580 --> 00:21:52,840
We can multiply
them by constants.

389
00:21:52,840 --> 00:21:55,630
We can take linear combinations.

390
00:21:55,630 --> 00:21:58,240
We could call them
vectors if we like.

391
00:21:58,240 --> 00:22:02,740
There would be a vector
space of m by n matrices.

392
00:22:02,740 --> 00:22:07,030
What would be the
dimension of that space?

393
00:22:07,030 --> 00:22:12,280
So the vector space of
all 3 by 4 matrices.

394
00:22:12,280 --> 00:22:15,260
That has what dimension?

395
00:22:15,260 --> 00:22:16,400
12.

396
00:22:16,400 --> 00:22:19,400
12, because you've got
12 numbers to choose.

397
00:22:19,400 --> 00:22:22,070
And it is a space
because you can add.

398
00:22:22,070 --> 00:22:27,890
Now if I say 3 by 4
matrices of rank 2,

399
00:22:27,890 --> 00:22:30,840
I don't have a space anymore.

400
00:22:30,840 --> 00:22:34,830
That word, space, is
seriously reserved

401
00:22:34,830 --> 00:22:38,010
for meaning vector
space, meaning

402
00:22:38,010 --> 00:22:39,390
I can take combinations.

403
00:22:39,390 --> 00:22:43,110
But if I take a rank 2 matrix
plus a rank 2 matrix, I'm not--

404
00:22:43,110 --> 00:22:49,220
so it's sort of a surface
within 12D, the 2--

405
00:22:49,220 --> 00:22:51,830
the 3 by 4 matrices of rank 2.

406
00:22:51,830 --> 00:22:55,790
And we're about to find the
dimension of that surface.

407
00:22:55,790 --> 00:23:01,910
Does your mind sort of visualize
a surface in 12 dimensions?

408
00:23:01,910 --> 00:23:05,540
Yeah, well, give
it a shot anyway.

409
00:23:05,540 --> 00:23:08,420
But that surface could have--

410
00:23:08,420 --> 00:23:15,460
be 11 dimensional, so to speak,
like, meaning locally, the--

411
00:23:15,460 --> 00:23:17,210
it wouldn't have to be a pl--

412
00:23:17,210 --> 00:23:22,370
an 11 dimensional plane
going through the origin.

413
00:23:22,370 --> 00:23:24,170
In fact, it wouldn't
go through the origin

414
00:23:24,170 --> 00:23:26,910
because the origin
won't have rank r.

415
00:23:26,910 --> 00:23:28,910
So it's some kind of a surface.

416
00:23:28,910 --> 00:23:31,610
And maybe it's got
some different pieces.

417
00:23:31,610 --> 00:23:34,550
Probably, some
smart person knows

418
00:23:34,550 --> 00:23:36,890
what that surface looks like.

419
00:23:36,890 --> 00:23:40,220
But we're just going
to find out something

420
00:23:40,220 --> 00:23:44,210
about its number of parameters,
its local dimension.

421
00:23:44,210 --> 00:23:50,840
Well, I know that this answer
is r because I've got r sigmas.

422
00:23:50,840 --> 00:23:53,450
And this one, I'm
pretty good at.

423
00:23:53,450 --> 00:23:55,760
But now it's not--

424
00:23:55,760 --> 00:23:59,020
it's r by n, so it's--

425
00:23:59,020 --> 00:24:02,765
instead of-- here r was m.

426
00:24:02,765 --> 00:24:11,435
But now, down here, r is just r.
So I think it's rn minus 1/2--

427
00:24:14,120 --> 00:24:14,900
What's that?

428
00:24:14,900 --> 00:24:16,790
Is that an m?

429
00:24:16,790 --> 00:24:18,830
So now it's an r.

430
00:24:18,830 --> 00:24:21,120
r plus 1.

431
00:24:21,120 --> 00:24:23,100
I think.

432
00:24:23,100 --> 00:24:25,500
I think.

433
00:24:25,500 --> 00:24:28,770
And what about the U?

434
00:24:28,770 --> 00:24:34,040
So U is going to be similar,
except instead of the n here,

435
00:24:34,040 --> 00:24:35,690
we've got an m.

436
00:24:35,690 --> 00:24:42,960
So I think for U, we'll have m
minus 1, plus m minus 2, plus--

437
00:24:42,960 --> 00:24:45,110
so let me write it here.

438
00:24:45,110 --> 00:24:46,966
m minus 1.

439
00:24:46,966 --> 00:24:51,290
So U, I'm talking about U
here, it's got r columns.

440
00:24:51,290 --> 00:24:54,740
The first one has m minus
1 because I throw away 1

441
00:24:54,740 --> 00:24:59,130
because it's a unit
vector, up to m minus r.

442
00:24:59,130 --> 00:25:01,190
That's the r-th column.

443
00:25:01,190 --> 00:25:02,150
OK.

444
00:25:02,150 --> 00:25:05,990
And now so how-- what
does that add up to?

445
00:25:05,990 --> 00:25:08,200
Well, I put all
the m's together.

446
00:25:08,200 --> 00:25:14,930
So that's rm, or let me say mr.
And then I'm subtracting off 1

447
00:25:14,930 --> 00:25:17,540
plus 2 plus 3, up to r.

448
00:25:17,540 --> 00:25:20,750
Now tell me again
what that adds up to.

449
00:25:20,750 --> 00:25:23,900
1 plus 2 plus 3, stop at r.

450
00:25:23,900 --> 00:25:26,640
That's what we had here.

451
00:25:26,640 --> 00:25:30,240
And we've got it for V. And
we've got it again here.

452
00:25:30,240 --> 00:25:35,850
Minus 1/2 r, r plus 1.

453
00:25:35,850 --> 00:25:37,890
Are you OK with that?

454
00:25:37,890 --> 00:25:41,370
And now I just want
to add them up.

455
00:25:41,370 --> 00:25:45,240
So I have mr. And I have nr.

456
00:25:45,240 --> 00:25:46,980
And then I have two of these.

457
00:25:46,980 --> 00:25:49,440
So let me get it here.

458
00:25:49,440 --> 00:25:53,250
mr and nr.

459
00:25:53,250 --> 00:25:56,340
And now I have to look
at-- so mr, check.

460
00:25:56,340 --> 00:25:58,140
nr, check.

461
00:25:58,140 --> 00:25:59,910
Now I have two of these guys.

462
00:25:59,910 --> 00:26:04,170
So they combine into
r squared plus r.

463
00:26:04,170 --> 00:26:09,860
And then I-- r squared,
yeah, minus r squared minus r.

464
00:26:09,860 --> 00:26:10,490
Sorry.

465
00:26:10,490 --> 00:26:13,580
They combine into
minus r squared minus r.

466
00:26:13,580 --> 00:26:16,730
And then here's r
coming in with a plus.

467
00:26:16,730 --> 00:26:18,620
I think we have a
minus r squared.

468
00:26:21,600 --> 00:26:24,280
And that is the right answer.

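The answer mr + nr - r^2 can be checked two ways: by summing the lecture's tally for U, Sigma, and V, and, by a different route, as the numerical rank of the Jacobian of the map (A, B) -> AB with A of size m by r and B of size r by n (a generic rank-r parameterization swapped in here for the check). This is a minimal NumPy sketch; `rank_r_count` and `local_dimension` are hypothetical helpers.

```python
import numpy as np


def rank_r_count(m: int, n: int, r: int) -> int:
    """Lecture's tally for a rank-r m-by-n matrix A = U Sigma V^T (reduced SVD)."""
    count_U = sum(m - k for k in range(1, r + 1))   # (m-1) + ... + (m-r)
    count_Sigma = r                                 # r nonzero singular values
    count_V = sum(n - k for k in range(1, r + 1))   # (n-1) + ... + (n-r)
    return count_U + count_Sigma + count_V


def local_dimension(m: int, n: int, r: int, seed: int = 0) -> int:
    """Numerical rank of the Jacobian of (A, B) -> A @ B at a random point.

    A is m-by-r and B is r-by-n, so A @ B is a generic rank-r matrix.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((r, n))
    # d(AB) = dA B + A dB.  With column-major vec:
    #   vec(dA B) = (B^T kron I_m) vec(dA),  vec(A dB) = (I_n kron A) vec(dB)
    J = np.hstack([np.kron(B.T, np.eye(m)), np.kron(np.eye(n), A)])
    return int(np.linalg.matrix_rank(J))


for m, n in [(3, 4), (5, 7), (6, 6)]:
    for r in range(1, min(m, n) + 1):
        expected = m * r + n * r - r * r
        assert rank_r_count(m, n, r) == expected
        assert local_dimension(m, n, r) == expected

# 3-by-4 matrices of rank 2: a 10-dimensional surface inside 12-dimensional matrix space.
print(rank_r_count(3, 4, 2))  # 10
```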
469
00:26:24,280 --> 00:26:25,010
Yeah.

470
00:26:25,010 --> 00:26:26,000
OK.

471
00:26:26,000 --> 00:26:30,110
So I took a bit longer
than I intended.

472
00:26:30,110 --> 00:26:35,150
But this is a number
that's sort of interesting.

473
00:26:35,150 --> 00:26:37,390
I mentioned saddle points
sort of, like, separately

474
00:26:37,390 --> 00:26:42,580
from maxima and minima just
because they are definitely not

475
00:26:42,580 --> 00:26:44,710
as easy to work with.

476
00:26:44,710 --> 00:26:47,870
You understand what I
mean by saddle points?

477
00:26:47,870 --> 00:26:52,280
The matrices involved have--

478
00:26:52,280 --> 00:26:54,850
are not positive definite.

479
00:26:54,850 --> 00:26:58,510
Those would go with a minimum.

480
00:26:58,510 --> 00:27:02,250
They're not negati--
they're not--

481
00:27:02,250 --> 00:27:04,750
well, those would go
with maxima and minima.

482
00:27:04,750 --> 00:27:06,880
But we're looking in between.

483
00:27:06,880 --> 00:27:08,390
So saddle points.

484
00:27:08,390 --> 00:27:08,910
OK.

485
00:27:08,910 --> 00:27:11,270
Well, I'll get going on those.

486
00:27:11,270 --> 00:27:11,770
OK.

487
00:27:11,770 --> 00:27:16,450
I sort of realized that
there are two main sources

488
00:27:16,450 --> 00:27:19,630
of saddle points.

489
00:27:19,630 --> 00:27:24,210
One of them is when I
have problems that--

490
00:27:24,210 --> 00:27:26,870
when I-- let's say I minimize.

491
00:27:26,870 --> 00:27:30,892
So this will be the constraint.

492
00:27:30,892 --> 00:27:33,960
The saddle points
come from the constraints.

493
00:27:33,960 --> 00:27:37,740
So Lagrange is going
to be responsible

494
00:27:37,740 --> 00:27:39,450
for these saddle points.

495
00:27:39,450 --> 00:27:43,530
So we might have some minimum
problem like minimize,

496
00:27:43,530 --> 00:27:47,110
some positive definite thing.

497
00:27:47,110 --> 00:27:51,520
And of course, if we don't say
any more, the minimum is zero.

498
00:27:51,520 --> 00:27:52,670
Right?

499
00:27:52,670 --> 00:27:54,860
Because otherwise,
it's positive.

500
00:27:54,860 --> 00:27:59,720
But we're going to put on
constraints, Ax equal b.

501
00:27:59,720 --> 00:28:06,440
So this is the classical
constrained optimization

502
00:28:06,440 --> 00:28:12,310
problem, quadratic cost
function, linear constraints.

503
00:28:12,310 --> 00:28:16,000
We could solve this exactly.

504
00:28:16,000 --> 00:28:19,065
But let's just see where saddle
points are going to arise.

505
00:28:21,690 --> 00:28:25,440
So this S is positive definite.

506
00:28:25,440 --> 00:28:28,410
But now how do we deal
with that problem?

507
00:28:28,410 --> 00:28:30,990
Well, Lagrange said what to do.

508
00:28:30,990 --> 00:28:36,500
Lagrange said, look
at the Lagrangian.

509
00:28:36,500 --> 00:28:38,220
Well, OK.

510
00:28:38,220 --> 00:28:39,525
He introduced lambda.

511
00:28:42,420 --> 00:28:45,270
This x is in n dimensions.

512
00:28:45,270 --> 00:28:47,100
That's an n by n matrix.

513
00:28:47,100 --> 00:28:48,810
But I have m constraints.

514
00:28:48,810 --> 00:28:50,970
So the matrix A is m by n.

515
00:28:56,460 --> 00:28:58,010
I have m constraints.

516
00:28:58,010 --> 00:29:01,790
And then I'm going to follow
the rules and introduce m,

517
00:29:01,790 --> 00:29:03,080
Lagrange multipliers.

518
00:29:06,970 --> 00:29:07,540
That's an m.

519
00:29:10,530 --> 00:29:13,975
And then the neat
part of the Lagrangian--

520
00:29:13,975 --> 00:29:14,475
and what?

521
00:29:14,475 --> 00:29:15,750
What is this?

522
00:29:15,750 --> 00:29:19,650
Well, it's-- I
take the function,

523
00:29:19,650 --> 00:29:22,530
and then I introduce--

524
00:29:22,530 --> 00:29:26,100
remember, lambda's a vector
now, not just a number.

525
00:29:26,100 --> 00:29:28,550
We had some application
where it was just--

526
00:29:28,550 --> 00:29:29,940
there was just one constraint.

527
00:29:29,940 --> 00:29:32,050
But now I have m constraints.

528
00:29:32,050 --> 00:29:33,150
So I take a lambda.

529
00:29:33,150 --> 00:29:39,690
So lambda transposed
times Ax minus b.

530
00:29:39,690 --> 00:29:44,220
And the plus or the minus
sign here is not important.

531
00:29:44,220 --> 00:29:46,320
I mean, you can choose
it because that will

532
00:29:46,320 --> 00:29:48,440
determine the sign of lambda.

533
00:29:48,440 --> 00:29:52,260
But either way, it's correct.

534
00:29:52,260 --> 00:29:53,310
OK.

535
00:29:53,310 --> 00:30:00,690
So we've introduced a function
that now depends on x and also

536
00:30:00,690 --> 00:30:02,940
on lambda.

537
00:30:02,940 --> 00:30:04,110
And there is a l--

538
00:30:04,110 --> 00:30:08,010
and they multiply
each other in there.

539
00:30:08,010 --> 00:30:14,400
And my point is that Lagrange
says, take the derivatives

540
00:30:14,400 --> 00:30:16,410
with respect to x and lambda.

541
00:30:16,410 --> 00:30:19,200
So that's the cool thing
that he's contributed.

542
00:30:19,200 --> 00:30:22,750
He says if you only
create my function,

543
00:30:22,750 --> 00:30:26,610
now you can take x derivative
and lambda derivative.

544
00:30:26,610 --> 00:30:31,080
That will give you n
equations for the x--

545
00:30:31,080 --> 00:30:33,120
from this one, from
the x derivative

546
00:30:33,120 --> 00:30:35,400
and the m equations from
the lambda derivative.

547
00:30:35,400 --> 00:30:37,260
It will be n plus m.

548
00:30:37,260 --> 00:30:40,200
It will determine the
good x and the lambda.

549
00:30:40,200 --> 00:30:44,190
But I'm saying that's all
true and all important.

550
00:30:44,190 --> 00:30:47,010
But I'm saying that
the x and that pair x

551
00:30:47,010 --> 00:30:50,880
lambda will be a saddle
point of this function.

552
00:30:50,880 --> 00:30:56,590
This function has saddle
points, not a minimum.

553
00:30:56,590 --> 00:30:57,260
OK.

554
00:30:57,260 --> 00:31:00,210
Let's just take the derivatives
and see what we get.

555
00:31:00,210 --> 00:31:05,650
So the derivatives with
respect to x, d by dx--

556
00:31:05,650 --> 00:31:10,330
x is now a vector, so I
really should say the gradient

557
00:31:10,330 --> 00:31:12,910
in the x direction.

558
00:31:12,910 --> 00:31:14,500
I get Sx.

559
00:31:17,050 --> 00:31:21,460
And here, the derivative with
respect to x, what that's--

560
00:31:21,460 --> 00:31:27,070
that is A transpose lambda
because this is the dot product

561
00:31:27,070 --> 00:31:29,360
of A transpose lambda with x.

562
00:31:29,360 --> 00:31:32,220
You know, I've put
parentheses around it

563
00:31:32,220 --> 00:31:34,610
and followed the transpose rule.

564
00:31:34,610 --> 00:31:38,200
So that's the dot product of
A transpose lambda with x.

565
00:31:38,200 --> 00:31:39,370
It's linear in x.

566
00:31:39,370 --> 00:31:40,300
So its derivative,

567
00:31:40,300 --> 00:31:42,460
It's just A transpose lambda.

568
00:31:42,460 --> 00:31:45,950
And that's zero.

569
00:31:45,950 --> 00:31:51,506
And now I take the other
one, the lambda derivative.

570
00:31:51,506 --> 00:31:54,305
The lambda derivative, this
doesn't depend on lambda.

571
00:31:54,305 --> 00:31:57,080
The lambda derivative
is just Ax minus b.

572
00:31:57,080 --> 00:31:59,360
It brings back the constraints.
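The two gradients can be checked by finite differences. This sketch assumes the cost is 1/2 x transpose S x (which is what gives Sx as its gradient) and uses the plus-sign convention L(x, lambda) = 1/2 x transpose S x + lambda transpose (Ax - b) that the lecture switches to a moment later; the helper names and the random test data are illustrative only.

```python
import numpy as np


def lagrangian(x, lam, S, A, b):
    # L(x, lambda) = 1/2 x^T S x + lambda^T (A x - b)   (plus-sign convention)
    return 0.5 * x @ S @ x + lam @ (A @ x - b)


def lagrangian_grad(x, lam, S, A, b):
    # x-gradient: S x + A^T lambda ; lambda-gradient: A x - b (the constraints)
    return S @ x + A.T @ lam, A @ x - b


def finite_diff_grad(f, z, h=1e-6):
    # central differences, one coordinate at a time
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = h
        g[i] = (f(z + e) - f(z - e)) / (2 * h)
    return g


rng = np.random.default_rng(1)
n, m = 4, 2
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)          # symmetric positive definite
A = rng.standard_normal((m, n))      # m constraints on x in R^n
b = rng.standard_normal(m)
x = rng.standard_normal(n)
lam = rng.standard_normal(m)

gx, glam = lagrangian_grad(x, lam, S, A, b)
gx_fd = finite_diff_grad(lambda z: lagrangian(z, lam, S, A, b), x)
glam_fd = finite_diff_grad(lambda z: lagrangian(x, z, S, A, b), lam)
print(np.max(np.abs(gx - gx_fd)), np.max(np.abs(glam - glam_fd)))
```

Both printed differences should be at rounding-error level, since central differences are exact for quadratics apart from floating-point error.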

573
00:32:03,400 --> 00:32:06,130
So that's pretty simple.

574
00:32:06,130 --> 00:32:08,170
It doesn't even
require much thought

575
00:32:08,170 --> 00:32:11,290
because you just know the
constraints are coming back.

576
00:32:11,290 --> 00:32:16,240
And of course, b should
be put over on this side

577
00:32:16,240 --> 00:32:18,310
because it's a constant.

578
00:32:18,310 --> 00:32:21,910
So there we see two--

579
00:32:21,910 --> 00:32:23,560
a block.

580
00:32:23,560 --> 00:32:27,910
We see an important, very
important class of problems.

581
00:32:27,910 --> 00:32:29,740
And the matrix we're
seeing, we could

582
00:32:29,740 --> 00:32:37,020
write this in block matrix
form, S minus A transpose.

583
00:32:37,020 --> 00:32:39,840
Oh, I'm going to
change that into a plus

584
00:32:39,840 --> 00:32:42,660
because I'm more
of a plus person.

585
00:32:42,660 --> 00:32:43,230
OK.

586
00:32:43,230 --> 00:32:49,050
A transpose and A, yeah, yeah.

587
00:32:49,050 --> 00:32:51,930
When I took the derivative
with respect to lambda,

588
00:32:51,930 --> 00:32:54,210
I didn't put the
minus sign in here.

589
00:32:54,210 --> 00:32:55,260
And I didn't want to.

590
00:32:55,260 --> 00:32:57,380
So let's make it a plus.

591
00:32:57,380 --> 00:33:03,420
A and then there's
nothing there.

592
00:33:03,420 --> 00:33:11,740
And the x, and the lambda,
and the zero, and the b.

593
00:33:11,740 --> 00:33:16,380
That is the model of
a constrained minimum,

594
00:33:16,380 --> 00:33:18,940
a minimum problem
with constraints.

595
00:33:18,940 --> 00:33:22,928
It's the model because the
function here is quadratic

596
00:33:22,928 --> 00:33:24,220
and the constraints are linear.
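Solving the block system numerically makes this concrete. A minimal NumPy sketch, assuming the Lagrangian 1/2 x transpose S x + lambda transpose (Ax - b), S symmetric positive definite and A with independent rows; solve_kkt is a hypothetical helper name. It also compares with the closed form obtained by eliminating x, namely x = S^{-1} A^T (A S^{-1} A^T)^{-1} b, and checks that a feasible move (one along the nullspace of A) does not lower the cost.

```python
import numpy as np


def solve_kkt(S, A, b):
    """Solve the KKT system [[S, A^T], [A, 0]] [x; lam] = [0; b]."""
    n, m = S.shape[0], A.shape[0]
    K = np.block([[S, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), b])
    sol = np.linalg.solve(K, rhs)          # LU with partial pivoting; K is indefinite
    return sol[:n], sol[n:]


# Tiny case: minimize 1/2 (x1^2 + x2^2) subject to x1 + x2 = 2.
x0, lam0 = solve_kkt(np.eye(2), np.array([[1.0, 1.0]]), np.array([2.0]))
print(x0, lam0)                            # expect x0 close to [1, 1], lam0 close to [-1]

# Random case (illustrative data): S positive definite, A with independent rows.
rng = np.random.default_rng(2)
n, m = 5, 2
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x, lam = solve_kkt(S, A, b)

# Closed form from eliminating x: x = S^{-1} A^T (A S^{-1} A^T)^{-1} b
SinvAT = np.linalg.solve(S, A.T)
x_closed = SinvAT @ np.linalg.solve(A @ SinvAT, b)


def cost(v):
    return 0.5 * v @ S @ v


# A feasible move z (A z = 0) taken from the nullspace of A via the SVD.
_, _, Vt = np.linalg.svd(A)
z = Vt[m:].T @ rng.standard_normal(n - m)
print(cost(x + z) >= cost(x))              # True: feasible moves do not lower the cost
```

np.linalg.solve is used on the whole KKT matrix because it does not need positive definiteness; a Cholesky-based solver would fail here, since K is indefinite.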

597
00:33:26,770 --> 00:33:33,850
In Course 6, it's everywhere,
constantly appearing

598
00:33:33,850 --> 00:33:35,780
as the simplest model.

599
00:33:35,780 --> 00:33:36,280
OK.

600
00:33:36,280 --> 00:33:40,950
And my point today is
just that the solution

601
00:33:40,950 --> 00:33:50,180
x lambda, that total solution,
the x together with a lambda,

602
00:33:50,180 --> 00:33:58,190
that that is a saddle point
of the Lagrangian function L.

603
00:33:58,190 --> 00:34:00,860
It's a saddle point,
not a minimum.

604
00:34:00,860 --> 00:34:05,425
It's sort of a minimum
in the x direction

605
00:34:05,425 --> 00:34:10,230
because this is
positive definite.

606
00:34:10,230 --> 00:34:14,280
As a function of
x, it's going up.

607
00:34:14,280 --> 00:34:19,460
But somehow the
appearance of lambda

608
00:34:19,460 --> 00:34:22,940
makes this matrix indefinite.

609
00:34:22,940 --> 00:34:27,469
It starts positive definite,
but it has this A transpose and A

610
00:34:27,469 --> 00:34:29,060
and that 0.

611
00:34:29,060 --> 00:34:31,940
It couldn't be p-- actually,
if I look at that matrix,

612
00:34:31,940 --> 00:34:34,199
I see it's not
positive definite.

613
00:34:34,199 --> 00:34:35,760
What do I see?

614
00:34:35,760 --> 00:34:37,580
Why do I say that immediately?

615
00:34:37,580 --> 00:34:41,239
When I look at that matrix, it's
not a positive definite matrix

616
00:34:41,239 --> 00:34:48,460
because when I see
that 0 on the diagonal,

617
00:34:48,460 --> 00:34:50,230
that rules out positive definite.

618
00:34:50,230 --> 00:34:50,980
Couldn't be.

619
00:34:50,980 --> 00:34:56,830
Take, as an example, the matrix
with rows 3, 1 and 1, 0.

620
00:34:56,830 --> 00:34:58,510
Take that matrix.

621
00:34:58,510 --> 00:35:01,990
Just random.

622
00:35:01,990 --> 00:35:06,250
I made it 2 by 2 instead
of size m plus n.

623
00:35:06,250 --> 00:35:06,760
Do you see?

624
00:35:06,760 --> 00:35:11,530
Or how do I know that the
eigenvalues of that matrix, one

625
00:35:11,530 --> 00:35:14,810
is plus and one is minus?

626
00:35:14,810 --> 00:35:19,990
The determinant is negative.

627
00:35:19,990 --> 00:35:22,810
So that tells me right away that
one is plus and one is minus.

628
00:35:22,810 --> 00:35:23,470
Thanks.

629
00:35:23,470 --> 00:35:23,970
Yes.

630
00:35:23,970 --> 00:35:24,470
Yeah, yeah.

631
00:35:24,470 --> 00:35:26,020
The determinant is negative.

632
00:35:26,020 --> 00:35:27,890
And somehow here,
the determinant,

633
00:35:27,890 --> 00:35:31,900
a similar calculation,
would produce A transpose A

634
00:35:31,900 --> 00:35:37,000
or something with a minus
because I'm going this way.

635
00:35:37,000 --> 00:35:42,010
Well, I could do
better than that.

636
00:35:42,010 --> 00:35:45,190
But you saw the point.

637
00:35:45,190 --> 00:35:53,660
That simple example of this
has eigenvalues of both signs.
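The 2 by 2 example can be checked directly. Its determinant is 3 times 0 minus 1 times 1, which is -1, and since the determinant is the product of the eigenvalues, they must have opposite signs: they are (3 plus or minus the square root of 13)/2. Elimination gives pivots 3 and 0 - (1/3)(1) = -1/3, again one positive and one negative. A quick NumPy sketch:

```python
import numpy as np

K = np.array([[3.0, 1.0],
              [1.0, 0.0]])

eigenvalues = np.linalg.eigvalsh(K)     # ascending: (3 - sqrt 13)/2, (3 + sqrt 13)/2
determinant = np.linalg.det(K)          # 3*0 - 1*1 = -1

# Elimination: first pivot 3, then subtract (1/3) * row 1 from row 2.
pivot1 = K[0, 0]
pivot2 = K[1, 1] - (K[1, 0] / K[0, 0]) * K[0, 1]   # 0 - 1/3 = -1/3

print(eigenvalues, determinant, pivot1, pivot2)
```

The product of the two pivots also equals the determinant, which is another way to see that one pivot must be negative.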

638
00:35:53,660 --> 00:35:58,580
Let me just quickly say,
and I'll put it in the notes

639
00:35:58,580 --> 00:36:02,500
or in that chapter, I guess
that all this is coming--

640
00:36:02,500 --> 00:36:07,900
is still 3.2.

641
00:36:07,900 --> 00:36:09,970
That was originally 4.2.

642
00:36:09,970 --> 00:36:13,130
And you will see it.

643
00:36:13,130 --> 00:36:14,450
So what do I want to say?

644
00:36:14,450 --> 00:36:17,960
I'd like to say that
that example is pretty

645
00:36:17,960 --> 00:36:27,440
convincing to me that these KKT
matrices, if you talk to people

646
00:36:27,440 --> 00:36:37,680
in optimization, that's
Karush, Kuhn, and Tucker,

647
00:36:37,680 --> 00:36:42,460
three famous guys, and
these are the KKT conditions

648
00:36:42,460 --> 00:36:47,000
that they derived
following Lagrange.

649
00:36:47,000 --> 00:36:47,730
Right.

650
00:36:47,730 --> 00:36:49,480
And my point is--

651
00:36:49,480 --> 00:36:55,650
and this is a typical sort,
so it's an indefinite matrix.

652
00:36:55,650 --> 00:37:02,800
I believe it has that if I
do an elimination, yeah, tell

653
00:37:02,800 --> 00:37:04,590
me this.

654
00:37:04,590 --> 00:37:07,080
This is a good
way to look at it.

655
00:37:07,080 --> 00:37:10,830
Suppose I do elimination
on this one or on this one.

656
00:37:10,830 --> 00:37:13,200
Well, suppose I do
elimination there.

657
00:37:13,200 --> 00:37:16,070
What is the first pivot?

658
00:37:16,070 --> 00:37:16,960
3.

659
00:37:16,960 --> 00:37:18,170
Positive.

660
00:37:18,170 --> 00:37:20,270
So now let me turn down to here.

661
00:37:20,270 --> 00:37:26,120
What if I do elimination
on this block matrix?

662
00:37:26,120 --> 00:37:27,250
Then I start up here.

663
00:37:27,250 --> 00:37:30,420
And that first pivot is?

664
00:37:30,420 --> 00:37:31,890
Positive again, right?

665
00:37:31,890 --> 00:37:34,530
This S is a positive
definite matrix.

666
00:37:34,530 --> 00:37:35,520
Don't forget.

667
00:37:35,520 --> 00:37:39,300
In fact, the first n
pivots will all be positive

668
00:37:39,300 --> 00:37:41,730
because the first
n pivots, you're

669
00:37:41,730 --> 00:37:44,130
working away in this corner.

670
00:37:44,130 --> 00:37:47,280
And if you're only
thinking about the first n,

671
00:37:47,280 --> 00:37:51,870
this corner is size n by
n, then you don't even

672
00:37:51,870 --> 00:37:55,410
see A. You're doing
some subtractions.

673
00:37:55,410 --> 00:37:56,700
And I'll do those.

674
00:37:56,700 --> 00:37:59,230
But the pivots
themselves are coming--

675
00:37:59,230 --> 00:38:01,770
all coming from S. And
S is positive definite.

676
00:38:01,770 --> 00:38:05,490
So we know that one of the tests
for a positive definite matrix

677
00:38:05,490 --> 00:38:07,830
is all pivots are positive.

678
00:38:07,830 --> 00:38:12,000
So I think all n of the first
pivots will be positive.

679
00:38:12,000 --> 00:38:14,790
And when we use
them, let's just see

680
00:38:14,790 --> 00:38:16,100
what happens when we use them.

681
00:38:18,990 --> 00:38:25,030
So here is the KKT
matrix that I start with.

682
00:38:25,030 --> 00:38:26,380
And what do I end up with?

683
00:38:29,610 --> 00:38:35,430
Well, really, what I'm doing is
I'm multiplying that block row

684
00:38:35,430 --> 00:38:37,310
by something to--

685
00:38:37,310 --> 00:38:42,880
and subtracting to kill
that A. So these rows--

686
00:38:42,880 --> 00:38:44,180
well, near enough.

687
00:38:44,180 --> 00:38:46,740
Let me do block elimination.

688
00:38:46,740 --> 00:38:48,740
Block elimination
is, like, easier.

689
00:38:48,740 --> 00:38:52,830
I don't have to write down
all little tiny numbers.

690
00:38:52,830 --> 00:38:57,450
So I just want to multiply
this row by something.

691
00:38:57,450 --> 00:38:59,040
Tell me what.

692
00:38:59,040 --> 00:39:02,520
And subtract from
this second row.

693
00:39:02,520 --> 00:39:07,110
Suppose they're
numbers or letters.

694
00:39:07,110 --> 00:39:09,110
I guess they are letters.

695
00:39:09,110 --> 00:39:12,965
What do I multiply that
first row by and subtract?

696
00:39:18,130 --> 00:39:19,576
Let's see.

697
00:39:19,576 --> 00:39:27,240
If these were just little tiny
numbers, as like in 3, 1, 1, 0,

698
00:39:27,240 --> 00:39:31,200
what do I multiply that row
by and subtract from this?

699
00:39:31,200 --> 00:39:33,870
I multiply by A over S, right?

700
00:39:33,870 --> 00:39:37,400
I do multiply by A over
S, which puts an A there.

701
00:39:37,400 --> 00:39:38,730
Then I subtract.

702
00:39:38,730 --> 00:39:42,600
So here I'll multiply by A
over S. But these are matrices,

703
00:39:42,600 --> 00:39:48,750
so I multiply by S--

704
00:39:48,750 --> 00:39:51,210
by AS inverse, right?

705
00:39:51,210 --> 00:39:54,960
When I multiply by AS inverse
times this S, I get A.

706
00:39:54,960 --> 00:39:56,220
And then I subtract.

707
00:39:56,220 --> 00:39:58,080
And I get the 0.

708
00:39:58,080 --> 00:40:01,440
And when I multiply by
this guy and subtract,

709
00:40:01,440 --> 00:40:06,660
I get minus because I'm
subtracting this thing, minus

710
00:40:06,660 --> 00:40:10,680
AS inverse, A transpose.
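The block elimination step can be written as one matrix multiplication and verified numerically. A sketch with random data (assumptions: S is built to be positive definite, and A is a random m by n matrix, whose rows are independent with probability one): multiplying K on the left by the block matrix [[I, 0], [-A S^{-1}, I]] should produce [[S, A^T], [0, -A S^{-1} A^T]], and that corner block should have only negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)        # positive definite
A = rng.standard_normal((m, n))    # m x n, independent rows (almost surely)

K = np.block([[S, A.T],
              [A, np.zeros((m, m))]])

# Block elimination matrix: subtract (A S^{-1}) times block row 1 from block row 2.
multiplier = A @ np.linalg.inv(S)              # A S^{-1}, m x n
E = np.block([[np.eye(n), np.zeros((n, m))],
              [-multiplier, np.eye(m)]])

schur = -A @ np.linalg.solve(S, A.T)           # -A S^{-1} A^T, m x m
U = np.block([[S, A.T],
              [np.zeros((m, n)), schur]])

print(np.allclose(E @ K, U))                   # block upper triangular result
print(np.linalg.eigvalsh(schur))               # all negative
```

With S equal to the identity the corner block is just minus A A transpose, the case discussed next.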

711
00:40:14,600 --> 00:40:18,705
That was block elimination,
which just, in other words,

712
00:40:18,705 --> 00:40:19,205
it's just--

713
00:40:22,730 --> 00:40:25,910
you've learned about
2 by 2 matrices,

714
00:40:25,910 --> 00:40:29,270
3x plus 4y equals 7 and stuff.

715
00:40:29,270 --> 00:40:32,600
Now I'm just doing
it with blocks

716
00:40:32,600 --> 00:40:34,280
instead of single numbers.

717
00:40:34,280 --> 00:40:39,200
But you see, this produced
those positive pivots.

718
00:40:39,200 --> 00:40:43,080
And what can you tell
me about that matrix?

719
00:40:43,080 --> 00:40:45,370
What kind of-- what
can you tell me

720
00:40:45,370 --> 00:40:48,820
about the signs or the
eigenvalues or whatever

721
00:40:48,820 --> 00:40:49,720
of this matrix?

722
00:40:53,810 --> 00:40:56,430
Suppose S was the identity.

723
00:40:56,430 --> 00:41:01,310
What could you tell me
about minus AA transpose?

724
00:41:01,310 --> 00:41:03,080
Minus AA transpose.

725
00:41:03,080 --> 00:41:06,120
And my voice should
emphasize that minus.

726
00:41:06,120 --> 00:41:12,060
That matrix there
is negative definite, provided A has independent rows.

727
00:41:12,060 --> 00:41:16,140
So all the next set of m
pivots that come from here

728
00:41:16,140 --> 00:41:17,290
will all be negative.

729
00:41:17,290 --> 00:41:28,045
So I get n
positive and m negative

730
00:41:28,045 --> 00:41:28,545
pivots.

731
00:41:31,820 --> 00:41:34,710
And then I remember
that the pivots actually

732
00:41:34,710 --> 00:41:37,590
have the same sign
as the eigenvalues.

733
00:41:37,590 --> 00:41:39,390
That's just a beautiful fact.

734
00:41:39,390 --> 00:41:43,640
We know that for
positive definite ones.

735
00:41:43,640 --> 00:41:45,200
The eigenvalues
are all positive.

736
00:41:45,200 --> 00:41:47,220
The pivots are all positive.

737
00:41:47,220 --> 00:41:49,550
But it's even better than that.

738
00:41:49,550 --> 00:41:54,140
If we have some mixture for
the signs of the pivots,

739
00:41:54,140 --> 00:41:56,990
that tells us the signs
of the eigenvalues.

740
00:41:56,990 --> 00:41:59,180
That's a really neat fact.

741
00:41:59,180 --> 00:42:02,120
So I'll just write that down.

742
00:42:02,120 --> 00:42:09,280
Plus and minus signs
of pivots give us

743
00:42:09,280 --> 00:42:16,480
the plus and minus signs
of the eigenvalues.
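For symmetric matrices, the statement that pivot signs match eigenvalue signs is a consequence of Sylvester's law of inertia: elimination without row exchanges gives K = L D L transpose, with the pivots on the diagonal of D. The sketch below, with a hypothetical helper pivots_without_exchanges and random data, counts signs for a KKT matrix and should find n positive and m negative pivots, and the same counts for the eigenvalues. It assumes no zero pivot appears, which holds here because S is positive definite and A has independent rows.

```python
import numpy as np


def pivots_without_exchanges(K):
    """Gaussian elimination with no row exchanges; returns the pivots."""
    U = np.array(K, dtype=float)
    size = U.shape[0]
    pivots = []
    for k in range(size):
        p = U[k, k]
        if abs(p) < 1e-12:
            raise ValueError("zero pivot: a row exchange would be needed")
        pivots.append(p)
        # subtract multiples of row k from the rows below it
        U[k + 1:, k:] -= np.outer(U[k + 1:, k] / p, U[k, k:])
    return np.array(pivots)


rng = np.random.default_rng(4)
n, m = 4, 2
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)
A = rng.standard_normal((m, n))
K = np.block([[S, A.T], [A, np.zeros((m, m))]])

piv = pivots_without_exchanges(K)
eig = np.linalg.eigvalsh(K)
print(np.sign(piv))                                  # n pluses, then m minuses
print(int((eig > 0).sum()), int((eig < 0).sum()))    # n positive, m negative
```

The pivots and eigenvalues are different numbers; only their signs (and their products, both equal to the determinant) agree.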

744
00:42:16,480 --> 00:42:22,190
So I've sneaked in a
nice fact there that--

745
00:42:22,190 --> 00:42:25,020
for symmetric matrices.

746
00:42:25,020 --> 00:42:26,920
This is for symmetric matrices.

747
00:42:26,920 --> 00:42:27,420
OK.

748
00:42:30,510 --> 00:42:37,100
That's what I wanted to say
about constraints and saddle

749
00:42:37,100 --> 00:42:39,390
points coming from there.

750
00:42:39,390 --> 00:42:41,610
And then I now want
to say something

751
00:42:41,610 --> 00:42:44,310
about constraints and--

752
00:42:44,310 --> 00:42:45,990
not constraints now.

753
00:42:45,990 --> 00:42:53,350
I'm going to look at a second
source of saddle points.

754
00:42:53,350 --> 00:43:04,090
So these will be saddles
from this remarkable function

755
00:43:04,090 --> 00:43:10,480
that we know, R of x equals x transpose Sx over x transpose x.

756
00:43:13,360 --> 00:43:18,770
So I now have a
symmetric matrix S. Could

757
00:43:18,770 --> 00:43:20,150
be even positive definite.

758
00:43:20,150 --> 00:43:23,020
Usually, it is here.

759
00:43:23,020 --> 00:43:25,390
Do you know what
the name for R is?

760
00:43:25,390 --> 00:43:27,910
It's a ratio or a quotient.

761
00:43:27,910 --> 00:43:32,920
It's named after somebody
starting with R. Who's that?

762
00:43:32,920 --> 00:43:33,680
Rayleigh.

763
00:43:33,680 --> 00:43:34,210
Right.

764
00:43:34,210 --> 00:43:35,168
It's Rayleigh quotient.

765
00:43:41,590 --> 00:43:44,503
And what is the largest
value, possible value

766
00:43:44,503 --> 00:43:45,545
of the Rayleigh quotient?

767
00:43:49,670 --> 00:43:52,990
We've seen this idea.

768
00:43:52,990 --> 00:43:55,965
The maximum value
of that Rayleigh quotient,

769
00:43:55,965 --> 00:44:01,220
of that ratio, is lambda max.

770
00:44:01,220 --> 00:44:01,720
Right.

771
00:44:01,720 --> 00:44:04,030
Lambda 1, the biggest one.

772
00:44:04,030 --> 00:44:07,900
And the x that does
it is the eigenvector.

773
00:44:07,900 --> 00:44:08,710
Right?

774
00:44:08,710 --> 00:44:18,070
So max is lambda 1
and at x equal q1

775
00:44:18,070 --> 00:44:26,020
because q1 transpose S q1,
over q1 transpose q1.

776
00:44:26,020 --> 00:44:29,680
So I'm plugging in this winner.

777
00:44:29,680 --> 00:44:33,470
And S q1 is lambda 1 q1.

778
00:44:33,470 --> 00:44:33,970
Right?

779
00:44:33,970 --> 00:44:37,000
It's the first eigenvector.

780
00:44:37,000 --> 00:44:39,080
And so a lambda 1 comes out.

781
00:44:39,080 --> 00:44:39,880
So I get lambda 1.

782
00:44:43,260 --> 00:44:45,820
I know everything about that.

783
00:44:45,820 --> 00:44:51,800
And what I know is if I put
in any x, what do I know?

784
00:44:51,800 --> 00:44:56,100
If I put in any x whatever
and look at this number,

785
00:44:56,100 --> 00:44:59,570
what do I know
about that number?

786
00:44:59,570 --> 00:45:02,890
It's smaller than lambda 1.

787
00:45:02,890 --> 00:45:04,450
Or it might hit lambda 1.

788
00:45:04,450 --> 00:45:05,410
But it's not bigger.

789
00:45:05,410 --> 00:45:07,540
That's why maxima are easy.

790
00:45:07,540 --> 00:45:10,990
You put in any vector, and
you know what's happening.

791
00:45:10,990 --> 00:45:12,150
You know, it doesn't--

792
00:45:12,150 --> 00:45:15,430
it's not above the
max, obviously.

793
00:45:15,430 --> 00:45:16,530
And what about the min?

794
00:45:19,900 --> 00:45:21,940
That's equally
simple, of course.

795
00:45:21,940 --> 00:45:23,470
It's at the bottom.

796
00:45:23,470 --> 00:45:27,720
So what would be the
minimum of that Rayleigh--

797
00:45:27,720 --> 00:45:29,400
of that quotient
if I was looking

798
00:45:29,400 --> 00:45:32,190
for what eigenvector
and eigenvalue will

799
00:45:32,190 --> 00:45:36,870
I find when I look at
the bottom of this?

800
00:45:36,870 --> 00:45:41,010
I will find lambda
n, the last guy.

801
00:45:41,010 --> 00:45:41,790
Lambda min.

802
00:45:45,220 --> 00:45:48,310
And the winning x will
be its eigenvector.

803
00:45:48,310 --> 00:45:52,270
And again, this stuff
will equal lambda n.

804
00:45:55,030 --> 00:45:55,900
So that's easy.

805
00:45:55,900 --> 00:46:01,260
I know that if I put
in any vector whatever,

806
00:46:01,260 --> 00:46:03,810
just choose any
vector in n dimensions

807
00:46:03,810 --> 00:46:08,220
and compute R of x, what do
I now also know about--

808
00:46:08,220 --> 00:46:10,660
that R of that vector?

809
00:46:10,660 --> 00:46:16,730
It's greater than or equal to lambda n.

810
00:46:16,730 --> 00:46:18,740
Below the max, above the min.
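A quick experiment with the Rayleigh quotient R(x) = x transpose S x over x transpose x: for a random symmetric S, random vectors should always give values between lambda min and lambda max, and each eigenvector should give exactly its eigenvalue. A sketch with illustrative random data; note that NumPy's eigh returns eigenvalues in ascending order, the reverse of the lecture's lambda 1 down to lambda n.

```python
import numpy as np


def rayleigh(S, x):
    # R(x) = x^T S x / x^T x
    return (x @ S @ x) / (x @ x)


rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
S = (M + M.T) / 2                     # symmetric (not necessarily definite)
lam, Q = np.linalg.eigh(S)            # ascending eigenvalues, orthonormal eigenvectors
lam_min, lam_max = lam[0], lam[-1]

samples = np.array([rayleigh(S, rng.standard_normal(5)) for _ in range(1000)])
print(lam_min, samples.min(), samples.max(), lam_max)

# At each eigenvector the quotient equals its eigenvalue.
at_eigvecs = np.array([rayleigh(S, Q[:, k]) for k in range(5)])
```

Random samples usually land well inside the interval; hitting the endpoints requires x to be (close to) the top or bottom eigenvector.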

811
00:46:22,480 --> 00:46:24,550
Now what about
the other lambdas?

812
00:46:24,550 --> 00:46:28,200
Well, the point is that
those are saddle points.

813
00:46:28,200 --> 00:46:30,760
The beautiful thing about
this Rayleigh quotient

814
00:46:30,760 --> 00:46:36,250
is its derivative equals 0
right at the saddle point--

815
00:46:36,250 --> 00:46:38,790
at the eigenvectors.

816
00:46:38,790 --> 00:46:44,290
And its value at the
eigenvectors is the eigenvalue.
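By the quotient rule (the gradient of x transpose S x is 2Sx, and of x transpose x is 2x), the gradient of R is 2(Sx - R(x)x)/(x transpose x). It vanishes exactly when Sx = R(x)x, that is, at eigenvectors. A sketch that checks this at every eigenvector and cross-checks the formula against central differences at a random point (names and data are illustrative):

```python
import numpy as np


def rayleigh(S, x):
    return (x @ S @ x) / (x @ x)


def rayleigh_grad(S, x):
    # gradient of x^T S x / x^T x  is  2 (S x - R(x) x) / (x^T x)
    return 2.0 * (S @ x - rayleigh(S, x) * x) / (x @ x)


rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
S = (M + M.T) / 2
lam, Q = np.linalg.eigh(S)

grad_norms_at_eigvecs = [np.linalg.norm(rayleigh_grad(S, Q[:, k])) for k in range(5)]

# Cross-check the formula at a random point with central differences.
x = rng.standard_normal(5)
h = 1e-6
fd = np.array([(rayleigh(S, x + h * e) - rayleigh(S, x - h * e)) / (2 * h)
               for e in np.eye(5)])
print(max(grad_norms_at_eigvecs), np.max(np.abs(fd - rayleigh_grad(S, x))))
```

The gradient is zero at all n eigenvectors, not only at the max and min, which is why the interior eigenvalues show up as stationary (saddle) points.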

817
00:46:44,290 --> 00:46:46,030
You see what I'm saying?

818
00:46:46,030 --> 00:46:51,370
I have lambda 1 here, a
max, lambda n here, a min.

819
00:46:51,370 --> 00:46:55,270
And in between I have a
bunch of other lambdas,

820
00:46:55,270 --> 00:46:57,460
which are saddle points.

821
00:46:57,460 --> 00:47:04,000
And if I put an x into R of x
and look to see what happens,

822
00:47:04,000 --> 00:47:09,100
I have no idea whether I'm
on this side, below it,

823
00:47:09,100 --> 00:47:12,040
or this side, above lambda i.

824
00:47:12,040 --> 00:47:15,430
So the saddle points
are more difficult

825
00:47:15,430 --> 00:47:20,630
and take a little more patience.

826
00:47:20,630 --> 00:47:22,520
So that's the other
source of saddle points.
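To see which eigenvectors are saddle points, one can look at the second derivatives of R. At a unit eigenvector q_k, a short calculation gives the Hessian 2(S - lambda_k I), whose eigenvalues are 2(lambda_i - lambda_k): positive toward eigenvectors with larger eigenvalues, negative toward smaller ones, and zero along q_k itself (R does not change when x is rescaled). The sketch below estimates the Hessian by finite differences for a 5 by 5 S built with eigenvalues 5, 4, 3, 2, 1 (an arbitrary choice) and counts positive and negative Hessian eigenvalues; only q_1 (the max) and q_5 (the min) should show a single sign.

```python
import numpy as np


def rayleigh(S, x):
    return (x @ S @ x) / (x @ x)


def numerical_hessian(f, x, h=1e-4):
    # central second differences for each pair of coordinates
    n = x.size
    H = np.zeros((n, n))
    steps = np.eye(n) * h
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + steps[i] + steps[j]) - f(x + steps[i] - steps[j])
                       - f(x - steps[i] + steps[j]) + f(x - steps[i] - steps[j])) / (4 * h * h)
    return H


rng = np.random.default_rng(7)
Qr, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigenvalues = np.array([5.0, 4.0, 3.0, 2.0, 1.0])    # lambda_1 > ... > lambda_5
S = Qr @ np.diag(eigenvalues) @ Qr.T

inertia = []
for k in range(5):
    q = Qr[:, k]                                      # unit eigenvector for eigenvalues[k]
    H = numerical_hessian(lambda x: rayleigh(S, x), q)
    h_eigs = np.linalg.eigvalsh((H + H.T) / 2)
    inertia.append((int((h_eigs > 1e-3).sum()), int((h_eigs < -1e-3).sum())))
print(inertia)   # expect [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
```

Each pair is (directions where R goes up, directions where R goes down) at q_1 through q_5; the missing fifth direction in each pair is the scaling direction, where the Hessian eigenvalue is zero.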

827
00:47:30,740 --> 00:47:33,650
Let me just emphasize
again what I'm saying.

828
00:47:33,650 --> 00:47:48,320
At x equal qk,
the matrix of second derivatives

829
00:47:48,320 --> 00:47:53,740
of R of x has some number
of positive eigenvalues

830
00:47:53,740 --> 00:47:58,330
and some number of negative
ones, coming from the eigenvalues

831
00:47:58,330 --> 00:48:00,880
above and below lambda k.

832
00:48:00,880 --> 00:48:01,780
OK.

833
00:48:01,780 --> 00:48:04,660
I've run out of
time to follow up

834
00:48:04,660 --> 00:48:08,833
on the saddle
point part of the--

835
00:48:08,833 --> 00:48:11,185
on the details of this picture.

836
00:48:13,870 --> 00:48:15,200
That will be in the notes.

837
00:48:15,200 --> 00:48:21,580
And I might come back to it at
the very start of next time.

838
00:48:21,580 --> 00:48:30,180
Before that, you will
have the lab number three.

839
00:48:30,180 --> 00:48:33,330
And then I think we should
discuss it because I

840
00:48:33,330 --> 00:48:36,610
haven't done this lab.

841
00:48:36,610 --> 00:48:41,130
It's intended to give you
some feeling for overfitting

842
00:48:41,130 --> 00:48:43,650
and also intended to give
you a little introduction

843
00:48:43,650 --> 00:48:46,390
to deep learning.

844
00:48:46,390 --> 00:48:52,360
And so I'll get it to you, and
we can talk about it Wednesday.

845
00:48:52,360 --> 00:48:56,330
And again, it won't be due
until the Wednesday after break.

846
00:48:56,330 --> 00:48:56,830
OK.

847
00:48:56,830 --> 00:48:57,330
Thanks.

848
00:48:57,330 --> 00:48:59,191
So I'll see you Wednesday.