1
00:00:01,550 --> 00:00:03,920
The following content is
provided under a Creative

2
00:00:03,920 --> 00:00:05,310
Commons license.

3
00:00:05,310 --> 00:00:07,520
Your support will help
MIT OpenCourseWare

4
00:00:07,520 --> 00:00:11,610
continue to offer high quality
educational resources for free.

5
00:00:11,610 --> 00:00:14,180
To make a donation or to
view additional materials

6
00:00:14,180 --> 00:00:18,140
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,140 --> 00:00:19,026
at ocw.mit.edu.

8
00:00:23,170 --> 00:00:27,920
GILBERT STRANG: OK, so kind of
a few things in mind for today.

9
00:00:27,920 --> 00:00:31,340
One is to answer those two
questions on the second line.

10
00:00:33,860 --> 00:00:38,080
We found those two formulas
on the first line last time,

11
00:00:38,080 --> 00:00:40,850
the derivative of A inverse.

12
00:00:40,850 --> 00:00:43,460
So the derivative of A
squared ought to be easy.

13
00:00:43,460 --> 00:00:48,020
But if we can't do that,
we need to be sure we can.

14
00:00:48,020 --> 00:00:51,560
And then this was the
derivative of an eigenvalue.

15
00:00:51,560 --> 00:00:54,980
And then it's natural to
ask about the derivative

16
00:00:54,980 --> 00:00:56,870
of the singular value.

17
00:00:56,870 --> 00:01:00,110
And I had a happy day
yesterday in the snow,

18
00:01:00,110 --> 00:01:03,380
realizing that that
has a nice formula too.

19
00:01:03,380 --> 00:01:06,230
Of course, I'm not the first.

20
00:01:06,230 --> 00:01:13,260
I'm sure that Wikipedia
already knows this formula.

21
00:01:13,260 --> 00:01:14,970
But it was new to me.

22
00:01:14,970 --> 00:01:19,460
And I should say Professor
Edelman has carried it

23
00:01:19,460 --> 00:01:21,120
to the second derivative.

24
00:01:21,120 --> 00:01:27,320
Again, not new, but it's
more difficult to find

25
00:01:27,320 --> 00:01:31,040
second derivatives,
and interesting.

26
00:01:31,040 --> 00:01:34,740
But we'll just stay
with first derivatives.

27
00:01:34,740 --> 00:01:39,050
OK, so that's my
first item of sort

28
00:01:39,050 --> 00:01:41,260
of business from last time.

29
00:01:41,260 --> 00:01:44,840
And then I'd like to say
something about the lab

30
00:01:44,840 --> 00:01:49,160
homeworks and ask your advice
and begin to say something

31
00:01:49,160 --> 00:01:51,050
about a project.

32
00:01:51,050 --> 00:01:58,550
And then I will move to
these topics in Section 4.4

33
00:01:58,550 --> 00:02:01,430
that you have already.

34
00:02:01,430 --> 00:02:06,860
And you might notice
I skipped 4.3.

35
00:02:06,860 --> 00:02:10,610
And the reason is
that on Friday,

36
00:02:10,610 --> 00:02:13,070
actually arriving
at MIT tomorrow

37
00:02:13,070 --> 00:02:19,910
is Professor Townsend.
4.3 is all about his work.

38
00:02:19,910 --> 00:02:24,320
And he's the best
lecturer I know.

39
00:02:24,320 --> 00:02:29,390
He was here as an instructor
and did 18.06 and was

40
00:02:29,390 --> 00:02:31,540
a big success.

41
00:02:31,540 --> 00:02:36,110
Actually, he's also
just won a prize

42
00:02:36,110 --> 00:02:44,570
for the SIAG/LA, international
prize for young investigators,

43
00:02:44,570 --> 00:02:48,800
young faculty in
applied linear algebra.

44
00:02:48,800 --> 00:02:53,300
So he goes to Hong Kong
to get that prize too.

45
00:02:53,300 --> 00:02:58,700
Anyway, he will be on the videos
and in here in class Friday,

46
00:02:58,700 --> 00:03:00,860
if all goes well.

47
00:03:00,860 --> 00:03:06,110
OK, so in order
then, the first thing

48
00:03:06,110 --> 00:03:09,260
is the derivative of A squared.

49
00:03:09,260 --> 00:03:16,750
And you might think it's
2A dA dt, but it's not.

50
00:03:16,750 --> 00:03:18,760
And if you realize
that it's not,

51
00:03:18,760 --> 00:03:22,480
then you realize what it is,
you will get these things right

52
00:03:22,480 --> 00:03:23,750
in the future.

53
00:03:23,750 --> 00:03:32,250
So the answer to the derivative
of A squared is not 2A dA dt.

54
00:03:36,340 --> 00:03:37,690
And why isn't it?

55
00:03:37,690 --> 00:03:40,720
And what is the right answer?

56
00:03:40,720 --> 00:03:43,450
So I do that maybe
just below here.

57
00:03:50,590 --> 00:03:53,030
Well, I could ask you to
guess the right answer,

58
00:03:53,030 --> 00:03:55,660
but why don't we do
it systematically.

59
00:03:55,660 --> 00:03:59,800
So how do you find
the derivative?

60
00:03:59,800 --> 00:04:01,180
It's a limit.

61
00:04:01,180 --> 00:04:03,700
First you have a delta A, right.

62
00:04:03,700 --> 00:04:05,210
And then you take a limit.

63
00:04:05,210 --> 00:04:15,700
So I look at A plus delta
A squared minus A squared.

64
00:04:15,700 --> 00:04:18,279
So that's the
change in A squared.

65
00:04:18,279 --> 00:04:21,820
And I divide it by delta t.

66
00:04:21,820 --> 00:04:24,370
And then delta t goes to 0.

67
00:04:24,370 --> 00:04:26,920
So that's the derivative
I'm looking for,

68
00:04:26,920 --> 00:04:29,020
the derivative of A squared.

69
00:04:29,020 --> 00:04:34,210
And now, if I write that out,
you'll see why this is wrong,

70
00:04:34,210 --> 00:04:38,080
but something very close to it,
of course-- can't be far away--

71
00:04:38,080 --> 00:04:38,980
is right.

72
00:04:38,980 --> 00:04:41,100
So what happens if
I write this out?

73
00:04:41,100 --> 00:04:44,770
The A squared will
cancel the A squared.

74
00:04:44,770 --> 00:04:45,480
What will I have?

75
00:04:45,480 --> 00:04:49,540
Will I have 2A delta A?

76
00:04:49,540 --> 00:04:52,795
Why don't I write
2A delta A next?

77
00:04:55,900 --> 00:05:01,090
Because when you're squaring
a sum of two matrices,

78
00:05:01,090 --> 00:05:11,180
one term is A delta A, and
another term is delta A A.

79
00:05:11,180 --> 00:05:15,230
And those are
different in general.

80
00:05:15,230 --> 00:05:20,210
And then plus delta A squared.

81
00:05:20,210 --> 00:05:24,680
And now I divide
it all by delta t.

82
00:05:24,680 --> 00:05:31,550
So you're now seeing my point
that now I let delta t go to 0.

83
00:05:31,550 --> 00:05:34,760
So I'm just doing
matrix calculus.

84
00:05:34,760 --> 00:05:40,490
And it's not altogether simple,
but if you follow the rules,

85
00:05:40,490 --> 00:05:42,200
it comes out right.

86
00:05:42,200 --> 00:05:48,770
So now what answer do I
get as delta t goes to 0?

87
00:05:48,770 --> 00:05:51,950
I get A dA dt--

88
00:05:51,950 --> 00:05:56,240
that's the definition of the--

89
00:05:56,240 --> 00:05:58,550
that ratio goes to dA dt.

90
00:05:58,550 --> 00:06:01,460
That's the whole idea
of the derivative of A.

91
00:06:01,460 --> 00:06:04,730
And now what's the other term?

92
00:06:04,730 --> 00:06:13,190
It's dA dt A. So it
was simply that point

93
00:06:13,190 --> 00:06:20,510
that I wanted you to pick up on,
that the derivative might not

94
00:06:20,510 --> 00:06:24,770
commute with A. Matrices
don't commute in general.

95
00:06:24,770 --> 00:06:31,415
And so you'll notice that we
had a similar expression there.

96
00:06:35,350 --> 00:06:38,480
We had to pay attention to
the order of things there.

97
00:06:38,480 --> 00:06:39,770
And now we get it right.

98
00:06:39,770 --> 00:06:51,810
It's not this, but A
dA dt plus dA dt A. OK.
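
A quick numerical check of that product rule, as a minimal sketch. The matrix path A(t) below is a made-up example (not from the lecture), chosen so that A and dA/dt do not commute. It compares a finite-difference derivative of A squared with both the correct formula and the tempting 2A dA/dt.

```python
import numpy as np

def A(t):
    # Hypothetical smooth matrix path, chosen only for illustration.
    return np.array([[1.0 + t, 2.0 * t],
                     [t ** 2, 3.0]])

def dA(t):
    # Exact entrywise derivative of the path above.
    return np.array([[1.0, 2.0],
                     [2.0 * t, 0.0]])

t, h = 0.5, 1e-6

# Central difference approximation to d(A^2)/dt.
numeric = (A(t + h) @ A(t + h) - A(t - h) @ A(t - h)) / (2 * h)

correct = A(t) @ dA(t) + dA(t) @ A(t)   # A dA/dt + dA/dt A
naive = 2 * A(t) @ dA(t)                # the tempting but wrong guess

err_correct = np.abs(numeric - correct).max()
err_naive = np.abs(numeric - naive).max()
print("error of A dA/dt + dA/dt A:", err_correct)
print("error of 2 A dA/dt:        ", err_naive)
```

The first error should sit at roundoff level, while the second is of order 1, because A dA/dt and dA/dt A differ for this path.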

99
00:06:51,810 --> 00:06:52,970
Good.

100
00:06:52,970 --> 00:06:54,990
Now, can I do the other one?

101
00:06:54,990 --> 00:07:01,200
Which is a little more serious,
but it's a beautiful formula.

102
00:07:01,200 --> 00:07:04,470
And it's parallel to this guy.

103
00:07:04,470 --> 00:07:07,050
You might even guess it.

104
00:07:07,050 --> 00:07:10,050
So I'm looking for the
derivative of a singular value.

105
00:07:10,050 --> 00:07:12,780
The matrix A is changing.

106
00:07:12,780 --> 00:07:17,830
dA dt tells me how it's changing
at the moment, at the instant.

107
00:07:17,830 --> 00:07:22,400
And I want to know how is sigma
changing at that same instant.

108
00:07:22,400 --> 00:07:26,700
And sort of in parallel
with this is a nice--

109
00:07:26,700 --> 00:07:27,870
the nice formula--

110
00:07:27,870 --> 00:07:36,150
u transpose dA dt v of t.

111
00:07:36,150 --> 00:07:39,030
Boy, you couldn't ask for a
nicer formula than that, right?

112
00:07:43,310 --> 00:07:46,440
You remember this
is the eigenvector.

113
00:07:46,440 --> 00:07:50,050
And that's the eigenvector
of A transpose.

114
00:07:50,050 --> 00:07:52,650
So this is the
singular vector of A.

115
00:07:52,650 --> 00:07:56,280
And you could say this is a
singular vector of A transpose,

116
00:07:56,280 --> 00:08:04,470
or it's the left singular vector
of A. So that's our formula.

117
00:08:04,470 --> 00:08:07,360
And if we can just
recall how to prove it,

118
00:08:07,360 --> 00:08:10,260
which is going to be parallel
to the proof of that one,

119
00:08:10,260 --> 00:08:14,870
then I'm a happy person and
we can get on with life.

120
00:08:14,870 --> 00:08:20,340
So let's remember this,
because it will help us

121
00:08:20,340 --> 00:08:22,140
to remember the other one, too.

122
00:08:22,140 --> 00:08:25,620
OK, so where do I start?

123
00:08:25,620 --> 00:08:28,410
I start with a
formula for sigma.

124
00:08:28,410 --> 00:08:35,340
So I believe that sigma is
u transpose times A times

125
00:08:35,340 --> 00:08:41,780
v. Everybody agree with that?

126
00:08:41,780 --> 00:08:45,530
Everything's depending
on t in this formula.

127
00:08:45,530 --> 00:08:48,890
As time changes,
everything changes.

128
00:08:48,890 --> 00:08:52,340
But I didn't write
in the parentheses,

129
00:08:52,340 --> 00:08:56,390
t three more times.

130
00:08:56,390 --> 00:08:59,500
Can we just remember
about the SVD.

131
00:08:59,500 --> 00:09:04,756
The SVD says that
A times v equals--

132
00:09:04,756 --> 00:09:05,710
AUDIENCE: Sigma u.

133
00:09:05,710 --> 00:09:06,770
GILBERT STRANG: Sigma u.

134
00:09:06,770 --> 00:09:08,030
Thanks.

135
00:09:08,030 --> 00:09:09,490
Av is sigma u.

136
00:09:09,490 --> 00:09:10,410
That's the SVD.

137
00:09:13,290 --> 00:09:18,380
So when I put in for
Av, I put in sigma u.

138
00:09:18,380 --> 00:09:19,760
Sigma is just a number.

139
00:09:19,760 --> 00:09:21,700
So I bring it outside.

140
00:09:21,700 --> 00:09:24,110
And I'm left with u transpose u.

141
00:09:24,110 --> 00:09:26,970
And what's u transpose u?

142
00:09:26,970 --> 00:09:28,660
1.

143
00:09:28,660 --> 00:09:30,160
So I've used these two facts.

144
00:09:32,980 --> 00:09:35,890
Or I could have
gone the other way

145
00:09:35,890 --> 00:09:39,580
and said that this
is the transpose of--

146
00:09:39,580 --> 00:09:43,060
this is A transpose u, all transposed.

147
00:09:43,060 --> 00:09:49,060
I could look at it
that way times v.

148
00:09:49,060 --> 00:09:50,860
And if I look at
it that way, I'm

149
00:09:50,860 --> 00:09:53,530
interested in what
is A transpose u.

150
00:09:53,530 --> 00:09:57,370
And what is A transpose u?

151
00:09:57,370 --> 00:10:04,900
It's sigma v. And it's
transpose, so sigma v

152
00:10:04,900 --> 00:10:07,090
transpose v.

153
00:10:07,090 --> 00:10:09,643
And what is sigma v transpose v?

154
00:10:09,643 --> 00:10:10,310
AUDIENCE: Sigma.

155
00:10:10,310 --> 00:10:12,400
GILBERT STRANG: It's
sigma again, of course.

156
00:10:12,400 --> 00:10:14,070
Got sigma both ways.

157
00:10:14,070 --> 00:10:15,520
OK.

158
00:10:15,520 --> 00:10:18,910
Now, I'm ready to
take the derivative.

159
00:10:18,910 --> 00:10:23,680
That's the formula
I have for sigma,

160
00:10:23,680 --> 00:10:25,540
completely parallel
to the formula

161
00:10:25,540 --> 00:10:27,970
that we started out
with for lambda.

162
00:10:27,970 --> 00:10:31,900
The eigenvalue was
y transpose Ax.

163
00:10:31,900 --> 00:10:34,510
And now we've got
u transpose Av.

164
00:10:34,510 --> 00:10:38,170
And, by the way, when
would those two formulas

165
00:10:38,170 --> 00:10:40,410
be one and the same?

166
00:10:40,410 --> 00:10:45,060
When does the SVD just
tell us nothing new

167
00:10:45,060 --> 00:10:50,520
beyond the eigenvalue stuff? For
what matrices are the singular

168
00:10:50,520 --> 00:10:53,430
values the same as the
eigenvalues, and singular

169
00:10:53,430 --> 00:10:57,870
vectors the same as
the eigenvectors, for--

170
00:10:57,870 --> 00:10:58,755
For?

171
00:10:58,755 --> 00:11:00,240
AUDIENCE: Symmetric.

172
00:11:00,240 --> 00:11:02,340
GILBERT STRANG: Symmetric, good.

173
00:11:02,340 --> 00:11:08,490
Symmetric, square,
and-- the two words

174
00:11:08,490 --> 00:11:11,820
that I'm always looking
for in this course.

175
00:11:11,820 --> 00:11:13,560
If you want an A in
this course, just

176
00:11:13,560 --> 00:11:17,970
write down positive definite
in the answer to any question,

177
00:11:17,970 --> 00:11:21,510
because sigmas are by
definition positive.

178
00:11:21,510 --> 00:11:24,570
And if they're going to agree
totally with the lambdas,

179
00:11:24,570 --> 00:11:26,460
then the lambdas
have to be positive.

180
00:11:26,460 --> 00:11:30,237
Or could be 0, so positive
semidefinite

181
00:11:30,237 --> 00:11:31,320
would be the right answer.

182
00:11:31,320 --> 00:11:33,060
Anyway, this is our start.

183
00:11:36,170 --> 00:11:38,460
And what do we do
with that formula?

184
00:11:38,460 --> 00:11:42,340
So this was all the same,
because u transpose u was 1.

185
00:11:45,870 --> 00:11:48,000
Here I had v transpose
v. And that's 1.

186
00:11:48,000 --> 00:11:49,260
So it gave me sigma.

187
00:11:49,260 --> 00:11:50,240
Yeah, good.

188
00:11:50,240 --> 00:11:52,190
Everybody's with us.

189
00:11:52,190 --> 00:11:53,580
OK, what do I do?

190
00:11:53,580 --> 00:11:55,110
Take the derivative.

191
00:11:55,110 --> 00:11:58,160
Take the derivative of
that equation in the box.

192
00:11:58,160 --> 00:12:00,570
It's exactly what
I did last time

193
00:12:00,570 --> 00:12:03,480
with the corresponding
equation for lambda.

194
00:12:03,480 --> 00:12:04,620
Same thing.

195
00:12:04,620 --> 00:12:07,140
And I'm going to get again--

196
00:12:07,140 --> 00:12:11,130
it's a product rule, because
I have three things multiplied

197
00:12:11,130 --> 00:12:12,760
on the right-hand side.

198
00:12:12,760 --> 00:12:15,700
So I've got three terms
from the product rule.

199
00:12:15,700 --> 00:12:21,780
So d sigma dt,
coming from the box,

200
00:12:21,780 --> 00:12:35,430
is du transpose dt Av
plus u transpose dA dt v

201
00:12:35,430 --> 00:12:41,370
plus the third guy, which
will be u transpose A dv dt.

202
00:12:44,530 --> 00:12:46,090
Did I get the three terms there?

203
00:12:46,090 --> 00:12:48,200
Yep.

204
00:12:48,200 --> 00:12:49,980
And which term do I want?

205
00:12:49,980 --> 00:12:54,300
Which term do I believe is going
to survive and be the answer?

206
00:12:57,090 --> 00:12:59,620
Well, this is what I'm after.

207
00:12:59,620 --> 00:13:01,750
So it's the middle term.

208
00:13:01,750 --> 00:13:03,220
The middle term is just right.

209
00:13:06,320 --> 00:13:09,770
And the other two terms
had better be zero.

210
00:13:09,770 --> 00:13:12,200
So that will be the proof.

211
00:13:12,200 --> 00:13:14,540
The other two
terms will be zero.

212
00:13:14,540 --> 00:13:17,840
So can we just take
one of those two terms

213
00:13:17,840 --> 00:13:22,100
and show that it's
zero like this one?

214
00:13:22,100 --> 00:13:24,140
OK, what have I got here?

215
00:13:24,140 --> 00:13:27,030
I want to know that
that term is 0.

216
00:13:27,030 --> 00:13:28,060
So what have I got.

217
00:13:28,060 --> 00:13:37,370
I've got du transpose
dt times Av.

218
00:13:37,370 --> 00:13:43,200
And everybody says, OK, in
place of Av, write in sigma u.

219
00:13:43,200 --> 00:13:47,760
And sigma's a number, so I
don't mind putting it there.

220
00:13:47,760 --> 00:13:53,310
So I've got sigma, a number,
times the derivative of u times

221
00:13:53,310 --> 00:13:55,140
u itself, the dot product--

222
00:13:55,140 --> 00:13:58,410
the derivative of u
with dot product with u.

223
00:13:58,410 --> 00:14:00,990
And that equals?

224
00:14:00,990 --> 00:14:05,280
0, I hope, because of this.

225
00:14:08,100 --> 00:14:09,120
Because of that.

226
00:14:12,240 --> 00:14:14,790
This comes from the
derivative of that.

227
00:14:17,580 --> 00:14:23,210
But you see, now we've got
dot products, ordinary dot

228
00:14:23,210 --> 00:14:26,800
products, and a number
on the right-hand side.

229
00:14:26,800 --> 00:14:29,750
We're in dimension
1, you could say.

230
00:14:29,750 --> 00:14:34,300
So this tells me immediately
that the derivative

231
00:14:34,300 --> 00:14:44,870
of u with u plus u transpose
times the derivative of u

232
00:14:44,870 --> 00:14:51,680
is the derivative
of 1, which is 0.

233
00:14:51,680 --> 00:14:56,120
All I'm saying is that
these are the same.

234
00:14:56,120 --> 00:15:01,535
You know, vectors, x transpose
y is the same as y transpose

235
00:15:01,535 --> 00:15:04,460
x when I'm talking
about real numbers.

236
00:15:04,460 --> 00:15:08,030
If I was doing complex
things, which I could do,

237
00:15:08,030 --> 00:15:16,070
then I'd have to pay attention
and take complex conjugates

238
00:15:16,070 --> 00:15:16,920
at the right moment.

239
00:15:16,920 --> 00:15:19,250
But let's not bother.

240
00:15:19,250 --> 00:15:23,600
So you see, this is
just two of these.

241
00:15:23,600 --> 00:15:26,960
And it gives me 0.

242
00:15:26,960 --> 00:15:28,070
So that term's gone.

243
00:15:30,690 --> 00:15:34,470
And similarly, totally
similarly, this term is gone.

244
00:15:34,470 --> 00:15:40,820
This is A transpose
u, all transpose.

245
00:15:40,820 --> 00:15:45,410
I'm just doing the
same thing times dv dt.

246
00:15:45,410 --> 00:15:48,300
And what is A transpose u?

247
00:15:48,300 --> 00:15:55,580
It's sigma v. So this is
sigma v transpose dv dt.

248
00:15:55,580 --> 00:15:59,090
And again 0, because of this.
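
The argument just given can be collected in one place. This is only a written-out version of the spoken steps, for real matrices, with the unit singular vectors u and v of a single singular value sigma.

```latex
\begin{aligned}
\sigma &= u^{T} A v, \qquad A v = \sigma u, \qquad A^{T} u = \sigma v,
          \qquad u^{T} u = v^{T} v = 1, \\[4pt]
\frac{d\sigma}{dt}
  &= \frac{du^{T}}{dt}\, A v \;+\; u^{T} \frac{dA}{dt}\, v \;+\; u^{T} A\, \frac{dv}{dt}, \\[4pt]
\frac{du^{T}}{dt}\, A v
  &= \sigma\, \frac{du^{T}}{dt}\, u
   = \frac{\sigma}{2}\, \frac{d}{dt}\bigl(u^{T} u\bigr) = 0, \\[4pt]
u^{T} A\, \frac{dv}{dt}
  &= \sigma\, v^{T} \frac{dv}{dt}
   = \frac{\sigma}{2}\, \frac{d}{dt}\bigl(v^{T} v\bigr) = 0, \\[4pt]
\frac{d\sigma}{dt} &= u^{T} \frac{dA}{dt}\, v .
\end{aligned}
```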

249
00:16:02,630 --> 00:16:07,630
So in a way this was a
slightly easier thing--

250
00:16:07,630 --> 00:16:12,830
the last time was completely
parallel computation.

251
00:16:12,830 --> 00:16:17,410
But the first and third terms
had to cancel each other with

252
00:16:17,410 --> 00:16:19,840
the x's and y's.

253
00:16:19,840 --> 00:16:29,690
Now, they disappear separately,
leaving the right answer.
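
A numerical sanity check of the singular value formula, as a sketch under stated assumptions: the path A(t) = A0 + t B and the random matrices are invented for illustration, and the largest singular value is assumed to be simple (not repeated), which holds for generic random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A0 = rng.standard_normal((4, 3))   # hypothetical starting matrix
B = rng.standard_normal((4, 3))    # dA/dt along the path A(t) = A0 + t*B

def top_sigma(M):
    # Largest singular value of M.
    return np.linalg.svd(M, compute_uv=False)[0]

# Left and right singular vectors for the largest singular value at t = 0.
U, s, Vt = np.linalg.svd(A0)
u, v = U[:, 0], Vt[0, :]

# sigma = u^T A v, the starting point of the derivation.
sigma_check = u @ A0 @ v

# Formula from the lecture: d(sigma)/dt = u^T (dA/dt) v.
formula = u @ B @ v

# Central difference approximation to d(sigma)/dt at t = 0.
h = 1e-6
numeric = (top_sigma(A0 + h * B) - top_sigma(A0 - h * B)) / (2 * h)

print("u^T A v - sigma:", sigma_check - s[0])
print("formula:", formula, " finite difference:", numeric)
```

The two printed derivative values should agree to several digits. If the top singular value were repeated, the singular vectors would not be unique and this check would not apply as written.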

254
00:16:29,690 --> 00:16:32,860
You might think, how did
we get into derivatives

255
00:16:32,860 --> 00:16:35,470
of singular values?

256
00:16:35,470 --> 00:16:40,210
Well, I think if we're
going to understand the SVD,

257
00:16:40,210 --> 00:16:45,040
then the first derivative
of the sigma is--

258
00:16:45,040 --> 00:16:47,320
well, except that I've
survived all these years

259
00:16:47,320 --> 00:16:48,230
without knowing it.

260
00:16:48,230 --> 00:16:50,440
So you could say it's not--

261
00:16:53,340 --> 00:16:58,330
you can live without it, but
it's a pretty nice formula.

262
00:16:58,330 --> 00:17:05,780
OK, that completes
that Section 3.1.

263
00:17:05,780 --> 00:17:09,770
And more to say about 3.2,
which was the interlacing

264
00:17:09,770 --> 00:17:11,960
part that I introduced.

265
00:17:11,960 --> 00:17:14,720
OK, so where am I?

266
00:17:14,720 --> 00:17:26,220
I guess I'm thinking about the
neat topics about interlacing

267
00:17:26,220 --> 00:17:28,060
of eigenvalues.

268
00:17:28,060 --> 00:17:33,810
So may I pick up on that theme,
interlacing of eigenvalues

269
00:17:33,810 --> 00:17:39,730
and say what's in the notes
and what's the general idea?

270
00:17:39,730 --> 00:17:40,230
OK.

271
00:17:43,290 --> 00:17:48,480
So we're leaving the
derivatives and moving

272
00:17:48,480 --> 00:17:54,570
to finite changes in the
eigenvalues and singular

273
00:17:54,570 --> 00:17:58,020
values, and we are
recognizing that we

274
00:17:58,020 --> 00:18:02,730
can't get exact
formulas for the change,

275
00:18:02,730 --> 00:18:06,200
but we can get
bounds for change.

276
00:18:06,200 --> 00:18:07,990
And they are pretty cool.

277
00:18:07,990 --> 00:18:12,060
So let me remind you what
that is, what they are.

278
00:18:12,060 --> 00:18:15,260
So I have a matrix--

279
00:18:15,260 --> 00:18:18,450
let's see, a symmetric
matrix S that

280
00:18:18,450 --> 00:18:22,080
has eigenvalues lambda 1,
greater equal lambda 2,

281
00:18:22,080 --> 00:18:25,920
greater equal so on.

282
00:18:25,920 --> 00:18:28,680
Then I change S by some amount.

283
00:18:28,680 --> 00:18:35,520
I think in the notes there is
a number theta times a rank-1 matrix.

284
00:18:35,520 --> 00:18:40,080
That has eigenvalues mu
1, greater equal mu 2,

285
00:18:40,080 --> 00:18:43,530
greater equal something.

286
00:18:43,530 --> 00:18:49,610
And these are what I can't
give you an exact formula for.

287
00:18:49,610 --> 00:18:52,850
You just would have
to compute them.

288
00:18:52,850 --> 00:18:57,410
But I can give you
bounds for them.

289
00:18:57,410 --> 00:18:59,470
And the bounds come
from the lambdas.

290
00:19:02,030 --> 00:19:04,100
So this was a positive.

291
00:19:04,100 --> 00:19:05,700
This is a positive change.

292
00:19:09,590 --> 00:19:14,140
So the eigenvalues will
go up, or stay still,

293
00:19:14,140 --> 00:19:16,760
but they won't go down.

294
00:19:16,760 --> 00:19:20,830
So the mu's will be
bigger than the lambdas.

295
00:19:20,830 --> 00:19:27,130
But the neat thing is that mu
2 will not pass up lambda 1.

296
00:19:27,130 --> 00:19:29,240
So here is the interlacing.

297
00:19:29,240 --> 00:19:32,110
Mu 1 is greater equal lambda 1.

298
00:19:32,110 --> 00:19:35,350
That says that the highest
eigenvalue, the top eigenvalue

299
00:19:35,350 --> 00:19:39,690
went up, or didn't move.

300
00:19:39,690 --> 00:19:44,640
But mu 2 is below lambda 1.

301
00:19:44,640 --> 00:19:46,540
This is the new--
everybody's with me here?

302
00:19:46,540 --> 00:19:50,210
This is a new, and
this is the old.

303
00:19:50,210 --> 00:19:56,510
New and old: old is S,
new is with the change in S.

304
00:19:56,510 --> 00:20:01,540
And that mu 2 is
greater equal lambda 2.

305
00:20:01,540 --> 00:20:04,010
So the second
eigenvalues went up.

306
00:20:04,010 --> 00:20:05,158
And then so on.
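
The interlacing pattern can be tested numerically. The sketch below assumes a random symmetric S, a random vector u, and theta = 2 (all chosen for illustration), and checks mu 1 >= lambda 1 >= mu 2 >= lambda 2 >= ... for the rank-1 change S + theta u u^T.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
S = (M + M.T) / 2                  # hypothetical symmetric matrix
u = rng.standard_normal(n)         # direction of the rank-1 change
theta = 2.0                        # positive, so the change is positive semidefinite

# eigvalsh returns ascending order; reverse to get lambda_1 >= lambda_2 >= ...
lam = np.linalg.eigvalsh(S)[::-1]
mu = np.linalg.eigvalsh(S + theta * np.outer(u, u))[::-1]

tol = 1e-10
goes_up = np.all(mu >= lam - tol)              # mu_k >= lambda_k for every k
stays_below = np.all(mu[1:] <= lam[:-1] + tol) # mu_{k+1} <= lambda_k

print("lambda:", lam)
print("mu:    ", mu)
print("each mu_k >= lambda_k:    ", goes_up)
print("each mu_(k+1) <= lambda_k:", stays_below)
```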

307
00:20:10,910 --> 00:20:13,790
That's a great fact.

308
00:20:13,790 --> 00:20:16,810
And I guess that I sent
out a puzzle question.

309
00:20:16,810 --> 00:20:19,215
Did it arrive in email?

310
00:20:25,100 --> 00:20:29,820
Did anybody see that puzzle
question and think about it?

311
00:20:29,820 --> 00:20:31,060
It worried me for a while.

312
00:20:36,480 --> 00:20:41,820
Suppose this is the
second eigenvalue--

313
00:20:41,820 --> 00:20:44,310
eigenvector.

314
00:20:44,310 --> 00:20:50,520
So I'm adding on, I'm hyping
up the second eigenvector,

315
00:20:50,520 --> 00:20:52,830
hyping up the matrix
in the direction

316
00:20:52,830 --> 00:20:54,370
of the second eigenvector.

317
00:20:57,890 --> 00:21:02,250
So the second
eigenvalue was lambda 2.

318
00:21:02,250 --> 00:21:05,280
And mu 2, the new
second eigenvalue,

319
00:21:05,280 --> 00:21:06,860
is going to be bigger by theta.

320
00:21:11,190 --> 00:21:15,120
But then I lost a little
sleep in thinking, OK,

321
00:21:15,120 --> 00:21:20,130
if the second eigenvalue
is mu 2 plus theta--

322
00:21:20,130 --> 00:21:22,980
sorry, if the second
eigenvalue mu 2--

323
00:21:22,980 --> 00:21:24,300
so let me write it here.

324
00:21:24,300 --> 00:21:35,460
If mu 2, the second eigenvalue,
is the old lambda 2 plus theta

325
00:21:35,460 --> 00:21:45,390
then bad news, because theta
can be as big as I want.

326
00:21:45,390 --> 00:21:48,180
It can be 20, 200, 2,000.

327
00:21:48,180 --> 00:21:57,300
And if I'm just adding theta
to lambda 2 to get the second--

328
00:21:57,300 --> 00:22:01,440
because it's a second
eigenvector that's

329
00:22:01,440 --> 00:22:10,140
getting pumped up, then after a
while, mu 2 will pass lambda 1.

330
00:22:10,140 --> 00:22:11,520
This will be totally true.

331
00:22:11,520 --> 00:22:13,200
I have no worries about this.

332
00:22:13,200 --> 00:22:14,610
The old lambda 1--

333
00:22:14,610 --> 00:22:18,150
actually, the old--

334
00:22:18,150 --> 00:22:21,000
I'll even have
equality here, because

335
00:22:21,000 --> 00:22:27,600
for this particular change,
it's not affecting lambda 1.

336
00:22:27,600 --> 00:22:30,430
So I think mu 1
would be lambda 1

337
00:22:30,430 --> 00:22:34,080
in my hypothetical possibility.

338
00:22:34,080 --> 00:22:35,550
What I'm trying to
get you to do is

339
00:22:35,550 --> 00:22:39,210
to think through what this
means, because it's quite

340
00:22:39,210 --> 00:22:43,170
easy to write that line there.

341
00:22:43,170 --> 00:22:46,950
But then when you think about
it, you get some questions.

342
00:22:46,950 --> 00:22:50,810
And it looks as
if it might fail,

343
00:22:50,810 --> 00:22:57,110
because if theta is really
big, that mu 2 would pass up

344
00:22:57,110 --> 00:22:57,860
lambda 1.

345
00:22:57,860 --> 00:23:00,500
And the thing would fail.

346
00:23:00,500 --> 00:23:02,570
And there has to be a catch.

347
00:23:02,570 --> 00:23:05,960
There has to be a catch.

348
00:23:05,960 --> 00:23:11,540
So does anybody-- you
saw that in the email.

349
00:23:11,540 --> 00:23:16,400
And I'll now explain
how I understood

350
00:23:16,400 --> 00:23:24,650
that everything can work and I'm
not reaching a contradiction.

351
00:23:24,650 --> 00:23:27,110
And here's my thinking.

352
00:23:27,110 --> 00:23:32,810
So it's perfectly true that the
eigenvalue that goes with u2--

353
00:23:32,810 --> 00:23:36,320
or maybe I should be calling
them x2, because usually I

354
00:23:36,320 --> 00:23:38,750
call the eigenvectors x2--

355
00:23:38,750 --> 00:23:42,950
it's perfectly true that mu
2, that that one goes up.

356
00:23:46,020 --> 00:23:51,900
But what happens when
it reaches lambda 1?

357
00:23:51,900 --> 00:23:54,435
Actually, lambda 1,
the first eigenvalue,

358
00:23:54,435 --> 00:23:57,930
is staying put, because it's
not getting any push from this.

359
00:23:57,930 --> 00:24:01,700
But the second eigenvalue is
getting a push of size theta.

360
00:24:01,700 --> 00:24:07,290
So what happens when lambda
2 plus theta, which is mu 2--

361
00:24:07,290 --> 00:24:09,480
mu 2 is lambda 2 plus theta--

362
00:24:09,480 --> 00:24:12,720
what happens when it
comes up to lambda 1

363
00:24:12,720 --> 00:24:15,120
and I start worrying
that it passes lambda 1?

364
00:24:18,410 --> 00:24:21,740
Do you see what's
happening there?

365
00:24:21,740 --> 00:24:25,250
What happens when mu 2 passes--

366
00:24:25,250 --> 00:24:26,720
when mu 2, which is--

367
00:24:26,720 --> 00:24:28,220
I'm just going to copy here--

368
00:24:28,220 --> 00:24:31,875
it's the old lambda 2 plus
the theta, the number.

369
00:24:31,875 --> 00:24:34,250
What happens when theta gets
bigger and bigger and bigger

370
00:24:34,250 --> 00:24:37,670
and this hits this thing
and then goes beyond?

371
00:24:37,670 --> 00:24:40,850
Just to see the logic here.

372
00:24:40,850 --> 00:24:46,760
What happens is that this lambda
2 plus theta, which was mu 2,

373
00:24:46,760 --> 00:24:49,070
mu 2 until they got here.

374
00:24:49,070 --> 00:24:55,570
But what is lambda 2 plus
theta after it passes lambda 1?

375
00:24:55,570 --> 00:24:56,800
It's mu 1 now.

376
00:24:59,340 --> 00:25:02,190
It passed up, so it's
the top eigenvalue

377
00:25:02,190 --> 00:25:07,390
of the altered matrix.

378
00:25:07,390 --> 00:25:10,380
And therefore, it's just fine.

379
00:25:10,380 --> 00:25:11,130
It's out here.

380
00:25:11,130 --> 00:25:13,740
No problem.

381
00:25:13,740 --> 00:25:15,810
Maybe I'll just say it again.

382
00:25:15,810 --> 00:25:20,010
When theta is big
enough that mu 2 reaches

383
00:25:20,010 --> 00:25:23,520
lambda 1, if I increase
theta beyond that,

384
00:25:23,520 --> 00:25:30,060
then this becomes not
mu 2 any more, but mu 1.

385
00:25:30,060 --> 00:25:35,130
And then totally
everybody's happy.
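
The puzzle is easy to replay on a tiny example. The sketch below uses a made-up diagonal S with lambda 1 = 3 and lambda 2 = 1, and pushes along the second eigenvector x2 with increasing theta. Once 1 + theta passes 3, the boosted eigenvalue is relabeled mu 1, and mu 2 stays at lambda 1.

```python
import numpy as np

S = np.diag([3.0, 1.0])        # hypothetical S: lambda_1 = 3, lambda_2 = 1
x2 = np.array([0.0, 1.0])      # eigenvector belonging to lambda_2

def sorted_eigs(theta):
    # Eigenvalues of S + theta * x2 x2^T in descending order: mu_1 >= mu_2.
    return np.linalg.eigvalsh(S + theta * np.outer(x2, x2))[::-1]

# theta = 1: boosted eigenvalue is 2, still mu_2.
# theta = 2: boosted eigenvalue is 3, it ties with lambda_1.
# theta = 5: boosted eigenvalue is 6, now the largest, so it is mu_1 and mu_2 = 3.
results = {theta: sorted_eigs(theta) for theta in (1.0, 2.0, 5.0)}

for theta, mu in results.items():
    print(f"theta = {theta}: mu_1 = {mu[0]:.1f}, mu_2 = {mu[1]:.1f}")
```

In every case mu 2 is at most lambda 1 = 3, so the interlacing statement is never violated. Only the labels change.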

386
00:25:35,130 --> 00:25:40,260
I won't say more on that,
because that's just like a way

387
00:25:40,260 --> 00:25:44,760
that I found to make me think,
what do these things mean?

388
00:25:44,760 --> 00:25:48,070
OK, enough said on
that small point.

389
00:25:48,070 --> 00:25:51,730
But then the main point
is, why is this true?

390
00:25:51,730 --> 00:25:59,240
This interlacing, which is
really a nice, beautiful fact.

391
00:25:59,240 --> 00:26:05,500
And you could
imagine that we have

392
00:26:05,500 --> 00:26:09,220
more different perturbations
than just rank 1s.

393
00:26:13,300 --> 00:26:19,750
So let me tell you the
inequality, so named

394
00:26:19,750 --> 00:26:23,650
after the discoverer,
Weyl's inequality.

395
00:26:27,790 --> 00:26:39,400
So his inequality is for
the eigenvalues of S plus T.

396
00:26:39,400 --> 00:26:41,980
So T is the change.

397
00:26:41,980 --> 00:26:43,170
S is where I start.

398
00:26:43,170 --> 00:26:45,340
It has eigenvalues lambda.

399
00:26:45,340 --> 00:26:48,520
But now, I'm looking at the
eigenvalues of S plus T.

400
00:26:48,520 --> 00:26:50,860
So I'm making a change.

401
00:26:50,860 --> 00:26:53,350
Over here, in my
little puzzle question,

402
00:26:53,350 --> 00:26:56,710
that was T. It was
a rank 1 change.

403
00:26:56,710 --> 00:26:59,860
Now I will allow other ranks.

404
00:26:59,860 --> 00:27:03,430
So I want to estimate
lambdas of S plus T

405
00:27:03,430 --> 00:27:10,880
in terms of lambdas
of S and lambdas of T.

406
00:27:10,880 --> 00:27:13,710
And I want some
inequality sign there.

407
00:27:17,000 --> 00:27:21,680
And it's supposed to be true
for any symmetric matrices,

408
00:27:21,680 --> 00:27:26,800
symmetric S and T.

409
00:27:26,800 --> 00:27:32,360
And then a totally
identical Weyl inequality--

410
00:27:32,360 --> 00:27:33,980
actually, Weyl was
one of the people

411
00:27:33,980 --> 00:27:36,380
who discovered singular values.

412
00:27:36,380 --> 00:27:39,350
And when he did it, he
asked about his inequality.

413
00:27:39,350 --> 00:27:42,370
And he found that it
still worked the way we've

414
00:27:42,370 --> 00:27:44,180
found this morning earlier.

415
00:27:47,210 --> 00:27:49,490
I haven't completed
that yet, because I

416
00:27:49,490 --> 00:27:54,790
haven't told you which
lambdas I'm talking about.

417
00:27:54,790 --> 00:27:58,420
So let me do that.

418
00:27:58,420 --> 00:28:01,050
So now, I'll tell you
Weyl's inequality.

419
00:28:01,050 --> 00:28:03,280
So S and T are symmetric.

420
00:28:03,280 --> 00:28:05,770
And so the lambdas are real.

421
00:28:05,770 --> 00:28:07,670
And we want to know--

422
00:28:07,670 --> 00:28:10,060
we want to get them in order.

423
00:28:10,060 --> 00:28:11,740
OK, so here it goes.

424
00:28:15,170 --> 00:28:21,460
Weyl allowed the i-th eigenvalue
of S and the j-th eigenvalue

425
00:28:21,460 --> 00:28:27,850
of T and figured out that this
was bounded by that eigenvalue

426
00:28:27,850 --> 00:28:32,650
of S plus T. So that's
Weyl's great inequality,

427
00:28:32,650 --> 00:28:42,730
which reduces to the
one I wrote here,

428
00:28:42,730 --> 00:28:44,680
if I make the right choice--

429
00:28:44,680 --> 00:28:47,940
yeah, probably, if
I take j equal to 1.

430
00:28:47,940 --> 00:28:51,340
So you see the beauty of this.

431
00:28:51,340 --> 00:28:56,560
It tells you about
any eigenvalues of S,

432
00:28:56,560 --> 00:28:57,760
eigenvalues of T.

433
00:28:57,760 --> 00:29:00,670
So I'm using lambdas here.

434
00:29:00,670 --> 00:29:02,950
Lambda of S are the
eigenvalues of S.

435
00:29:02,950 --> 00:29:07,480
I'm using lambda again for T
and lambda again for S plus T.

436
00:29:07,480 --> 00:29:11,830
So you have to pay attention
to which matrix I'm

437
00:29:11,830 --> 00:29:13,330
taking the eigenvalues out of.

438
00:29:13,330 --> 00:29:17,270
So let me take j equal to 1.

439
00:29:17,270 --> 00:29:21,780
And this says that
lambda i, because j is 1,

440
00:29:21,780 --> 00:29:28,210
S plus T is less or equal to
lambda i of S plus lambda 1,

441
00:29:28,210 --> 00:29:40,170
the top eigenvalue of T.
This is lambda max of T.

442
00:29:40,170 --> 00:29:44,850
Do you see that that's totally
reasonable, believable?

443
00:29:44,850 --> 00:29:49,260
That the eigenvalue
when I add on T-- let's

444
00:29:49,260 --> 00:29:52,260
imagine in our minds
that T is positive.

445
00:29:52,260 --> 00:29:56,640
T is like this thing.

446
00:29:56,640 --> 00:30:02,280
This could be the T, example of
a T. It's what I'm adding on.

447
00:30:02,280 --> 00:30:06,570
Then the eigenvalues go up.

448
00:30:06,570 --> 00:30:09,870
But they don't pass that.

449
00:30:09,870 --> 00:30:12,510
So that tells you how
much it could go up by.

450
00:30:12,510 --> 00:30:19,070
So I guess that Weyl is giving
us a less than or equal here.

451
00:30:19,070 --> 00:30:22,050
Less or equal to lambda 1--

452
00:30:22,050 --> 00:30:24,450
so I'm taking i to be 1--

453
00:30:24,450 --> 00:30:27,320
plus theta.

454
00:30:27,320 --> 00:30:32,590
Yeah, so that inequality
I've written down there--

455
00:30:32,590 --> 00:30:37,020
there's some playing around
to do to get practice.

456
00:30:37,020 --> 00:30:44,310
And it's not so essential for
us to be like world grandmasters

457
00:30:44,310 --> 00:30:48,030
at this thing, but
you should see it.

458
00:30:48,030 --> 00:30:52,010
And you should also
see j equal to 2.

459
00:30:52,010 --> 00:30:55,230
Why will j equal to
2 tell us something?

460
00:30:55,230 --> 00:30:58,120
I hope it will.

461
00:30:58,120 --> 00:31:00,310
Let's see what it tells us.

462
00:31:00,310 --> 00:31:04,720
Lambda i plus 1 now-- j is 2--

463
00:31:04,720 --> 00:31:12,640
of S plus T. So it's less
than or equal to lambda i of S

464
00:31:12,640 --> 00:31:19,480
plus lambda 2 of T. I
think that's interesting.
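
A minimal numerical check of the j = 1 and j = 2 cases, not taken from the lecture. It assumes 2 by 2 symmetric matrices so the eigenvalues have a closed form, and it numbers eigenvalues from largest to smallest. The matrices S and T and the helper names are illustrative choices.

```python
import math

def eig_sym_2x2(M):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    (a, b), (_, d) = M
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return [mean + radius, mean - radius]

def add(S, T):
    """Entrywise sum of two 2x2 matrices."""
    return [[S[r][c] + T[r][c] for c in range(2)] for r in range(2)]

def weyl_holds(S, T, tol=1e-9):
    """Check lambda_{i+j-1}(S+T) <= lambda_i(S) + lambda_j(T) for all valid i, j."""
    lam_S = eig_sym_2x2(S)
    lam_T = eig_sym_2x2(T)
    lam_sum = eig_sym_2x2(add(S, T))
    n = 2
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            k = i + j - 1
            if k <= n and lam_sum[k - 1] > lam_S[i - 1] + lam_T[j - 1] + tol:
                return False
    return True

# Illustrative symmetric matrices (not from the lecture).
S = [[2.0, 1.0], [1.0, 3.0]]
T = [[1.0, -2.0], [-2.0, 0.5]]
print(weyl_holds(S, T))
```

With n = 2 the loop covers (i, j) = (1, 1), (2, 1), and (1, 2); the last one is the j = 2 case with i = 1.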

465
00:31:19,480 --> 00:31:34,280
And also, I think I also could
get lambda i plus i minus 1.

466
00:31:34,280 --> 00:31:37,570
Let me write it and
see if it's correct.

467
00:31:37,570 --> 00:31:40,755
Plus lambda i minus 1.

468
00:31:40,755 --> 00:31:43,690
So those would add up to i plus 2.

469
00:31:43,690 --> 00:31:51,120
Yeah, I guess lambda i
plus 1 plus lambda 1 of T.

470
00:31:51,120 --> 00:31:55,050
That's what I got by taking--

471
00:31:55,050 --> 00:31:57,360
yeah, did I do that right?

472
00:32:00,480 --> 00:32:03,696
I'm taking j equal to 1.

473
00:32:03,696 --> 00:32:07,480
No, well, I don't
think I got it right.

474
00:32:10,520 --> 00:32:15,080
What do I want to do here to
get a bound on lambda i plus 1?

475
00:32:15,080 --> 00:32:16,610
I want to take j equal to 2.

476
00:32:16,610 --> 00:32:23,210
I should just be sensible
and plug in j equal to 2

477
00:32:23,210 --> 00:32:24,270
and i equal to 1.

478
00:32:29,510 --> 00:32:35,480
All I want to say is that Weyl's
inequality is the great fact

479
00:32:35,480 --> 00:32:38,390
out of which all this
interlacing falls

480
00:32:38,390 --> 00:32:42,650
and more and more, because
the interlacing is telling me

481
00:32:42,650 --> 00:32:46,020
about neighbors.

482
00:32:46,020 --> 00:32:50,790
And actually if I use Weyl for i
and j, different i's and j's, I

483
00:32:50,790 --> 00:32:56,166
even learn about ones
that are not neighbors.

484
00:32:56,166 --> 00:33:00,600
And I could tell you a
proof of Weyl's inequality.

485
00:33:00,600 --> 00:33:02,340
But I'll save that
for the notes.

486
00:33:07,310 --> 00:33:09,100
So I think maybe
that's what I want

487
00:33:09,100 --> 00:33:14,240
to do about interlacing, just
to say what the notes have,

488
00:33:14,240 --> 00:33:17,550
but not repeat it all in class.

489
00:33:17,550 --> 00:33:24,230
So the notes have actually two
ways to prove this interlacing.

490
00:33:24,230 --> 00:33:27,830
The standard way that every
mathematician would use

491
00:33:27,830 --> 00:33:30,990
would be Weyl's inequality.

492
00:33:30,990 --> 00:33:36,600
But last year,
Professor Rao, visiting,

493
00:33:36,600 --> 00:33:42,060
found a nice argument
that's also in the notes.

494
00:33:42,060 --> 00:33:43,700
It ends up with a graph.

495
00:33:43,700 --> 00:33:47,540
And on that graph, you
can see that this is true.

496
00:33:47,540 --> 00:33:55,530
So for what it's worth, two
approaches to this interlacing

497
00:33:55,530 --> 00:33:58,440
and some examples.

498
00:33:58,440 --> 00:34:01,710
But I really don't
want to spend our lives

499
00:34:01,710 --> 00:34:05,000
on this eigenvalue topic.

500
00:34:05,000 --> 00:34:08,940
It's a beautiful fact
about symmetric matrices

501
00:34:08,940 --> 00:34:12,510
and the corresponding fact
is true for singular values

502
00:34:12,510 --> 00:34:18,270
of any matrix, but let's
think of leaving it there.

503
00:34:21,150 --> 00:34:28,670
So now, I'm moving on
to the new section.

504
00:34:28,670 --> 00:34:30,710
The new section
involves something

505
00:34:30,710 --> 00:34:31,897
called compressed sensing.

506
00:34:31,897 --> 00:34:33,605
I don't know if you've
heard those words.

507
00:34:45,949 --> 00:34:52,699
So these are all topics in
Section 4.4, which you have.

508
00:34:52,699 --> 00:34:55,880
I think we sent it out
10 days ago probably.

509
00:34:58,660 --> 00:35:04,000
OK, so first let me remember
what the nuclear norm is

510
00:35:04,000 --> 00:35:06,320
of a matrix.

511
00:35:06,320 --> 00:35:19,635
The nuclear norm of a matrix is
the sum of the singular values,

512
00:35:19,635 --> 00:35:22,460
the sum of the singular values.

513
00:35:22,460 --> 00:35:29,170
So it's like the L1
norm for a vector.

514
00:35:29,170 --> 00:35:32,080
That's the right way
to think about it.
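
A small sketch of that definition, assuming a 2 by 2 matrix so the singular values can be computed by hand as square roots of the eigenvalues of A transpose A. The matrix A and the function names are illustrative, not from the lecture.

```python
import math

def singular_values_2x2(A):
    """Singular values of a 2x2 matrix, largest first."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    lam_max = mean + radius
    lam_min = max(mean - radius, 0.0)  # guard against tiny negative roundoff
    return math.sqrt(lam_max), math.sqrt(lam_min)

def nuclear_norm_2x2(A):
    """Nuclear norm = sum of the singular values."""
    return sum(singular_values_2x2(A))

def l1_norm(v):
    """L1 norm of a vector = sum of absolute values."""
    return sum(abs(x) for x in v)

A = [[3.0, 0.0], [4.0, 5.0]]
sigma = singular_values_2x2(A)
print(sigma)                 # the two singular values
print(nuclear_norm_2x2(A))   # their sum
print(l1_norm(sigma))        # same number: the L1 norm of the vector of singular values
```

The last two lines agree because singular values are never negative, which is the sense in which the nuclear norm is the L1 norm applied to the singular values.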

515
00:35:32,080 --> 00:35:34,030
And do you remember
what was special?

516
00:35:34,030 --> 00:35:38,230
We've talked about
using the L1 norm.

517
00:35:38,230 --> 00:35:42,610
It has this special property
that the ordinary L2

518
00:35:42,610 --> 00:35:45,190
norm absolutely does not have.

519
00:35:45,190 --> 00:35:48,070
What was it special
about the L1 norm?

520
00:35:48,070 --> 00:35:56,080
If I minimize the L1 norm with
some constraint, like Ax equal

521
00:35:56,080 --> 00:36:01,320
b, what's special about the
solution, the minimum in the L1

522
00:36:01,320 --> 00:36:02,040
norm?

523
00:36:02,040 --> 00:36:02,910
AUDIENCE: Sparse.

524
00:36:02,910 --> 00:36:04,160
GILBERT STRANG: Sparse, right.

525
00:36:04,160 --> 00:36:06,920
Sparse.
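
A minimal sketch of why the L1 minimizer is sparse, assuming the simplest case of one equation a.x = b in several unknowns. In that case the L1 minimum puts all the weight on the coordinate with the largest |a_i|, while the L2 minimum spreads it over every coordinate. The numbers a = (3, 4), b = 12 and the function names are illustrative assumptions.

```python
def min_l1_solution(a, b):
    """Minimize sum |x_i| subject to a.x = b (single equation, a nonzero).

    Since |b| = |a.x| <= max|a_i| * sum|x_i|, the minimum is |b| / max|a_i|,
    reached by a vector with one nonzero entry (unique if the max is unique).
    """
    k = max(range(len(a)), key=lambda i: abs(a[i]))
    x = [0.0] * len(a)
    x[k] = b / a[k]
    return x

def min_l2_solution(a, b):
    """Minimize sum x_i^2 subject to a.x = b: x is a multiple of a."""
    scale = b / sum(ai * ai for ai in a)
    return [scale * ai for ai in a]

def l1_norm(v):
    return sum(abs(x) for x in v)

a, b = [3.0, 4.0], 12.0
x_l1 = min_l1_solution(a, b)
x_l2 = min_l2_solution(a, b)
print(x_l1, l1_norm(x_l1))  # one nonzero entry
print(x_l2, l1_norm(x_l2))  # both entries nonzero, larger L1 norm
```

Both vectors satisfy the constraint; only the L1 solution has a zero component.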

526
00:36:06,920 --> 00:36:10,700
So this is moving
us up to matrices.

527
00:36:10,700 --> 00:36:13,670
And that's where compressed
sensing comes in.

528
00:36:13,670 --> 00:36:16,230
Matrix completion comes in.

529
00:36:16,230 --> 00:36:20,580
So matrix completion
would just be--

530
00:36:20,580 --> 00:36:23,270
I mentioned-- so
this is completion.

531
00:36:26,120 --> 00:36:28,590
And I'll remember
the words Netflix,

532
00:36:28,590 --> 00:36:31,910
which made the problem famous.

533
00:36:31,910 --> 00:36:44,140
So I have the matrix A, 3, 2,
question mark, question mark,

534
00:36:44,140 --> 00:36:47,310
question mark, 1, 4,
6, question mark--

535
00:36:53,390 --> 00:36:55,780
missing data.

536
00:36:55,780 --> 00:36:59,650
And so I have to put
in something there,

537
00:36:59,650 --> 00:37:03,000
because if I don't put in
anything, then the numbers

538
00:37:03,000 --> 00:37:07,930
I do know are useless,
because no row or no column

539
00:37:07,930 --> 00:37:10,100
is complete.

540
00:37:10,100 --> 00:37:12,100
So I would just give up.

541
00:37:12,100 --> 00:37:14,710
Somebody that sent
me the data, 3 and 2

542
00:37:14,710 --> 00:37:20,050
and didn't tell me a
ranking for the third movie,

543
00:37:20,050 --> 00:37:22,780
I'd have to say,
well, I can't use it.

544
00:37:22,780 --> 00:37:24,010
That's not possible.

545
00:37:24,010 --> 00:37:28,990
So we need to think about what goes there.

546
00:37:28,990 --> 00:37:36,790
And the idea is that the numbers
that minimize the nuclear norm

547
00:37:36,790 --> 00:37:40,180
are a good choice,
a good choice.

548
00:37:40,180 --> 00:37:46,540
So that's just a connection here
that we will say more about,

549
00:37:46,540 --> 00:37:49,300
but not--

550
00:37:49,300 --> 00:37:52,240
we could have a whole
course in compressed sensing

551
00:37:52,240 --> 00:37:53,470
and nuclear norm.

552
00:37:53,470 --> 00:37:59,470
Professor Parrilo in course
6 is an expert on this.

553
00:38:03,450 --> 00:38:06,980
But you see the point that--

554
00:38:06,980 --> 00:38:18,900
so you remember v1
came from the 0 norm.

555
00:38:21,530 --> 00:38:23,585
And what is the 0
norm of the vector?

556
00:38:26,940 --> 00:38:27,860
Well, it's not a norm.

557
00:38:27,860 --> 00:38:31,760
So you could say,
forget it, no answer.

558
00:38:31,760 --> 00:38:34,880
But what do we
symbolically mean when

559
00:38:34,880 --> 00:38:38,310
I write the 0 norm of a vector?

560
00:38:38,310 --> 00:38:41,070
I mean the number of....?

561
00:38:41,070 --> 00:38:42,540
Non-zeros.

562
00:38:42,540 --> 00:38:44,520
The number of non-zeros.

563
00:38:44,520 --> 00:38:55,430
This was the number of
non-zeros in the vector, in v.

564
00:38:55,430 --> 00:39:03,720
But it's not a norm, because
if I take 2 times the vector,

565
00:39:03,720 --> 00:39:07,330
I have the same number
of non-zeros, same norm.

566
00:39:07,330 --> 00:39:09,662
I can't have the norm
of 2v equal the norm

567
00:39:09,662 --> 00:39:15,890
of v. That would blow away
all the properties of norms.
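
A quick illustration of that point, using a made-up vector: counting non-zeros ignores scaling, while the L1 norm doubles when the vector doubles, as a norm must.

```python
def count_nonzeros(v):
    """The so-called 0 'norm': how many entries are non-zero."""
    return sum(1 for x in v if x != 0)

def l1_norm(v):
    """Sum of absolute values of the entries."""
    return sum(abs(x) for x in v)

v = [3, 0, -4, 0]            # illustrative vector
two_v = [2 * x for x in v]

print(count_nonzeros(v), count_nonzeros(two_v))  # same count for v and 2v
print(l1_norm(v), l1_norm(two_v))                # second is twice the first
```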

568
00:39:15,890 --> 00:39:17,920
So v0 is not a norm.

569
00:39:17,920 --> 00:39:21,500
And then we move it to that sort
of appropriate nearest norm.

570
00:39:21,500 --> 00:39:23,260
And we get v1.

571
00:39:23,260 --> 00:39:27,320
We get the L1 norm,
which is the sum of--

572
00:39:27,320 --> 00:39:32,474
everybody remembers that this is
the sum of the absolute values of the vi.

573
00:39:32,474 --> 00:39:37,950
And you remember my pictures
of diamonds touching

574
00:39:37,950 --> 00:39:41,400
planes at sharp points.

575
00:39:41,400 --> 00:39:44,430
Well, that's what
is going on here.

576
00:39:44,430 --> 00:39:48,660
That problem was
called basis pursuit.

577
00:39:48,660 --> 00:39:51,600
And it comes back
again in this section.

578
00:39:54,120 --> 00:40:01,570
So I minimize this norm
subject to the conditions.

579
00:40:01,570 --> 00:40:07,070
Now, I'm just going to take
a jump to the matrix case.

580
00:40:07,070 --> 00:40:09,920
What's my idea here?

581
00:40:09,920 --> 00:40:14,420
My idea is that for a
matrix, the nuclear norm

582
00:40:14,420 --> 00:40:15,440
comes from what?

583
00:40:18,830 --> 00:40:21,570
What's the norm that
we sort of start with,

584
00:40:21,570 --> 00:40:24,220
but it's not a norm?

585
00:40:24,220 --> 00:40:27,280
And when I sort of take the--

586
00:40:27,280 --> 00:40:34,000
because the requirements
for a norm don't fail--

587
00:40:34,000 --> 00:40:38,170
they fail for what I'm
about to write there.

588
00:40:38,170 --> 00:40:41,920
I could put A 0,
but I don't want

589
00:40:41,920 --> 00:40:44,740
the number of non-zero entries.

590
00:40:44,740 --> 00:40:47,280
That would be a good guess.

591
00:40:47,280 --> 00:40:50,480
And probably in some
sense it makes sense.

592
00:40:50,480 --> 00:40:53,960
But it's not the
answer I'm looking for.

593
00:40:53,960 --> 00:41:04,030
What do you think is the 0 norm
of a matrix that is not a norm,

594
00:41:04,030 --> 00:41:09,400
but when I pump it up to the
best, to the nearest good norm,

595
00:41:09,400 --> 00:41:11,710
I get the nuclear norm?

596
00:41:11,710 --> 00:41:14,500
So this is the question,
it's what is A0?

597
00:41:18,698 --> 00:41:22,160
And it's what?

598
00:41:22,160 --> 00:41:23,060
AUDIENCE: The rank.

599
00:41:23,060 --> 00:41:24,290
GILBERT STRANG: The rank.

600
00:41:24,290 --> 00:41:27,890
The rank of matrix
is the equivalent.

601
00:41:32,630 --> 00:41:34,170
So I don't know about the zero.

602
00:41:34,170 --> 00:41:35,720
Nobody else calls it A0.

603
00:41:35,720 --> 00:41:37,460
So I better not.

604
00:41:37,460 --> 00:41:39,180
It's the rank.

605
00:41:39,180 --> 00:41:40,890
So again, the rank
is not a norm,

606
00:41:40,890 --> 00:41:43,820
because if I double the matrix,
I don't double the rank.
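
The same scaling test for matrices, as a sketch restricted to 2 by 2 matrices with integer entries so the determinant test for rank is exact. It uses the 2 by 2 identity (sigma1 + sigma2)^2 = ||A||_F^2 + 2|det A|, which follows from sigma1^2 + sigma2^2 = ||A||_F^2 and sigma1 * sigma2 = |det A|. The matrix A and the helper names are illustrative.

```python
import math

def rank_2x2(A):
    """Rank of a 2x2 integer matrix via the determinant."""
    (a, b), (c, d) = A
    if a == b == c == d == 0:
        return 0
    return 1 if a * d - b * c == 0 else 2

def nuclear_norm_2x2(A):
    """Sum of singular values, from (s1 + s2)^2 = ||A||_F^2 + 2|det A|."""
    (a, b), (c, d) = A
    frob_sq = a * a + b * b + c * c + d * d
    return math.sqrt(frob_sq + 2 * abs(a * d - b * c))

A = [[1, 2], [2, 4]]                      # rank one: second row is twice the first
two_A = [[2 * x for x in row] for row in A]

print(rank_2x2(A), rank_2x2(two_A))                   # rank is unchanged by doubling
print(nuclear_norm_2x2(A), nuclear_norm_2x2(two_A))   # nuclear norm doubles
```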

607
00:41:46,600 --> 00:41:48,220
So I have to move to a norm.

608
00:41:48,220 --> 00:41:50,410
And it turns out to
be the nuclear norm.

609
00:41:50,410 --> 00:41:52,540
And now, I'll just,
with one minute,

610
00:41:52,540 --> 00:41:57,850
say it's the guess of some
people who are working hard

611
00:41:57,850 --> 00:42:02,920
to prove it, that
the deep learning

612
00:42:02,920 --> 00:42:07,120
algorithm of gradient
descent finds

613
00:42:07,120 --> 00:42:12,850
the solution to the minimum
problem in the nuclear norm.

614
00:42:12,850 --> 00:42:16,140
And we don't know if
that's true or not yet.

615
00:42:16,140 --> 00:42:25,540
For related examples, like
this thing, it's proved.

616
00:42:25,540 --> 00:42:31,870
For the exact problem of deep
learning, it's a conjecture.

617
00:42:31,870 --> 00:42:35,200
So that's what's in Section 4.4.

618
00:42:35,200 --> 00:42:38,860
But that word lasso, you
want to know what that is.

619
00:42:38,860 --> 00:42:41,140
Compressed sensing,
I'll say a word about.

620
00:42:41,140 --> 00:42:46,540
So that will be Monday after
Alex Townsend's lecture Friday.

621
00:42:46,540 --> 00:42:51,790
So he's coming to
speak to computational

622
00:42:51,790 --> 00:42:55,960
science students all over
MIT tomorrow afternoon.

623
00:42:55,960 --> 00:43:00,100
I'll certainly go
to that, but then he

624
00:43:00,100 --> 00:43:03,850
said he would come in and
take this class Friday.

625
00:43:03,850 --> 00:43:05,410
So I'll see you Friday.

626
00:43:05,410 --> 00:43:07,455
And he'll be here too.