PETER SZOLOVITS: OK. Today's topic is differential diagnosis. And so I'm just quoting Wikipedia here. Diagnosis is the identification of the nature and cause of a certain phenomenon. And differential diagnosis is the distinguishing of a particular disease or condition from others that present similar clinical features. So doctors typically talk about differential diagnosis when they're faced with a patient and they make a list of what are the things that might be wrong with this patient. And then they go through the process of trying to figure out which one it actually is. So that's what we're going to focus on today.

Now, just to scare you, here's a lovely model of human circulatory physiology. This is from Guyton's textbook of cardiology. And I'm not going to hold you responsible for all of the details of this model, but it's interesting, because this is, at least as of maybe 20 years ago, the state of the art of how people understood what happens in the circulatory system. And it has various control inputs that determine things like how your hormone levels change various aspects of the cardiovascular system, and how the interactions between different components of the cardiovascular system affect each other.

And so in principle, if I could tune this model to me, then I could make all kinds of pretty good predictions that say if I increase my systemic vascular resistance, then here's what's going to happen as the rest of the system adjusts. And if I get a blockage in a coronary artery, then here's what's going to happen to my cardiac output and various other things. So this would be terrific. And if we had this kind of model for not just the cardiovascular system, but the entire body, then we'd say, OK, we've solved medicine.

Well, we don't have this kind of model for most systems. And also, there's this minor problem that if I give you this model and say, "How does this relate to a particular patient?", how would you figure that out?
This diagram represents hundreds of differential equations, and they have many hundreds of parameters. And so we were joking, when we started working with this model, that you'd really have to kill the patient in order to do enough measurements to be able to tune this model to their particular physiology. And of course, that's probably not a good practical approach. We're getting a little better by developing more non-invasive ways of measuring these things. But that's moving along very slowly. And I don't expect that I, or maybe even any of you, will live long enough that this approach to doing medical reasoning and medical diagnosis is actually going to happen.

So what we're going to look at today is what simpler models there are for diagnostic reasoning. And I'm going to take the liberty of inflicting a bit of history on you, because I think it's interesting where a lot of these ideas came from. So the first idea was to build flowcharts.

Oh, and by the way, signs and symptoms-- I've forgotten if we've talked about that in the class. A sign is something that a doctor sees, and a symptom is something that the patient experiences. So a sign is objective. It's something that can be detected from outside your body. A symptom is something that you feel. So if you're feeling dizzy, that's a symptom, because it's not obvious to somebody outside you that you're dizzy, or that you have a pain, or such things. Normally, we talk about manifestations or findings, which is sort of a super category of all the things that are determinable about a patient.

So we'll talk about flowcharts, and then models based on associations between diseases and these manifestations. Then there are some issues about whether you're trying to diagnose a single disease or a multiplicity of diseases, which makes the models much more complicated, and whether you're trying to do probabilistic diagnosis or definitive or categorical diagnosis. And then we'll talk about some utility theoretic methods.
And I'll just mention some rule-based and pattern-matching kinds of approaches.

So this is kind of cute. This is from 1973. And if you were a woman and walked into the MIT Health Center and complained of potentially a urinary tract infection, they would take out this sheet of paper, which was nicely color-coded, and they would check a bunch of boxes. And if you hit a red box, that represented a conclusion. And otherwise, it gave you suggestions about what further tests to do. And this was essentially a triage instrument. It asked: does this woman have a problem that requires immediate attention, so that we should call an ambulance and take her to a hospital? Or is it something where we can just tell her to come back the next day and see a doctor? Or is it in fact some self-limited thing where we say, take two aspirin, and it'll go away. So that was the attempt here.

Now, interestingly, if you look at the history of this project between the Beth Israel Hospital and Lincoln Laboratories, it started off as a computer aid. So they were building a computer system that was supposed to do this. But you can imagine, in the late 1960s and early 1970s, computers were pretty clunky. PCs hadn't been invented yet. So this was like mainframe kinds of operations. It was very hard to use. And so they said, well, this is a small enough program that we can reduce it to about 20 flow sheets-- 20 sheets like this, which they proceeded to print up.

And I was amused, because around 1980, I was working in my office one night, and I got this splitting headache. And I went over to MIT Medical. And sure enough, the nurse pulled out one of these sheets for headaches and went through it with me, and decided that a couple of Tylenols should fix me. But it was interesting. So this was really in use for a while.
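In modern terms, each of these sheets is just hand-built branching logic. Here is a minimal sketch of the kind of logic such a triage flowchart encodes; the questions, thresholds, and dispositions below are entirely hypothetical, not the ones on the real MIT/Lincoln sheet.

```python
# A minimal flowchart-style triage sketch. Every question, threshold,
# and disposition here is hypothetical; the real sheets encoded
# clinician consensus, not these made-up rules.

def uti_triage(answers: dict) -> str:
    """Walk a fixed branching structure; hitting a 'red box' ends it."""
    if answers.get("fever_over_101F") and answers.get("flank_pain"):
        return "RED BOX: possible kidney infection -- immediate attention"
    if answers.get("blood_in_urine"):
        return "Come back tomorrow and see a doctor"
    if answers.get("symptoms_over_3_days"):
        return "Order a urine culture, then re-evaluate"
    return "Self-limited: take two aspirin, return if it gets worse"

print(uti_triage({"fever_over_101F": True, "flank_pain": True}))
```

The fragility discussed next is easy to see in a sketch like this: every unusual case needs another hand-written branch.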
Now, the difficulty with approaches like this, of which there have been many, many, many in the medical world, is that they're very fragile. They're very specific. They don't take account of unusual cases. And there's a lot of effort in coming to consensus to build these things. And then they're not necessarily useful for a long time. So MIT actually stopped using them shortly after my headache experience.

But if you go over to a hospital and you look on the bookshelf of a junior doctor, you will still find manuals that look kind of like this that say, how do we deal with tropical diseases? So you ask a bunch of questions, and then, depending on the branching logic of the flowchart, it'll tell you whether this is serious or not. And the reason is that if you do your medical training in Boston, you're not going to see very many tropical diseases. And so you don't have a base of experience from which you can learn and become an expert at diagnosing them. And so they use this as a kind of cheat sheet.

I mentioned that the association between diseases and symptoms is another important way of doing diagnosis. And I swear to you, there was a paper in the 1960s, I think, that actually proposed this. So if any of you have hung around ancient libraries, libraries used to have card catalogs that were physical pieces of paper, cardboard. And one of the things they did with these was each card would be a book. And then around the edges were a bunch of holes, and depending on categorizations of the book along various dimensions, like its Dewey decimal number, or the top digits of its Library of Congress number or something, they would punch out holes in the borders. And this allowed you to do a kind of easy sorting of these books. So suppose people had been returning their books, you had pulled together a bunch of cards, and you wanted to find all the math books.
What you would do is stick a needle through the hole that represented math books, and then you shake the pile, and all the math books would fall out, because their holes had been punched through.

So somebody seriously proposed this as a diagnostic algorithm. And in fact, implemented it. And was trying to even make money on it. I think this was an attempt at a commercial venture, where they were going to provide doctors with these library cards that represented diseases. And the holes now represented not mathematics versus literature, but shortness of breath versus pain in the left ankle versus whatever. And again, as people came in and complained about some condition, you'd stick a needle through that condition and you'd shake, and up would come the cards that had that condition in common.

So one of the obvious problems with this approach is that if you had two things wrong with you, then you would wind up with no cards very quickly, because nothing would fall out of the pile. So this didn't go anywhere.
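Viewed as code, the needle sort is just repeated set intersection, and the failure mode with two concurrent diseases falls right out. Here is a small sketch; the disease-finding pairs are invented, not from the real cards.

```python
# The needle sort as set operations: a card "falls out" for a finding
# only if its hole is punched, so probing several findings in sequence
# keeps only the diseases whose cards list ALL of them.

cards = {
    "cystitis":       {"dysuria", "urinary frequency"},
    "pyelonephritis": {"dysuria", "fever", "flank pain"},
    "pneumonia":      {"fever", "cough", "dyspnea"},
}

def needle_sort(findings):
    pile = set(cards)
    for f in findings:                        # one needle per finding
        pile = {d for d in pile if f in cards[d]}
    return pile

print(needle_sort({"dysuria", "fever"}))      # {'pyelonephritis'}
# Two diseases at once defeats it: no single card covers both.
print(needle_sort({"dysuria", "cough"}))      # set() -- nothing falls out
```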
But interestingly, even in the late 1980s, I remember being asked by the board of directors of the New England Journal of Medicine to come to a meeting where they had gotten a pitch from somebody who was proposing essentially exactly this diagnostic model, except implemented in a computer now and not in these library cards. And they wanted to know whether this was something that they ought to get behind and invest in. And I and a bunch of my colleagues assured them that this was probably not a great idea and they should stay away from it, which they did.

Well, a more sophisticated model is something like a Naive Bayes model that says if you have a disease-- where is my cursor? If you have a disease, and you have a bunch of manifestations that can be caused by the disease, we can make some simplifying assumptions. We can say that you will only ever have one disease at a time, which means that the values of that node D form an exhaustive and mutually exclusive set of values. And we can assume that the manifestations are conditionally independent observables that depend only on the disease that you have, but not on each other or on any other factors. And if you make those assumptions, then you can apply good old Thomas Bayes's rule.

This, by the way, is the Reverend Bayes. Do you guys know his history? He was a nonconformist minister in England. And he was not a mathematician-- well, he was an amateur mathematician. But he decided that he wanted to prove to people that God existed. And so he developed Bayesian reasoning in order to make this proof. His argument was, well, suppose you're completely in doubt, so you have 50/50 odds that God exists. And then you say, let's look at miracles. And let's ask, what's the likelihood of this miracle having occurred if God exists versus if God doesn't exist? And so by racking up a bunch of miracles, you can convince people more and more that God must exist, because otherwise all these miracles couldn't have happened. He never published this in his lifetime, but after his death one of his colleagues actually presented it as a paper at the Royal Society in the UK. And so Bayes became famous as the originator of this notion of how to do probabilistic reasoning about at least fairly simple situations-- in his case, the existence or nonexistence of God; in our case, the cause of some disease, the nature of some disease.

And so you can draw these trees. And Bayes's rule is very simple. I'm sure you've all seen it. One thing that, again, makes contact with medicine is that a lot of times, you're not just interested in the impact of one observable on your probability distribution; you're interested in the impact of a sequence of observations. And so one thing you can do is say, well, here is my general population. So let's say disease 2 has 37% prevalence and disease 1 has 12%, et cetera. And now I make some observation. I apply Bayes's rule, and I revise my probability distribution. So this is the equivalent of finding a smaller population of patients who have all had whatever answer I got for symptom 1. And then I just keep doing that. So this is the sequential application of Bayes's rule. And of course, it does depend on the conditional independence of all these symptoms.
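As a concrete sketch, here is that sequential updating over an exhaustive, mutually exclusive disease set. The 37% and 12% priors echo the slide's example; the third prior and all the likelihoods are invented for illustration.

```python
# Sequential Bayes over an exhaustive, mutually exclusive disease set.
# Priors echo the lecture's example (12%, 37%); the remaining prior and
# the P(finding | disease) values are invented.

priors = {"D1": 0.12, "D2": 0.37, "D3": 0.51}
likelihood = {                        # P(finding present | disease)
    "fever":    {"D1": 0.90, "D2": 0.30, "D3": 0.10},
    "headache": {"D1": 0.50, "D2": 0.80, "D3": 0.20},
}

def update(post, finding, present=True):
    """One Bayes step: multiply by P(observation | disease), renormalize."""
    new = {d: p * (likelihood[finding][d] if present
                   else 1 - likelihood[finding][d])
           for d, p in post.items()}
    z = sum(new.values())
    return {d: p / z for d, p in new.items()}

post = priors
for f in ["fever", "headache"]:       # conditional independence lets us
    post = update(post, f)            # fold in one finding at a time
print(post)
```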
But in medicine, people don't like to do math, even arithmetic, much. And they prefer doing addition rather than multiplication, because it's easier. And so what they've done is said, well, instead of representing all this data in a probabilistic framework, let's represent it as odds. And if you represent it as odds, then the odds of some disease given a bunch of symptoms, given the independence assumption, is just the prior odds of the disease times the conditional odds-- the likelihood ratio-- of each of the symptoms that you've observed. So you've just got to multiply these together. And then, because they like adding more than multiplying, they said, let's take the log of both sides. And then you can just add them.

And so if you remember when I was talking about medical data, there are things like the Glasgow Coma score, or the APACHE score, or various measures of how badly or well a patient is doing, that often involve adding up numbers corresponding to different conditions the patient has. And what they're doing is exactly this. They're applying Bayes's rule sequentially, with these independence assumptions, in the form of log odds rather than multiplications. That's how they're doing it.
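Here is the same computation in odds form for a single disease hypothesis: the posterior log odds are the prior log odds plus one log likelihood ratio per finding, which is exactly why these scores can just be added up. The numbers are again invented.

```python
import math

# Two-hypothesis odds form: posterior log odds = prior log odds plus
# the sum of log likelihood ratios, one per observed finding. This
# additivity is what additive severity scores exploit.

prior_odds = 0.2 / 0.8                        # P(D) = 0.2
lr = {"fever": 0.90 / 0.30,                   # P(f | D) / P(f | not D)
      "headache": 0.50 / 0.10}

log_odds = math.log(prior_odds) + sum(math.log(lr[f])
                                      for f in ["fever", "headache"])
posterior = 1 / (1 + math.exp(-log_odds))     # back to a probability
print(round(posterior, 3))                    # ~0.789
```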
Very quickly-- somebody in a previous lecture was wondering about receiver operator characteristic curves, and I just wanted to give you a little bit of insight on those. So suppose you do a test on two populations of patients-- the red ones are sick patients, the blue ones are not sick patients. What you expect is that the result of that test will be some continuous number, and it'll be distributed something like the blue distribution for the well patients and something like the red distribution for the ill patients. And typically, we choose some threshold. And we say, well, if you choose this to be the threshold between a prediction of sick or well, then what you're going to get is that the part of the blue distribution that lies to the right is the false positives, and the part of the red distribution that lies to the left is the false negatives. And often people will choose the lowest point at which these two curves intersect as the threshold, but that, of course, isn't necessarily the best choice.

Now, if I give you a better test, one like this, that's terrific, because there is essentially no overlap-- very small false negative and false positive rates. And as I said, you can choose to put the threshold in different places, depending on how you want to trade off sensitivity and specificity. And we measure this by this receiver operator characteristic curve. It has the general form that if you get a diagonal line, that means that there's an exact trade-off between sensitivity and specificity, which is the case if you're flipping coins-- so it's random. And of course, if you manage to hit the top corner up there, that means that there would be no overlap whatsoever between the two distributions, and you would get a perfect result. Typically you get something in between. And so normally, if you do a study and your AUC, the area under this receiver operator characteristic curve, is barely over a half, you're pretty close to worthless, whereas if it's close to 1, then you have a really good method for distinguishing these categories of patients.
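A short sketch of how such a curve gets computed: sweep the threshold across test scores drawn for the "well" and "sick" populations, record (false positive rate, true positive rate) pairs, and estimate the AUC. Gaussian scores stand in here for any real test.

```python
import random

# Sweep a threshold over scores from 'well' (blue) and 'sick' (red)
# populations to trace out the ROC, and estimate the AUC directly as
# the probability a random sick score beats a random well score.

random.seed(0)
well = [random.gauss(0.0, 1.0) for _ in range(1000)]
sick = [random.gauss(1.5, 1.0) for _ in range(1000)]

def roc_point(threshold):
    tpr = sum(s > threshold for s in sick) / len(sick)   # sensitivity
    fpr = sum(s > threshold for s in well) / len(well)   # 1 - specificity
    return fpr, tpr

auc = sum(s > w for s in sick for w in well) / (len(sick) * len(well))
print(roc_point(0.75), round(auc, 3))
```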
Next topic: what does it mean to be rational? I should have a philosophy course here.

AUDIENCE: Are you talking about pi?

PETER SZOLOVITS: Sorry.

AUDIENCE: Are you talking about pi? Pi is--

PETER SZOLOVITS: Pi is irrational, but that's not what I'm talking about.

Well, there is this principle of rationality that says that what you want to do is act in such a way as to maximize your expected utility. So for example, if you're a gambler and you have a choice of various ways of betting in some poker game or something, then if you were a perfect calculator of the odds of getting a queen on your next draw, you could make some rational decision about whether to bet more or less. But you'd also have to take into account things like, "How could I convince my opponent that I am not bluffing if I am bluffing?" and "How could I convince them that I'm bluffing if I'm not bluffing?" and so on. So there is a complicated model there. But nevertheless, the idea is that you should behave in a way that will give you the best expected outcome. And so people joke that this is Homo economicus, because economists make the assumption that this is how people behave. And we now know that that's not really how people behave. But it's a pretty common model of their behavior, because it's easy to compute, and it has some appropriate characteristics.

So as I mentioned, every action has a cost. And utility measures the value or the goodness of some outcome-- the amount of money you've won, or whether you live or die, or quality-adjusted life years, or various other measures of utility, such as how much your hospitalization costs.

So let me give you an example. This actually comes from a decision analysis service at New England Medical Center, the Tufts hospital, in the late 1970s. The patient was an elderly Chinese gentleman whose foot had gangrene. Gangrene is an infection that people who have bad circulation can get. And what he was facing was a choice of whether to amputate his foot or to try to treat him medically.
To treat him medically means injecting antibiotics into his system and hoping that his circulation is good enough to get them to the infected areas. And so the choice becomes a little more complicated, because if the medical treatment fails, then, of course, the patient may die, a bad outcome. Or you may now have to amputate the whole leg, because the gangrene has spread up from his foot, and now you're cutting off his leg. So what should you do? And how should you reason about this?

So Pauker's staff came up with this decision tree. By the way, decision tree in this literature means something different from a decision tree in, say, C4.5. So your choices here are to amputate the foot or start with medical care. And if you amputate the foot, let's say there is a 99% chance that the patient will live, and a 1% chance that maybe the anesthesia will kill him. And if we treat him medically, they estimated that there is a 70% chance of full recovery, a 25% chance that he'd get worse, and a 5% chance that he would die. If he got worse, you're now faced with another decision: do we amputate the whole leg or continue pushing medicine? And again, there are various outcomes with various estimated probabilities.

Now, the critical thing here that this group was pushing was the idea that these decisions shouldn't be based on what the doctor thinks is good for you. They should be based on what you think is good for you. And so they worked very hard to try to elicit individualized utilities from patients. So for example, this guy said that living with an amputated foot was worth 850 points on a scale of 1,000, where being healthy was 1,000 and being dead was 0. Now, you could imagine that that number would be very different for different individuals.
If you asked LeBron James how bad it would be to have your foot amputated, he might think that it's much worse than I would, because while it would be a pain to have my foot amputated, I could still do most of the things that I do professionally, whereas he, as a star basketball player, probably couldn't.

So how do you solve a problem like this? Well, you say, OK, at every chance node I can calculate the expected value of what happens there. So here it's 0.6 times 995 plus 0.4 times 0. That gets me a value for this node. Do the same thing here. I compare the values here and choose the best one. That gives me a value for this decision. And so I fold back this decision tree. And my next slide should have-- yeah, so these are the numbers that you get. And what you discover is that the utility of trying medical treatment is somewhat higher than the utility of immediately amputating the foot-- if you believe these probabilities and those utilities.

Now, the difficulty is that these numbers are fickle. And so you'd like to do some sort of sensitivity analysis. You say, for example, what if this gentleman valued living with an amputated foot at 900 rather than 850? And now you discover that amputating the foot looks like a slightly better decision than the other.
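Folding back is easy to mechanize: chance nodes take probability-weighted averages, and decision nodes take the max over their arms. In the sketch below, the 0.99/0.01 and 0.70/0.25/0.05 probabilities, the 850 utility, and the 0.6 x 995 chance node are the ones quoted above; the utility of full recovery (1,000) and the "push medicine" branch are assumptions added just so the tree is complete.

```python
# Folding back the gangrene decision tree. Quoted numbers: 0.99/0.01,
# 0.70/0.25/0.05, utility 850 for life with an amputated foot, and the
# 0.6 x 995 chance node. The 1000 for full recovery and the whole
# "push medicine" branch are assumptions for illustration.

def fold_back(node):
    kind, branches = node
    if kind == "leaf":
        return branches                               # terminal utility
    if kind == "chance":                              # expected utility
        return sum(p * fold_back(c) for p, c in branches)
    return max(fold_back(c) for _, c in branches)     # decision: best arm

def leaf(u): return ("leaf", u)

worse = ("decision", [
    ("amputate leg",  ("chance", [(0.6, leaf(995)), (0.4, leaf(0))])),
    ("push medicine", ("chance", [(0.3, leaf(995)), (0.7, leaf(0))])),
])
tree = ("decision", [
    ("amputate foot", ("chance", [(0.99, leaf(850)), (0.01, leaf(0))])),
    ("medical care",  ("chance", [(0.70, leaf(1000)),
                                  (0.25, worse),
                                  (0.05, leaf(0))])),
])
print(fold_back(tree))   # 849.25: medical care narrowly beats 841.5
```

With these numbers, medical care edges out immediate amputation at about 849 versus 841.5; raising the amputated-foot utility to 900 flips the decision, as in the sensitivity analysis just described.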
So this is actually applied in clinical medicine. And there are now thousands of doctors who have been trained in these techniques and really try to work through this with individual patients. Of course, it's used much more on an epidemiological basis, when people look at large populations.

AUDIENCE: I have a question.

PETER SZOLOVITS: Yeah.

AUDIENCE: How are the probabilities assessed?

PETER SZOLOVITS: So the service that did this study would read the literature, and they would look in databases. And they would try to estimate those probabilities. We can do a lot better today than they could at that time, because we have a lot more data that you can look in. But you could say, OK, for men of this age who have gangrenous feet, what fraction of them have the following experience? And that's how these are estimated. Some of it feels like 5%.

OK. So I just said this. And then the question of where you get these utilities is a tricky one. So one way is to do the standard gamble, which says, OK, Mr. Szolovits, we're going to play this game. We're going to roll a fair die or something that will come up with some continuous number between 0 and 1. And then I'm going to play the game where either I chop off your foot, or I roll this die, and if it exceeds some threshold, then I kill you. Nice game. So now you find the point at which I'm indifferent. If I say, well, 0.8-- that's a 20% chance of dying; it seems like a lot. But maybe I'll go for 0.9. Now you've said, OK, well, that means that you value living without a foot at 0.9 of the value of being healthy. So this is a way of doing it. And this is typically done.

Unfortunately, of course, it's difficult to elicit these numbers. And they're also not stable. So people have done experiments where they get somebody to give them this kind of number as a hypothetical, and then, when that person winds up actually faced with such a decision, they no longer will abide by that number. So they've changed their mind when the situation is real.

AUDIENCE: But it's nice, because there are two feet, right? So you could run this experiment and see.

PETER SZOLOVITS: They didn't actually do it. It was hypothetical.
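The arithmetic of the standard gamble, for concreteness: if you are indifferent between the sure outcome and a gamble that leaves you healthy with probability p and dead otherwise, then your utility for the sure outcome is p times the utility of being healthy. A minimal sketch, on the 0-to-1,000 scale used above:

```python
# Standard gamble: indifference between a sure outcome and a gamble
# (healthy with probability p, dead with probability 1 - p) pins the
# sure outcome's utility at the gamble's expected utility.

def gamble_value(p, u_healthy=1000, u_dead=0):
    return p * u_healthy + (1 - p) * u_dead

p_indifferent = 0.9                    # the answer settled on in lecture
print(gamble_value(p_indifferent))     # 900 = utility of foot amputation
```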
OK. The next program I want to tell you about-- again, the technique for this was developed as a PhD thesis here at MIT, in 1967. So this is hot off the presses. But this type of idea is still used. This was a program that was published in the American Journal of Medicine, which is a high-impact medical journal. I think this was actually the first computational program of this sort that the journal had ever published. And it was addressed at the problem of the diagnosis of acute oliguric renal failure. Oliguric means you're not peeing enough. Renal is your kidney. So something's gone wrong with your kidney, and you're not producing enough urine.

Now, this is a good problem to address with these techniques, because if something happens to you suddenly, it's very likely that there is one cause for it. If you are 85 years old and you have a little heart disease and a little kidney disease and a little liver disease and a little lung disease, there's no guarantee that there was one thing that went wrong with you that caused all these. But if you were OK yesterday and then you stopped peeing, it's pretty likely that there's one thing that's gone wrong. So it's a good application of this model.

So what they said is: there are 14 potential causes, and these are exhaustive and mutually exclusive. There are 27 tests or questions or observations that are relevant to the differential. These are cheap tests, so they didn't involve doing anything either expensive or dangerous to the patient-- it was measuring something in the lab or asking questions of the patient. But they didn't want to have to ask all of them, because that's pretty tedious. And so they were trying to minimize the amount of information that they needed to gather in order to come up with an appropriate decision. Now, in the real problem, there were also three invasive tests that are dangerous and expensive, and then eight different treatments that could be applied. And I'm only going to tell you about the first part of this problem.

This 1973 article shows you what the program looked like. It was a computer terminal where it gave you choices, and you would type in an answer. And so that was the state of the art at the time.
But what I'm going to do, God willing, is demonstrate a reconstruction that I made of this program.

So these are the potential causes of stopping to pee-- acute tubular necrosis, functional acute renal failure, urinary tract obstruction, acute glomerulonephritis, et cetera. And these are the prior probabilities. Now, I have to warn you, these numbers were, in fact, estimated by people sticking their finger in the air and figuring out which way the wind was blowing, because in 1973 there were not great databases that you could turn to.

And then these are the questions that were available to be asked. And what you see in the first column, at least if you're sitting close to the screen, is the expected entropy of the probability distribution if you answered this question. So this is basically saying: if I ask this question, how likely is each of the possible answers, given my disease distribution probabilities? And then, for each of those answers, I do a Bayesian revision, and I weight the entropy of that resulting distribution by the probability of getting that answer. And that gets me the expected entropy for asking that question. And the idea is that the lower the expected entropy, the more valuable the question. Makes sense.
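Here is that question-selection heuristic as a sketch. The three disease names and the roughly 25% prior on acute tubular necrosis come from the demo; everything else, including treating each question as a yes/no item, is invented for illustration.

```python
import math

# For each candidate question, average the entropy of the Bayes-revised
# distribution over its possible answers, weighted by each answer's
# probability; ask the question with the lowest expected entropy.

priors = {"ATN": 0.25, "FARF": 0.40, "obstruction": 0.35}
p_yes = {                                    # P(answer is yes | disease)
    "high BP at onset": {"ATN": 0.1, "FARF": 0.7, "obstruction": 0.4},
    "proteinuria":      {"ATN": 0.8, "FARF": 0.3, "obstruction": 0.2},
}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(dist, q, yes):
    new = {d: p * (p_yes[q][d] if yes else 1 - p_yes[q][d])
           for d, p in dist.items()}
    z = sum(new.values())
    return {d: p / z for d, p in new.items()}

def expected_entropy(dist, q):
    py = sum(dist[d] * p_yes[q][d] for d in dist)    # P(answer = yes)
    return (py * entropy(posterior(dist, q, True)) +
            (1 - py) * entropy(posterior(dist, q, False)))

best = min(p_yes, key=lambda q: expected_entropy(priors, q))
print(best, round(expected_entropy(priors, best), 3))
```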
So if we look, for example, the most valuable question is, what was the blood pressure at the onset of oliguria? And I can click on this and say it was, let's say, moderately elevated. And what this little colorful graph is showing you is that, if you look at the initial probability distribution, acute tubular necrosis was about 25% and has gone down to a very small amount, whereas some of these others have grown in importance considerably.

So we can answer more questions. We can say-- let's see. What is the degree-- is there proteinuria? Is there protein in the urine? And we say, no, there isn't. I think we say, no, there isn't: 0. And that revises the probability distribution. And then it says the next most important thing is kidney size. And let's say the kidney size is normal. So now, all of a sudden, functional acute renal failure-- which, by the way, is one of these funny medical categories that says the kidney doesn't work well, but doesn't explain why it doesn't work well; it's sort of a generic thing-- and sure enough. We can keep answering questions: are you producing less than 50 ccs of urine, which is a tiny amount, or somewhere between 50 and 400? Remember, this is for people who are not producing enough, so normally you'd be over 400. So these are the only choices. So let's say it's moderate. And so you see the probability distribution keeps changing.

And what happened in the original program is that they had an arbitrary threshold that said: when the probability of one of these causes reaches 95%, then we switch to a different mode, where now we're actually willing to contemplate doing the expensive tests and the expensive treatments. And we build the decision tree, as we saw in the case of the gangrenous foot, that figures out which of those is the optimal approach. So the idea here was that, because building a decision tree with 27 potential questions becomes enormously bushy, we're using a heuristic that says information maximization, or entropy reduction, is a reasonable way of focusing in on what's wrong with this patient. And then, once we've focused in pretty well, we can begin to do more detailed analysis on the remaining, more consequential and more costly, tests that are available.

Now, this program didn't work terribly well, because the numbers were badly estimated, and also because the utility model that they had for the decision analytic part was particularly terrible. It didn't really reflect anything in the real world. They had an incremental utility model that said the patient either got better, or stayed the same, or got worse.
And the utilities were obviously in that order, but they didn't correspond to how much better or how much worse he got. And so it wasn't terribly useful.

Nevertheless, in the 1990s, I was teaching a tutorial at a medical informatics conference, and there were a bunch of doctors in the audience. And I showed them this program. And one of the doctors came up afterwards and said, wow, it thinks just the way I do. And I said, really? I don't think so. But clearly, it was doing something that corresponded to the way that he thought about these cases. So I thought that was a good thing.

All right. Well, what happens if we can't assume that there's just a single disease underlying the person's problems? If there are multiple diseases, we can build this kind of bipartite model that says we have a list of diseases and we have a list of manifestations, and some subset of the diseases can cause some subset of the manifestations. And the manifestations depend only on the diseases that are present, not on each other. And therefore, we have conditional independence. And this is a type of Bayesian network, which can't be solved exactly because of the computational complexity. A program I'll show you in a minute had 400 or 500 diseases and thousands of manifestations. And the computational complexity of exact solution techniques for these networks tends to grow exponentially with the number of undirected cycles in the network. And of course, there are plenty of undirected cycles in a network like that.
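For concreteness, one standard way to put numbers on such a bipartite network-- the one used in the later probabilistic reformulation known as QMR-DT-- is a noisy-OR: each present disease gets an independent chance of turning a finding on, plus a small leak probability. The link strengths below are invented.

```python
# Noisy-OR links for a bipartite disease-finding network: each present
# disease independently fails to cause the finding, and the finding is
# absent only if every cause (and the leak) fails. Numbers invented.

leak = 0.01                                 # P(finding | no disease)
link = {                                    # P(disease alone causes finding)
    "m1": {"d1": 0.8},
    "m3": {"d1": 0.6, "d2": 0.7},
    "m6": {"d2": 0.5},
}

def p_finding(m, present_diseases):
    p_off = 1 - leak
    for d, strength in link[m].items():
        if d in present_diseases:
            p_off *= 1 - strength           # each cause fails independently
    return 1 - p_off

print(p_finding("m3", {"d1"}))              # 0.604
print(p_finding("m3", {"d1", "d2"}))        # causes combine: 0.881
```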
So there was a program developed originally in the early 1970s called Dialog. And then they got sued, because somebody owned that name. So they called it Internist, and they got sued, because somebody owned that name. And then they called it QMR, which stands for Quick Medical Reference, and nobody owned that name.

So around 1982, this program had about 500 diseases, which they estimated represented about 70% to 75% of the major diagnoses in internal medicine, and about 3,500 manifestations. And it took about 15 man-years of manual effort to sit there and read medical textbooks and journal articles and look at records of patients in their hospital. The effort was led by a computer scientist at the University of Pittsburgh and the chief of medicine at UPMC, the University of Pittsburgh Medical Center, who was just a fanatic. And he got all the medical school trainees to spend hours and hours coming up with these databases. By 1997, it had been commercialized through a company that had bought the rights to it. And that company had expanded it to about 750 diagnoses and about 5,500 manifestations. So they made it considerably larger. Details are-- I've tried to put references on all the slides.

So here's what the data in QMR looks like. For each diagnosis, there is a list of associated manifestations with evoking strengths and frequencies-- I'll explain those in a minute. On average, there are about 75 manifestations per disease. And for each manifestation, in addition to the data you see here, there is also an "import" measure that says how critical it is to explain this particular symptom or sign or lab value in the final diagnosis. So for example, if you have a headache, that could be incidental, and it's not that important to explain it. If you're bleeding from your gastrointestinal system, that's really important to explain, and you wouldn't expect a diagnosis of that patient that doesn't explain why they have that symptom.

And then here is an example: alcoholic hepatitis. The two numbers here are the so-called evoking strength and the frequency. Evoking strength is on a scale of 0 to 5, and frequency is on a scale of 1 to 5. And I'll show you what those are supposed to mean.
769 00:44:07,840 --> 00:44:10,330 And so, for example, what this says 770 00:44:10,330 --> 00:44:14,680 is that if you're anorexic, that should not 771 00:44:14,680 --> 00:44:20,480 make you think about alcoholic hepatitis as a disease. 772 00:44:20,480 --> 00:44:23,530 But you should expect that if somebody 773 00:44:23,530 --> 00:44:28,880 has alcoholic hepatitis, they're very likely to have anorexia. 774 00:44:28,880 --> 00:44:30,790 So that's the frequency number. 775 00:44:30,790 --> 00:44:32,980 This is the evoking strength number. 776 00:44:32,980 --> 00:44:35,890 And you see that there is a variety of those. 777 00:44:35,890 --> 00:44:40,360 So much of that many, many years of effort 778 00:44:40,360 --> 00:44:42,550 went into coming up with these lists 779 00:44:42,550 --> 00:44:45,130 and coming up with those numbers. 780 00:44:45,130 --> 00:44:46,540 Here are the scales. 781 00:44:46,540 --> 00:44:49,440 So the evoking strength-- 782 00:44:49,440 --> 00:44:51,460 0 means nonspecific. 783 00:44:51,460 --> 00:44:53,380 5 means it's pathognomonic. 784 00:44:53,380 --> 00:44:55,810 In other words, just seeing the symptom 785 00:44:55,810 --> 00:44:58,750 is enough to convince you that the patient must 786 00:44:58,750 --> 00:45:00,890 have this disease. 787 00:45:00,890 --> 00:45:05,680 Similarly, frequency 1 means it occurs rarely, 788 00:45:05,680 --> 00:45:09,490 and 5 means that it occurs in essentially all cases 789 00:45:09,490 --> 00:45:12,460 with scaled values in between. 790 00:45:12,460 --> 00:45:15,400 And these are kind of like odds ratios. 791 00:45:15,400 --> 00:45:19,750 And they add them kind of as if they were log likelihood 792 00:45:19,750 --> 00:45:20,470 ratios. 793 00:45:20,470 --> 00:45:23,290 And so there's been a big literature 794 00:45:23,290 --> 00:45:27,310 on trying to figure out exactly what these numbers mean, 795 00:45:27,310 --> 00:45:30,640 because there's no formal definition in terms of you 796 00:45:30,640 --> 00:45:34,060 count the number of this and divide by the number of that, 797 00:45:34,060 --> 00:45:36,530 and that gives you the right answer. 798 00:45:36,530 --> 00:45:39,730 These were sort of impressionistic kinds 799 00:45:39,730 --> 00:45:40,345 of numbers. 800 00:45:42,860 --> 00:45:50,560 So the logic in the system was that you would come to it 801 00:45:50,560 --> 00:45:54,820 and give it a list of the manifestations of a case. 802 00:45:54,820 --> 00:45:59,090 And to their credit, they went after very complicated cases. 803 00:45:59,090 --> 00:46:02,750 So they took clinical pathologic conference cases 804 00:46:02,750 --> 00:46:05,030 from The New England Journal of Medicine. 805 00:46:05,030 --> 00:46:09,140 These are cases selected to be difficult enough that doctors 806 00:46:09,140 --> 00:46:11,000 are willing to read these. 807 00:46:11,000 --> 00:46:15,200 And they're typically presented at Grand Rounds at MGH 808 00:46:15,200 --> 00:46:18,470 by somebody who is often stumped by the case. 809 00:46:18,470 --> 00:46:22,430 So it's an opportunity to watch people reason interactively 810 00:46:22,430 --> 00:46:25,050 about these things. 811 00:46:25,050 --> 00:46:28,820 And so you evoke the diagnoses that 812 00:46:28,820 --> 00:46:32,870 have a high evoking strength from the given manifestations. 813 00:46:32,870 --> 00:46:35,390 And then you do a scoring calculation 814 00:46:35,390 --> 00:46:37,340 based on those numbers.
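As a rough illustration of how such a score might be combined, here is a sketch that simply adds evoking strengths for present findings and penalizes findings the disease predicts strongly but that are absent. To be clear, this is an invented stand-in, not Internist-1's actual formula, which, as just noted, was never formally defined.

```python
# Illustrative QMR-style disease profile: manifestation -> (evoking_strength
# on 0-5, frequency on 1-5). The entries are invented, not from the real
# knowledge base.
alcoholic_hepatitis = {
    "anorexia": (0, 4),       # nonspecific, but very common in the disease
    "jaundice": (2, 4),
    "hepatomegaly": (2, 3),
    "gi_bleeding": (1, 2),
}

def score(profile, present, absent):
    """Add evoking strengths for present findings (treated log-LR-like),
    and subtract frequency for expected findings that are absent."""
    s = 0
    for m in present:
        if m in profile:
            evoking, _freq = profile[m]
            s += evoking
    for m in absent:
        if m in profile:
            _evoking, freq = profile[m]
            s -= freq         # expected-but-missing findings count against
    return s

print(score(alcoholic_hepatitis,
            present={"jaundice", "hepatomegaly"},
            absent={"anorexia"}))   # -> 2 + 2 - 4 = 0
```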
815 00:46:37,340 --> 00:46:39,920 The details of this are probably all wrong, 816 00:46:39,920 --> 00:46:42,950 but that's the way they went about it. 817 00:46:42,950 --> 00:46:45,770 And then you form a differential around the highest 818 00:46:45,770 --> 00:46:47,660 scoring diagnosis. 819 00:46:47,660 --> 00:46:50,460 Now, this is actually an interesting idea. 820 00:46:50,460 --> 00:46:54,600 It's a heuristic idea, but it's one that worked pretty well. 821 00:46:54,600 --> 00:46:57,210 So suppose I have two diseases. 822 00:46:57,210 --> 00:47:01,250 D1 can cause manifestations 1 through 4. 823 00:47:01,250 --> 00:47:03,365 And D2 can cause 3 through 6. 824 00:47:05,990 --> 00:47:12,110 So are these competing to explain the same case or could 825 00:47:12,110 --> 00:47:14,870 they be complementary? 826 00:47:14,870 --> 00:47:17,270 Well, until we know what symptoms the patient 827 00:47:17,270 --> 00:47:19,790 actually has, we don't know. 828 00:47:19,790 --> 00:47:21,060 But let's trace through this. 829 00:47:21,060 --> 00:47:25,010 So suppose I tell you that the patient has manifestations 830 00:47:25,010 --> 00:47:27,280 3 and 4. 831 00:47:27,280 --> 00:47:28,460 OK. 832 00:47:28,460 --> 00:47:31,820 Well, you would say, there is no reason 833 00:47:31,820 --> 00:47:35,600 to think that the patient may have both diseases, 834 00:47:35,600 --> 00:47:39,690 because either of them can explain those manifestations, 835 00:47:39,690 --> 00:47:40,310 right? 836 00:47:40,310 --> 00:47:42,680 So you would consider them to be competitors. 837 00:47:45,860 --> 00:47:49,100 What about if I add M1? 838 00:47:49,100 --> 00:47:52,310 So here, it's getting a little dicier. 839 00:47:52,310 --> 00:47:56,430 Now you're more likely to think that it's D1. 840 00:47:56,430 --> 00:48:00,440 But if it's D1, that could explain all the manifestations, 841 00:48:00,440 --> 00:48:05,210 and D2 is still viewable as a competitor. 842 00:48:05,210 --> 00:48:08,900 On the other hand, if I also add M6, 843 00:48:08,900 --> 00:48:12,890 now neither disease can explain all the manifestations. 844 00:48:12,890 --> 00:48:17,570 And so it's more likely, somewhat more likely, 845 00:48:17,570 --> 00:48:21,050 that there may be two diseases present. 846 00:48:21,050 --> 00:48:25,320 So what Internist had was this interesting heuristic, 847 00:48:25,320 --> 00:48:31,490 which said that when you get that complementary situation, 848 00:48:31,490 --> 00:48:36,030 you form a differential around the top ranked hypothesis. 849 00:48:36,030 --> 00:48:40,430 In other words, you retain all those diseases that 850 00:48:40,430 --> 00:48:43,830 compete with that hypothesis. 851 00:48:43,830 --> 00:48:46,460 And that defines a subproblem that 852 00:48:46,460 --> 00:48:49,700 looks like the acute renal failure problem, 853 00:48:49,700 --> 00:48:53,030 because now you have one set of factors 854 00:48:53,030 --> 00:48:56,900 that you're trying to explain by one disease. 855 00:48:56,900 --> 00:49:00,680 And you set aside all of the other manifestations 856 00:49:00,680 --> 00:49:02,390 and all of the other diseases that 857 00:49:02,390 --> 00:49:04,670 are potentially complementary. 858 00:49:04,670 --> 00:49:06,890 And you don't worry about them for the moment. 859 00:49:06,890 --> 00:49:10,670 Just focus on this cluster of things 860 00:49:10,670 --> 00:49:13,940 that are competitors to explain some subset 861 00:49:13,940 --> 00:49:16,610 of the manifestations. 
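Here is one way to formalize that competitor-versus-complementary test, as a sketch: a rival disease counts as a competitor of the leading hypothesis if it explains nothing observed beyond what the leader already explains, and complementary otherwise. This set-algebra reading is an assumption for illustration rather than Internist-1's exact rule, but it reproduces the example just traced through.

```python
# Which findings each disease can cause (the example from the slide).
can_cause = {"D1": {"M1", "M2", "M3", "M4"}, "D2": {"M3", "M4", "M5", "M6"}}

def partition(leader, others, observed):
    """Split rival diseases into competitors of `leader` vs. complementary."""
    explained_by_leader = can_cause[leader] & observed
    competitors, complementary = [], []
    for d in others:
        # Findings d would explain that the leader does not:
        extra = (can_cause[d] & observed) - explained_by_leader
        (complementary if extra else competitors).append(d)
    return competitors, complementary

# With M3, M4 observed, D2 explains nothing D1 doesn't: a competitor.
print(partition("D1", ["D2"], {"M3", "M4"}))        # (['D2'], [])
# Adding M6 gives D2 something of its own to explain: complementary.
print(partition("D1", ["D2"], {"M3", "M4", "M6"}))  # ([], ['D2'])
```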
862 00:49:16,610 --> 00:49:21,560 And then there are different questioning strategies. 863 00:49:21,560 --> 00:49:25,160 So depending on the scores within these things, 864 00:49:25,160 --> 00:49:28,070 if one of those diseases has a very high score 865 00:49:28,070 --> 00:49:30,890 and the others have relatively low scores, 866 00:49:30,890 --> 00:49:34,890 you would choose a pursue strategy that says, 867 00:49:34,890 --> 00:49:37,730 OK, I'm interested in asking questions 868 00:49:37,730 --> 00:49:40,280 that will more likely convince me 869 00:49:40,280 --> 00:49:44,850 of the correctness of that leading hypothesis. 870 00:49:44,850 --> 00:49:48,500 So you look for the things that it predicts strongly. 871 00:49:48,500 --> 00:49:53,210 If you have a very large list in the differential, 872 00:49:53,210 --> 00:49:55,550 you might say, I'm going to try to reduce 873 00:49:55,550 --> 00:49:59,360 the size of the differential by looking for things that 874 00:49:59,360 --> 00:50:03,890 are likely in some of the less likely hypotheses 875 00:50:03,890 --> 00:50:07,910 so that I can rule them out if that thing is not present. 876 00:50:07,910 --> 00:50:09,470 So different strategies. 877 00:50:09,470 --> 00:50:13,030 And I'll come back to that in a few minutes. 878 00:50:13,030 --> 00:50:18,710 So their test, of course, based on their own evaluation 879 00:50:18,710 --> 00:50:19,550 was terrific. 880 00:50:19,550 --> 00:50:21,590 It did wonderfully well. 881 00:50:21,590 --> 00:50:23,870 The paper got published in The New England Journal 882 00:50:23,870 --> 00:50:27,080 of Medicine, which was an unbelievable breakthrough 883 00:50:27,080 --> 00:50:30,350 to have an AI program that the editors of The New England 884 00:50:30,350 --> 00:50:34,490 Journal considered interesting. 885 00:50:34,490 --> 00:50:39,080 Now, unfortunately, it didn't hold up very well. 886 00:50:39,080 --> 00:50:42,380 And so there was this paper by Eta Berner 887 00:50:42,380 --> 00:50:46,530 and her colleagues in 1994 where they evaluated 888 00:50:46,530 --> 00:50:49,260 QMR and three other programs. 889 00:50:49,260 --> 00:50:53,020 DXplain is very similar in structure to QMR. 890 00:50:53,020 --> 00:50:57,070 Iliad and Meditel are Bayesian-network, 891 00:50:57,070 --> 00:51:01,670 or almost naive Bayes, types of models 892 00:51:01,670 --> 00:51:03,740 developed by other groups. 893 00:51:03,740 --> 00:51:07,890 And they looked at several results, the first of which is coverage. 894 00:51:07,890 --> 00:51:14,300 So what fraction of the real diagnoses in these 105 cases 895 00:51:14,300 --> 00:51:18,800 that they chose to test on could any of these programs actually 896 00:51:18,800 --> 00:51:19,860 diagnose? 897 00:51:19,860 --> 00:51:23,150 So if the program didn't know about a certain disease, 898 00:51:23,150 --> 00:51:26,750 then obviously it wasn't going to get it right. 899 00:51:26,750 --> 00:51:30,170 And then they said, OK, of the program's diagnoses, 900 00:51:30,170 --> 00:51:33,560 what fraction were considered correct by the experts? 901 00:51:33,560 --> 00:51:36,680 What was the rank order of that correct diagnosis 902 00:51:36,680 --> 00:51:39,620 among the list of diagnoses that the program gave? 903 00:51:42,500 --> 00:51:47,000 The experts were asked to list all the plausible diagnoses 904 00:51:47,000 --> 00:51:49,010 from these cases. 905 00:51:49,010 --> 00:51:53,570 What fraction of those showed up in the program's top 20?
906 00:51:53,570 --> 00:51:58,520 And then did the program have any value added by coming up 907 00:51:58,520 --> 00:52:01,610 with things that the experts had not thought about, 908 00:52:01,610 --> 00:52:03,410 but that they agreed when they saw 909 00:52:03,410 --> 00:52:08,180 them were reasonable explanations for this case? 910 00:52:08,180 --> 00:52:10,130 So here are the results. 911 00:52:10,130 --> 00:52:18,890 And what you see is that the diagnoses in these 105 test 912 00:52:18,890 --> 00:52:25,460 cases, 91% of them appeared in the DXplain program, 913 00:52:25,460 --> 00:52:30,310 but, for example, only 73% of them in the QMR program. 914 00:52:30,310 --> 00:52:32,110 So that means that right off the bat 915 00:52:32,110 --> 00:52:36,620 it's missing about a quarter of the possible cases. 916 00:52:36,620 --> 00:52:39,160 And then if you look at correct diagnosis, 917 00:52:39,160 --> 00:52:45,370 you're seeing numbers like 0.69, 0.61, 0.71, et cetera. 918 00:52:45,370 --> 00:52:52,120 So these are-- it's like the dog who sings, but badly, right? 919 00:52:52,120 --> 00:52:54,670 It's remarkable that it can sing at all, 920 00:52:54,670 --> 00:52:57,250 but it's not something you want to listen to. 921 00:53:00,220 --> 00:53:06,550 And then rank of the correct diagnosis in the program 922 00:53:06,550 --> 00:53:10,400 is at like 12 or 10 or 13 or so on. 923 00:53:10,400 --> 00:53:15,740 So it is in the top 20, but it's not at the top of the top 20. 924 00:53:15,740 --> 00:53:19,870 So the results were a bit disappointing. 925 00:53:19,870 --> 00:53:24,310 And depending on where you put the cut off, 926 00:53:24,310 --> 00:53:29,920 you get the proportion of cases where a correct diagnosis is 927 00:53:29,920 --> 00:53:31,810 within the top n. 928 00:53:31,810 --> 00:53:37,060 And you see that at 20, you're up at a little over 0.5 929 00:53:37,060 --> 00:53:39,280 for most of these programs. 930 00:53:39,280 --> 00:53:43,870 And it gets better if you extend the list to longer and longer. 931 00:53:43,870 --> 00:53:47,320 Of course, if you extended the list to 100, 932 00:53:47,320 --> 00:53:51,190 then you reach 100%, but it wouldn't be practically very 933 00:53:51,190 --> 00:53:52,013 useful. 934 00:53:52,013 --> 00:53:54,089 AUDIENCE: Why didn't they somehow compare it 935 00:53:54,089 --> 00:53:55,586 to the human decision? 936 00:53:58,275 --> 00:53:59,900 PETER SZOLOVITS: Well, so first of all, 937 00:53:59,900 --> 00:54:03,260 they assumed that their experts were perfect. 938 00:54:03,260 --> 00:54:06,150 So they were the gold standard. 939 00:54:06,150 --> 00:54:09,965 So they were comparing it to a human in a way. 940 00:54:09,965 --> 00:54:10,590 AUDIENCE: Yeah. 941 00:54:14,597 --> 00:54:15,430 PETER SZOLOVITS: OK. 942 00:54:15,430 --> 00:54:20,350 So the bottom line is that although the sensitivity 943 00:54:20,350 --> 00:54:23,190 and specificity were not impressive, 944 00:54:23,190 --> 00:54:25,780 the programs were potentially useful, 945 00:54:25,780 --> 00:54:29,650 because they had interactive displays of signs and symptoms 946 00:54:29,650 --> 00:54:31,830 associated with diseases. 947 00:54:31,830 --> 00:54:33,790 They could give you the relative likelihood 948 00:54:33,790 --> 00:54:36,200 of various diagnoses. 949 00:54:36,200 --> 00:54:39,700 And they concluded that they needed 950 00:54:39,700 --> 00:54:43,510 to study the effects of whether a program like this 951 00:54:43,510 --> 00:54:47,680 actually helped a doctor perform medicine better.
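Incidentally, the top-n proportions quoted above are straightforward to recompute from per-case ranks; here is a sketch with invented numbers standing in for the study's data.

```python
# Per-case rank of the correct diagnosis in a program's output
# (invented numbers; None means the diagnosis never appeared).
ranks = [1, 3, 12, 7, 25, 2, 18, None, 40, 5]

def top_n_accuracy(ranks, n):
    """Fraction of cases whose correct diagnosis is within the top n."""
    return sum(r is not None and r <= n for r in ranks) / len(ranks)

for n in (1, 10, 20):
    print(n, top_n_accuracy(ranks, n))   # rises toward 1.0 as n grows
```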
952 00:54:47,680 --> 00:54:50,750 So just here's an example. 953 00:54:50,750 --> 00:54:52,970 I did a reconstruction of this program. 954 00:54:52,970 --> 00:54:55,810 This is the kind of exploration you could do. 955 00:54:55,810 --> 00:55:00,430 So if you click on angina pectoris, 956 00:55:00,430 --> 00:55:02,950 here are the findings that are associated with it. 957 00:55:02,950 --> 00:55:05,590 So you can browse through its database. 958 00:55:05,590 --> 00:55:10,360 You can type in an example case, or select an example case. 959 00:55:10,360 --> 00:55:14,050 So this is one of those clinical pathological conference 960 00:55:14,050 --> 00:55:17,920 cases, and then the manifestations 961 00:55:17,920 --> 00:55:20,620 that are present and absent, and then you 962 00:55:20,620 --> 00:55:23,600 can get an interpretation that says, 963 00:55:23,600 --> 00:55:26,920 OK, this is our differential. 964 00:55:26,920 --> 00:55:30,760 And these are the complementary hypotheses. 965 00:55:30,760 --> 00:55:35,350 And therefore these are the manifestations 966 00:55:35,350 --> 00:55:40,990 that we set aside, whereas these are the ones explained 967 00:55:40,990 --> 00:55:42,580 by that set of diseases. 968 00:55:42,580 --> 00:55:49,210 And so you could watch how the program does its reasoning. 969 00:55:49,210 --> 00:55:53,050 Well, then a group at Stanford came along 970 00:55:53,050 --> 00:55:56,140 when belief networks or Bayesian networks 971 00:55:56,140 --> 00:55:59,110 were created, and said, hey, why don't we 972 00:55:59,110 --> 00:56:04,960 treat this database as if it were a Bayesian network 973 00:56:04,960 --> 00:56:08,650 and see if we can evaluate things that way? 974 00:56:08,650 --> 00:56:11,630 So they had to fill in a lot of details. 975 00:56:11,630 --> 00:56:17,200 They wound up using the QMR database 976 00:56:17,200 --> 00:56:19,560 with a binary interpretation. 977 00:56:19,560 --> 00:56:21,980 So a disease was present or absent. 978 00:56:21,980 --> 00:56:25,510 The manifestation was present or absent. 979 00:56:25,510 --> 00:56:28,900 They used causal independence, or a leaky noisy-OR, 980 00:56:28,900 --> 00:56:32,140 which I think you've seen in other contexts. 981 00:56:32,140 --> 00:56:37,510 So this just says if there are multiple independent causes 982 00:56:37,510 --> 00:56:41,350 of something, how likely is it to happen depending on which 983 00:56:41,350 --> 00:56:43,970 of those is present or not. 984 00:56:43,970 --> 00:56:49,060 And there is a simplified way of doing that calculation, which 985 00:56:49,060 --> 00:56:52,300 corresponds to sort of causal independence 986 00:56:52,300 --> 00:56:56,980 and is computationally reasonably fast to do. 987 00:56:56,980 --> 00:57:03,250 And then they also estimated priors on the various diagnoses 988 00:57:03,250 --> 00:57:06,970 from national health statistics, because the original data 989 00:57:06,970 --> 00:57:08,540 did not have prior data-- 990 00:57:08,540 --> 00:57:10,330 priors. 991 00:57:10,330 --> 00:57:14,110 They wound up not using the evoking strengths, 992 00:57:14,110 --> 00:57:17,560 because they were doing a pretty straight Bayesian 993 00:57:17,560 --> 00:57:21,730 model where all you need is the priors and the conditionals. 994 00:57:21,730 --> 00:57:26,620 They took the frequency as a kind of scaled conditional, 995 00:57:26,620 --> 00:57:29,230 and then built a system based on that. 996 00:57:29,230 --> 00:57:31,520 And I'll just show you the results.
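Before those results, here is a minimal sketch of the leaky noisy-OR calculation itself: a "leak" term lets a finding occur with some probability even when no modeled disease is present. The rescaling of the 1-to-5 frequencies into per-cause probabilities below is purely invented; their actual mapping was more carefully constructed.

```python
# Leaky noisy-OR: P(finding | present diseases) =
#   1 - (1 - leak) * product over present causes d of (1 - q_d),
# where q_d is the probability that disease d alone produces the finding
# and `leak` covers causes outside the model.

freq_to_prob = {1: 0.05, 2: 0.2, 3: 0.5, 4: 0.8, 5: 0.985}  # invented rescaling

def leaky_noisy_or(present_causes, q, leak=0.01):
    p_absent = 1.0 - leak
    for d in present_causes:
        p_absent *= 1.0 - q[d]
    return 1.0 - p_absent

q = {"alcoholic hepatitis": freq_to_prob[4], "cirrhosis": freq_to_prob[3]}
print(leaky_noisy_or([], q))                        # leak alone: 0.01
print(leaky_noisy_or(["alcoholic hepatitis"], q))   # 1 - 0.99 * 0.2 = 0.802
print(leaky_noisy_or(["alcoholic hepatitis", "cirrhosis"], q))  # 0.901
```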
997 00:57:31,520 --> 00:57:35,830 So they took a bunch of Scientific American medicine 998 00:57:35,830 --> 00:57:42,880 cases and said, what are the ranks assigned to the reference 999 00:57:42,880 --> 00:57:46,150 diagnoses of these 23 cases? 1000 00:57:46,150 --> 00:57:49,030 And you see that like in case number one, 1001 00:57:49,030 --> 00:57:53,920 QMR ranked the correct solution as number six, 1002 00:57:53,920 --> 00:57:59,320 but their two methods, TB and iterative TB 1003 00:57:59,320 --> 00:58:01,180 ranked it as number one. 1004 00:58:01,180 --> 00:58:07,030 And then these are attempts to do a kind of ablation analysis 1005 00:58:07,030 --> 00:58:10,750 to see how well the program works if you take away 1006 00:58:10,750 --> 00:58:13,670 various of its clever features. 1007 00:58:13,670 --> 00:58:17,170 But what you see is that it works reasonably well, 1008 00:58:17,170 --> 00:58:19,210 except for a few cases. 1009 00:58:19,210 --> 00:58:26,390 So case number 23, all variants of the program did badly. 1010 00:58:26,390 --> 00:58:28,510 And then they excused themselves and said, 1011 00:58:28,510 --> 00:58:31,210 well, there's actually a generalization 1012 00:58:31,210 --> 00:58:34,750 of the disease that was in the Scientific American medicine 1013 00:58:34,750 --> 00:58:38,580 conclusion, which the programs did find, 1014 00:58:38,580 --> 00:58:41,980 and so that would have been number one across the board. 1015 00:58:41,980 --> 00:58:45,070 So they can sort of make a kind of handwavy argument 1016 00:58:45,070 --> 00:58:47,840 that it really got that one right. 1017 00:58:47,840 --> 00:58:50,290 And so these were pretty good. 1018 00:58:50,290 --> 00:58:58,030 And so this validated the idea of using this model 1019 00:58:58,030 --> 00:58:59,740 in that way. 1020 00:58:59,740 --> 00:59:05,960 Now, today you can go out and go to your favorite Google App 1021 00:59:05,960 --> 00:59:11,080 store or Apple's app store or anybody's app store 1022 00:59:11,080 --> 00:59:16,640 and download tons and tons and tons of symptom checkers. 1023 00:59:16,640 --> 00:59:23,620 So I wanted to give you a demo of one of these if it works. 1024 00:59:23,620 --> 00:59:24,680 OK. 1025 00:59:24,680 --> 00:59:28,220 So I was playing earlier with having abdominal pain 1026 00:59:28,220 --> 00:59:29,690 and headache. 1027 00:59:29,690 --> 00:59:33,980 So let's start a new one. 1028 00:59:33,980 --> 00:59:37,070 So type in how you're feeling today. 1029 00:59:37,070 --> 00:59:40,910 Should we have a cough, or runny nose, abdominal pain, fever, 1030 00:59:40,910 --> 00:59:44,085 sore throat, headache, back pain, fatigue, diarrhea, 1031 00:59:44,085 --> 00:59:44,585 or phlegm? 1032 00:59:47,120 --> 00:59:47,630 Phlegm? 1033 00:59:47,630 --> 00:59:48,680 Phlegm is the winner. 1034 00:59:51,530 --> 00:59:54,703 Phlegm is like coughing up crap in your throat. 1035 00:59:54,703 --> 00:59:58,960 AUDIENCE: Oh, luckily, they visualize it. 1036 00:59:58,960 --> 01:00:00,380 PETER SZOLOVITS: Right. 1037 01:00:00,380 --> 01:00:02,590 So tell me about your phlegm. 1038 01:00:02,590 --> 01:00:04,658 When did it start? 1039 01:00:04,658 --> 01:00:05,875 AUDIENCE: Last week. 1040 01:00:05,875 --> 01:00:07,000 PETER SZOLOVITS: Last week? 1041 01:00:07,000 --> 01:00:07,500 OK. 1042 01:00:14,870 --> 01:00:17,810 I signed in as Paul, because I didn't 1043 01:00:17,810 --> 01:00:21,830 want to be associated with any of this data. 
1044 01:00:21,830 --> 01:00:25,790 So was the phlegm bloody or pus-like or watery 1045 01:00:25,790 --> 01:00:26,720 or none of the above? 1046 01:00:29,558 --> 01:00:30,980 AUDIENCE: None of the above. 1047 01:00:30,980 --> 01:00:32,438 PETER SZOLOVITS: None of the above. 1048 01:00:32,438 --> 01:00:33,672 So what was it like? 1049 01:00:33,672 --> 01:00:34,630 AUDIENCE: I don't know. 1050 01:00:34,630 --> 01:00:36,920 Paul? 1051 01:00:36,920 --> 01:00:38,756 PETER SZOLOVITS: Is it any of these colors? 1052 01:00:38,756 --> 01:00:40,890 AUDIENCE: Green. 1053 01:00:40,890 --> 01:00:44,230 PETER SZOLOVITS: I think I'll make it yellow. 1054 01:00:44,230 --> 01:00:44,730 Next. 1055 01:00:47,520 --> 01:00:50,340 Does it happen in the morning, midday, evening, nighttime, 1056 01:00:50,340 --> 01:00:51,930 or a specific time of year? 1057 01:00:51,930 --> 01:00:53,680 AUDIENCE: Specific time of year. 1058 01:00:53,680 --> 01:00:54,305 AUDIENCE: Yeah. 1059 01:00:54,305 --> 01:00:54,780 Specific time of year. 1060 01:00:54,780 --> 01:00:56,405 PETER SZOLOVITS: Specific time of year. 1061 01:01:00,840 --> 01:01:03,996 And does lying down or physical activity make it worse? 1062 01:01:03,996 --> 01:01:05,704 AUDIENCE: Well, it's generally not worse. 1063 01:01:05,704 --> 01:01:08,030 So that's physical activity. 1064 01:01:08,030 --> 01:01:09,488 PETER SZOLOVITS: Physical activity. 1065 01:01:13,240 --> 01:01:16,073 How often is this a problem? 1066 01:01:16,073 --> 01:01:16,615 I don't know. 1067 01:01:16,615 --> 01:01:18,110 A couple times a week maybe. 1068 01:01:22,150 --> 01:01:26,368 Did eating suspect food trigger your phlegm? 1069 01:01:26,368 --> 01:01:27,340 AUDIENCE: No. 1070 01:01:27,340 --> 01:01:28,030 PETER SZOLOVITS: I don't know. 1071 01:01:28,030 --> 01:01:29,620 I don't know what a suspect food is. 1072 01:01:29,620 --> 01:01:31,493 AUDIENCE: [INAUDIBLE] food. 1073 01:01:31,493 --> 01:01:32,410 PETER SZOLOVITS: Yeah. 1074 01:01:32,410 --> 01:01:36,358 This is going to kill most of my time. 1075 01:01:36,358 --> 01:01:39,945 AUDIENCE: Is it getting better? 1076 01:01:39,945 --> 01:01:41,320 PETER SZOLOVITS: Is it improving? 1077 01:01:41,320 --> 01:01:42,985 Sure, it's improving. 1078 01:01:46,450 --> 01:01:48,640 Can I think of another related symptom? 1079 01:01:48,640 --> 01:01:49,780 No. 1080 01:01:49,780 --> 01:01:54,595 I'm comparing your case to men aged 66 to 72. 1081 01:01:54,595 --> 01:01:58,030 A number of similar cases gets more refined. 1082 01:01:58,030 --> 01:02:00,220 Do I have shortness of breath? 1083 01:02:00,220 --> 01:02:02,470 No. 1084 01:02:02,470 --> 01:02:03,200 That's good. 1085 01:02:03,200 --> 01:02:03,700 All right. 1086 01:02:03,700 --> 01:02:04,990 Do I have a runny nose? 1087 01:02:04,990 --> 01:02:05,620 Yeah, sure. 1088 01:02:05,620 --> 01:02:06,670 I have a runny nose. 1089 01:02:10,900 --> 01:02:13,435 It's-- I don't know-- a watery, runny nose. 1090 01:02:17,482 --> 01:02:20,862 AUDIENCE: Does it say you've got to call [INAUDIBLE]?? 1091 01:02:20,862 --> 01:02:22,570 PETER SZOLOVITS: Well, I'm going to stop, 1092 01:02:22,570 --> 01:02:23,950 because it will just take-- 1093 01:02:23,950 --> 01:02:27,830 it takes too long to go through this, but you get the idea. 1094 01:02:27,830 --> 01:02:29,770 So what this is doing is actually 1095 01:02:29,770 --> 01:02:32,200 running an algorithm that is a cousin 1096 01:02:32,200 --> 01:02:36,010 of the acute renal failure algorithm that I showed you. 
1097 01:02:36,010 --> 01:02:40,030 So it's trying to optimize the questions that it's asking, 1098 01:02:40,030 --> 01:02:45,400 and it's trying to come up with a diagnostic conclusion. 1099 01:02:45,400 --> 01:02:47,470 Now, in order not to get in trouble 1100 01:02:47,470 --> 01:02:49,660 with things like the FDA, it winds up 1101 01:02:49,660 --> 01:02:52,330 wimping out at the end, and it says, 1102 01:02:52,330 --> 01:02:56,740 if you're feeling really bad, go see a doctor. 1103 01:02:56,740 --> 01:03:01,660 But nevertheless, these kinds of things are now becoming real, 1104 01:03:01,660 --> 01:03:03,760 and they're getting better because they're 1105 01:03:03,760 --> 01:03:06,180 based on more and more data. 1106 01:03:06,180 --> 01:03:06,750 Yeah. 1107 01:03:06,750 --> 01:03:09,413 AUDIENCE: [INAUDIBLE] 1108 01:03:09,413 --> 01:03:11,330 PETER SZOLOVITS: Well, I can't get to the end, 1109 01:03:11,330 --> 01:03:12,720 because we're only at 36%. 1110 01:03:12,720 --> 01:03:14,910 [INTERPOSING VOICES] 1111 01:03:14,910 --> 01:03:15,410 Yeah. 1112 01:03:15,410 --> 01:03:18,050 Here. 1113 01:03:18,050 --> 01:03:19,280 All right. 1114 01:03:19,280 --> 01:03:21,156 Somebody-- 1115 01:03:21,156 --> 01:03:22,890 AUDIENCE: Oh, I think I need your finger. 1116 01:03:22,890 --> 01:03:25,600 PETER SZOLOVITS: Oh. 1117 01:03:25,600 --> 01:03:27,380 OK. 1118 01:03:27,380 --> 01:03:28,885 Just don't drain my bank account. 1119 01:03:33,510 --> 01:03:37,040 So The British Medical Journal did a test 1120 01:03:37,040 --> 01:03:40,430 of a bunch of symptom checkers, 23 symptom checkers 1121 01:03:40,430 --> 01:03:43,170 like this, about four years ago. 1122 01:03:43,170 --> 01:03:49,610 And they said, well, on 45 standardized patient 1123 01:03:49,610 --> 01:03:53,960 vignettes, can it find at least the right level of urgency 1124 01:03:53,960 --> 01:03:57,560 to recommend whether you should go to the emergency room, 1125 01:03:57,560 --> 01:04:01,980 get other kinds of care, or just take care of yourself. 1126 01:04:01,980 --> 01:04:06,320 And then the goals were that if the diagnosis is 1127 01:04:06,320 --> 01:04:10,140 given by the program, it should be in the top 20 of the list 1128 01:04:10,140 --> 01:04:11,060 that it gives you. 1129 01:04:11,060 --> 01:04:13,490 And if triage is given, then it should 1130 01:04:13,490 --> 01:04:15,800 be the right level of urgency. 1131 01:04:15,800 --> 01:04:20,540 The correct diagnosis was first in 34% of the cases. 1132 01:04:20,540 --> 01:04:25,160 It was within the top 20 in 58% of the cases. 1133 01:04:25,160 --> 01:04:30,230 And the correct triage was 57% accurate. 1134 01:04:30,230 --> 01:04:33,830 But notice it was more accurate in the emergent cases, which 1135 01:04:33,830 --> 01:04:38,180 is good, because those are the ones where you really care. 1136 01:04:38,180 --> 01:04:40,430 So we have-- 1137 01:04:40,430 --> 01:04:41,480 OK. 1138 01:04:41,480 --> 01:04:46,220 So based on what he said about me, 1139 01:04:46,220 --> 01:04:51,770 I have an upper respiratory infection with 50% likelihood. 1140 01:04:51,770 --> 01:04:57,440 And I can ask what to do next. 1141 01:05:00,050 --> 01:05:03,770 Watch for symptoms like sore throat and fever. 1142 01:05:03,770 --> 01:05:07,160 Physicians often perform a physical exam, 1143 01:05:07,160 --> 01:05:11,000 explore other treatment options, and recovery 1144 01:05:11,000 --> 01:05:16,790 for most cases like this is a matter of days to weeks.
1145 01:05:16,790 --> 01:05:20,780 And I can go back and say, I might have the flu, 1146 01:05:20,780 --> 01:05:24,560 or I might have allergic rhinitis. 1147 01:05:24,560 --> 01:05:27,080 So that's actually reasonable. 1148 01:05:27,080 --> 01:05:30,425 I don't know exactly what you put in about me. 1149 01:05:30,425 --> 01:05:31,950 AUDIENCE: What is the less than 50? 1150 01:05:31,950 --> 01:05:33,200 PETER SZOLOVITS: What is what? 1151 01:05:33,200 --> 01:05:35,980 AUDIENCE: The less than 50. 1152 01:05:35,980 --> 01:05:37,970 [INTERPOSING VOICES] 1153 01:05:37,970 --> 01:05:40,743 AUDIENCE: Patients have to be the same demographics. 1154 01:05:40,743 --> 01:05:41,660 PETER SZOLOVITS: Yeah. 1155 01:05:41,660 --> 01:05:44,430 I don't know what the less than 50 is supposed to mean. 1156 01:05:44,430 --> 01:05:48,280 AUDIENCE: It started with 200,000 or so. 1157 01:05:48,280 --> 01:05:50,150 PETER SZOLOVITS: Oh, so this is based 1158 01:05:50,150 --> 01:05:52,470 on a small number of patients. 1159 01:05:52,470 --> 01:05:55,430 So what happens, of course, is as you slice and dice 1160 01:05:55,430 --> 01:05:58,600 a population, it gets smaller and smaller. 1161 01:05:58,600 --> 01:06:01,440 So that's what we're seeing. 1162 01:06:01,440 --> 01:06:02,300 OK. 1163 01:06:02,300 --> 01:06:03,740 Thank you. 1164 01:06:03,740 --> 01:06:05,780 OK. 1165 01:06:05,780 --> 01:06:09,890 So two more topics I'm going to rush through. 1166 01:06:09,890 --> 01:06:18,530 One is that-- as I mentioned in one of the much earlier slides, 1167 01:06:18,530 --> 01:06:21,550 every action has a cost. 1168 01:06:21,550 --> 01:06:23,580 It at least takes time. 1169 01:06:23,580 --> 01:06:27,230 And sometimes it induces potentially bad things 1170 01:06:27,230 --> 01:06:29,730 to happen to a patient. 1171 01:06:29,730 --> 01:06:34,490 And so people began studying a long time ago what does 1172 01:06:34,490 --> 01:06:37,280 it mean to be rational under resource constraints 1173 01:06:37,280 --> 01:06:41,720 rather than rational just in this Homo economicus model. 1174 01:06:41,720 --> 01:06:46,290 And so Eric Horvitz, who's now a big cheese guy-- 1175 01:06:46,290 --> 01:06:50,600 he's head of Microsoft Research-- used 1176 01:06:50,600 --> 01:06:54,100 to be just a lowly graduate student at Stanford 1177 01:06:54,100 --> 01:06:58,806 when he started doing this work. 1178 01:06:58,806 --> 01:07:01,730 He said, well, utility comes not only 1179 01:07:01,730 --> 01:07:03,410 from what happens to the patient, 1180 01:07:03,410 --> 01:07:05,990 but also from the reasoning process, 1181 01:07:05,990 --> 01:07:08,750 from the computational process itself. 1182 01:07:08,750 --> 01:07:12,030 And so consider-- do you guys watch MacGyver? 1183 01:07:12,030 --> 01:07:14,630 This is way out of date. 1184 01:07:14,630 --> 01:07:19,640 So if MacGyver is defusing some bomb that's ticking down 1185 01:07:19,640 --> 01:07:24,890 to zero and he runs out of time, then his utilities 1186 01:07:24,890 --> 01:07:28,290 take a very sharp drop at that point. 1187 01:07:28,290 --> 01:07:33,650 So that's what this work is really about, saying, well, 1188 01:07:33,650 --> 01:07:36,860 what can we do when we don't have all the time in the world 1189 01:07:36,860 --> 01:07:41,930 to do the computation as well as having to try to maximize 1190 01:07:41,930 --> 01:07:44,000 utility to the patient?
1191 01:07:44,000 --> 01:07:47,150 And Daniel Kahneman won the Nobel Prize a few years ago 1192 01:07:47,150 --> 01:07:52,230 in economics for this notion of bounded rationality, 1193 01:07:52,230 --> 01:07:56,270 which says that the way we would like to be rational 1194 01:07:56,270 --> 01:07:59,150 is not actually the way we behave. And he 1195 01:07:59,150 --> 01:08:01,850 wrote this popular book that I really like 1196 01:08:01,850 --> 01:08:05,000 called Thinking, Fast and Slow, which 1197 01:08:05,000 --> 01:08:10,040 says that if you're trying to figure out which house to buy, 1198 01:08:10,040 --> 01:08:14,240 you have a lot of time to do it, so you can deliberate and list 1199 01:08:14,240 --> 01:08:17,689 all the advantages and disadvantages and costs and so 1200 01:08:17,689 --> 01:08:22,700 on of different houses and take your time making a decision. 1201 01:08:22,700 --> 01:08:26,870 If you see a car barreling toward you as you are crossing 1202 01:08:26,870 --> 01:08:30,120 in a crosswalk, you don't stop and say, well, 1203 01:08:30,120 --> 01:08:34,200 let me figure out the pluses and minuses of moving to the left 1204 01:08:34,200 --> 01:08:39,410 or moving to the right, because by the time you figure it out, 1205 01:08:39,410 --> 01:08:40,580 you're dead. 1206 01:08:40,580 --> 01:08:45,890 And so he claims that human beings have evolved 1207 01:08:45,890 --> 01:08:51,890 in a way where we have a kind of instinctual very fast response, 1208 01:08:51,890 --> 01:08:55,729 and that the deliberative process is only 1209 01:08:55,729 --> 01:08:57,890 invoked relatively rarely. 1210 01:08:57,890 --> 01:09:00,229 Now, he bemoans this fact, because he 1211 01:09:00,229 --> 01:09:03,890 claims that people make too many decisions that they ought 1212 01:09:03,890 --> 01:09:08,420 to be deliberative about based on these sort of gut instincts. 1213 01:09:08,420 --> 01:09:11,720 For example, our current president. 1214 01:09:11,720 --> 01:09:15,859 But never mind. 1215 01:09:15,859 --> 01:09:20,240 So what Eric and his colleagues were doing 1216 01:09:20,240 --> 01:09:25,399 was trying really to look at how this kind of meta level 1217 01:09:25,399 --> 01:09:28,370 reasoning about how much reasoning and what 1218 01:09:28,370 --> 01:09:33,109 kind of reasoning is worth doing plays into the decision making 1219 01:09:33,109 --> 01:09:34,319 process. 1220 01:09:34,319 --> 01:09:36,260 So they treat the expected value of computation 1221 01:09:36,260 --> 01:09:38,930 as a fundamental component of reflection 1222 01:09:38,930 --> 01:09:41,580 about alternative inference strategies. 1223 01:09:41,580 --> 01:09:44,720 So for example, I mentioned that QMR 1224 01:09:44,720 --> 01:09:47,600 had these alternative questioning methods 1225 01:09:47,600 --> 01:09:50,689 depending on the length of the differential 1226 01:09:50,689 --> 01:09:52,350 that it was working on. 1227 01:09:52,350 --> 01:09:55,910 So that's an example of a kind of meta level reasoning that 1228 01:09:55,910 --> 01:09:59,330 says that it may be more effective to do 1229 01:09:59,330 --> 01:10:03,620 one kind of question asking strategy than another. 1230 01:10:03,620 --> 01:10:05,690 As for the degree of refinement, people talk 1231 01:10:05,690 --> 01:10:08,510 about things like just-in-time algorithms, 1232 01:10:08,510 --> 01:10:13,160 where if you run out of time to think more deliberately, 1233 01:10:13,160 --> 01:10:16,250 you can just take the best answer that's available to you 1234 01:10:16,250 --> 01:10:17,360 now.
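To make the flavor of that meta-level calculation concrete, here is a sketch of its simplest, myopic version: compare the expected utility of acting now against asking one yes/no question first, and ask only if the difference exceeds the cost of asking. The utilities, sensitivity, specificity, and cost below are all invented for illustration.

```python
def act_now_utility(p_sick, u):
    """Expected utility of the best immediate action given P(sick)."""
    return max(p_sick * u[a]["sick"] + (1 - p_sick) * u[a]["well"] for a in u)

def value_of_question(p_sick, sens, spec, u, cost):
    """Myopic value of information for one yes/no question (a test with
    the given sensitivity and specificity), net of the cost of asking."""
    p_yes = p_sick * sens + (1 - p_sick) * (1 - spec)
    p_sick_if_yes = p_sick * sens / p_yes                  # Bayes' rule
    p_sick_if_no = p_sick * (1 - sens) / (1 - p_yes)
    u_after = (p_yes * act_now_utility(p_sick_if_yes, u)
               + (1 - p_yes) * act_now_utility(p_sick_if_no, u))
    return u_after - act_now_utility(p_sick, u) - cost

# Invented utilities: treating the well or waiting on the sick is costly.
utilities = {"treat": {"sick": 0.9, "well": 0.6},
             "wait": {"sick": 0.2, "well": 1.0}}
print(value_of_question(0.3, sens=0.9, spec=0.8, u=utilities, cost=0.01))
# Positive (about 0.12 here), so asking beats deciding immediately.
```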
1235 01:10:17,360 --> 01:10:20,540 And so taking the value of information, 1236 01:10:20,540 --> 01:10:24,290 the value of computation, and the value of experimentation 1237 01:10:24,290 --> 01:10:27,860 into account in doing this meta level reasoning 1238 01:10:27,860 --> 01:10:31,940 is important to come up with the most effective strategies. 1239 01:10:31,940 --> 01:10:36,470 So he gives an example of a time pressure 1240 01:10:36,470 --> 01:10:42,890 decision problem where you have a patient, a 75-year-old woman 1241 01:10:42,890 --> 01:10:47,870 in the ICU, and she develops sudden breathing difficulties. 1242 01:10:47,870 --> 01:10:48,620 So what do you do? 1243 01:10:52,820 --> 01:10:55,370 Well, it's a challenge, right? 1244 01:10:55,370 --> 01:10:58,160 You could be very deliberative, but the problem 1245 01:10:58,160 --> 01:11:02,510 is that she may die because she's not breathing well, 1246 01:11:02,510 --> 01:11:04,610 or you could impulsively say, well, 1247 01:11:04,610 --> 01:11:06,620 let's put her on a mechanical ventilator, 1248 01:11:06,620 --> 01:11:09,170 because we know that that will prevent her 1249 01:11:09,170 --> 01:11:11,570 from dying in the short term, but that 1250 01:11:11,570 --> 01:11:14,630 may be the wrong decision, because that has bad side 1251 01:11:14,630 --> 01:11:16,370 effects. 1252 01:11:16,370 --> 01:11:20,210 She may get an infection, get pneumonia, and die that way. 1253 01:11:20,210 --> 01:11:23,210 And you certainly don't want to subject her to that risk 1254 01:11:23,210 --> 01:11:26,990 if she didn't need to take that risk. 1255 01:11:26,990 --> 01:11:31,640 So they designed an architecture that 1256 01:11:31,640 --> 01:11:35,150 says, well, this is the decision that you're trying to make, 1257 01:11:35,150 --> 01:11:37,700 which they're modeling by an influence diagram. 1258 01:11:37,700 --> 01:11:42,110 So this is a Bayesian network with the addition of decision 1259 01:11:42,110 --> 01:11:44,060 nodes and value nodes. 1260 01:11:44,060 --> 01:11:48,110 But you use Bayesian network techniques to calculate 1261 01:11:48,110 --> 01:11:50,150 optimal decisions here. 1262 01:11:50,150 --> 01:11:52,910 And then this is kind of the background knowledge 1263 01:11:52,910 --> 01:11:56,180 of what we understand about the relationships 1264 01:11:56,180 --> 01:11:59,270 among different things in the intensive care unit. 1265 01:11:59,270 --> 01:12:03,470 And this is a representation of the meta reasoning 1266 01:12:03,470 --> 01:12:07,590 that says, which utility model should we use? 1267 01:12:07,590 --> 01:12:09,560 Which reasoning technique should we use? 1268 01:12:09,560 --> 01:12:10,670 And so on. 1269 01:12:10,670 --> 01:12:13,370 And they built an architecture that integrates 1270 01:12:13,370 --> 01:12:17,180 these various approaches. 1271 01:12:17,180 --> 01:12:20,420 And then in my last 2 minutes, I just 1272 01:12:20,420 --> 01:12:24,470 want to tell you about an interesting-- this 1273 01:12:24,470 --> 01:12:26,720 is a modern view, not historical. 1274 01:12:26,720 --> 01:12:31,910 So this was a paper presented at the last NeurIPS meeting, which 1275 01:12:31,910 --> 01:12:36,890 said the kinds of problems that we've been talking about, 1276 01:12:36,890 --> 01:12:39,170 like the acute renal failure problem 1277 01:12:39,170 --> 01:12:44,390 or like any of these others, we can reformulate this 1278 01:12:44,390 --> 01:12:48,690 as a reinforcement learning problem. 
1279 01:12:48,690 --> 01:12:57,140 So the idea is that if you treat all activities, including 1280 01:12:57,140 --> 01:13:00,380 putting somebody on a ventilator or concluding 1281 01:13:00,380 --> 01:13:05,060 a diagnostic conclusion or asking a question or any 1282 01:13:05,060 --> 01:13:07,520 of the other things that we've contemplated, 1283 01:13:07,520 --> 01:13:10,070 if you treat those all in a uniform way 1284 01:13:10,070 --> 01:13:14,540 and say these are actions, we then 1285 01:13:14,540 --> 01:13:19,310 model the universe as a Markov decision process, 1286 01:13:19,310 --> 01:13:22,770 where every time that you take one of these actions, 1287 01:13:22,770 --> 01:13:26,360 it changes the state of the patient, 1288 01:13:26,360 --> 01:13:30,110 or the state of our knowledge about the patient. 1289 01:13:30,110 --> 01:13:32,180 And then you do reinforcement learning 1290 01:13:32,180 --> 01:13:35,120 to figure out what is the optimal policy 1291 01:13:35,120 --> 01:13:38,360 to apply under all possible states 1292 01:13:38,360 --> 01:13:42,750 in order to maximize the expected outcome. 1293 01:13:42,750 --> 01:13:47,360 So that's exactly the approach that they're taking. 1294 01:13:47,360 --> 01:13:50,810 The state space is the set of positive and negative findings. 1295 01:13:50,810 --> 01:13:53,900 The action space is to ask about a finding 1296 01:13:53,900 --> 01:13:56,000 or conclude a diagnosis. 1297 01:13:56,000 --> 01:14:00,140 The reward is the correct or incorrect single diagnosis. 1298 01:14:00,140 --> 01:14:03,170 So once you reach a diagnosis, the process stops, 1299 01:14:03,170 --> 01:14:05,170 and you get your reward. 1300 01:14:05,170 --> 01:14:08,150 It's finite horizon because they impose a limit 1301 01:14:08,150 --> 01:14:09,920 on the number of questions. 1302 01:14:09,920 --> 01:14:13,070 If you don't get an answer by then, you lose. 1303 01:14:13,070 --> 01:14:15,650 You get a minus one reward. 1304 01:14:15,650 --> 01:14:21,830 There is a discount factor so that the further away a reward 1305 01:14:21,830 --> 01:14:26,690 is, the less value it has to you at any point, which encourages 1306 01:14:26,690 --> 01:14:29,000 shorter question sequences. 1307 01:14:29,000 --> 01:14:32,540 And they use a pretty standard Q learning framework, or at least 1308 01:14:32,540 --> 01:14:36,860 a modern Q learning framework using a double deep neural 1309 01:14:36,860 --> 01:14:38,840 network strategy. 1310 01:14:38,840 --> 01:14:41,750 And then there are two pieces of magic sauce 1311 01:14:41,750 --> 01:14:44,550 that make this work better. 1312 01:14:44,550 --> 01:14:47,120 And one of them is that they want 1313 01:14:47,120 --> 01:14:50,030 to encourage asking questions that 1314 01:14:50,030 --> 01:14:52,640 are likely to have positive answers rather 1315 01:14:52,640 --> 01:14:54,440 than negative answers. 1316 01:14:54,440 --> 01:14:57,200 And the reason is because in their world, 1317 01:14:57,200 --> 01:15:00,020 there are hundreds and hundreds of questions. 1318 01:15:00,020 --> 01:15:02,270 And of course, most patients don't 1319 01:15:02,270 --> 01:15:05,030 have most of those findings. 1320 01:15:05,030 --> 01:15:07,880 And so you don't want to ask a whole bunch of questions 1321 01:15:07,880 --> 01:15:12,320 to which the answer is no, no, no, no, no, no, no, no, no, 1322 01:15:12,320 --> 01:15:14,600 because that doesn't give you very much guidance. 
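Here is a minimal sketch of that formulation, with a two-disease toy world and tabular Q-learning standing in for their double deep network on 650 simulated diseases. The rewards follow the setup just described (plus or minus one for a correct or incorrect diagnosis, minus one if the question budget runs out), everything else is invented, and the comment in the middle marks where the yes-biased shaping bonus, discussed next, would go.

```python
import random

# Toy stand-in for the diagnosis MDP: two diseases, three yes/no findings.
# P(finding = yes | disease) is invented.
p_yes = {"flu": [0.9, 0.7, 0.1], "cold": [0.3, 0.8, 0.6]}
DISEASES = list(p_yes)
N_FINDINGS = 3
ACTIONS = [("ask", i) for i in range(N_FINDINGS)] + [("say", d) for d in DISEASES]
MAX_STEPS, GAMMA, ALPHA, EPS = 4, 0.95, 0.1, 0.1

def run_episode(Q):
    truth = random.choice(DISEASES)
    answers = [None] * N_FINDINGS          # state: unknown / yes / no per finding
    for step in range(MAX_STEPS):
        s = tuple(answers)
        if random.random() < EPS:          # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q.get((s, x), 0.0))
        if a[0] == "say":                  # concluding a diagnosis is terminal
            r, s2 = (1.0 if a[1] == truth else -1.0), None
        elif step == MAX_STEPS - 1:        # question budget exhausted: you lose
            r, s2 = -1.0, None
        else:                              # ask finding a[1], observe the answer
            answers[a[1]] = random.random() < p_yes[truth][a[1]]
            # (reward shaping would add a small bonus here for yes answers)
            r, s2 = 0.0, tuple(answers)
        old = Q.get((s, a), 0.0)           # tabular Q-learning update
        future = 0.0 if s2 is None else GAMMA * max(
            Q.get((s2, x), 0.0) for x in ACTIONS)
        Q[(s, a)] = old + ALPHA * (r + future - old)
        if s2 is None:
            return

Q = {}
for _ in range(50000):
    run_episode(Q)
s0 = tuple([None] * N_FINDINGS)
print(max(ACTIONS, key=lambda a: Q.get((s0, a), 0.0)))  # greedy first action
```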
1323 01:15:14,600 --> 01:15:17,420 You want to ask questions where the answer is yes, 1324 01:15:17,420 --> 01:15:22,950 because that helps you clue in on what's really going on. 1325 01:15:22,950 --> 01:15:26,990 So they do 1326 01:15:26,990 --> 01:15:30,680 this thing they call reward shaping, 1327 01:15:30,680 --> 01:15:35,540 which basically adds some incremental reward 1328 01:15:35,540 --> 01:15:39,650 for asking questions that will have a positive answer. 1329 01:15:39,650 --> 01:15:43,460 And they have a nice proof that an optimal policy learned 1330 01:15:43,460 --> 01:15:46,130 from that shaped reward function is also 1331 01:15:46,130 --> 01:15:50,090 optimal for the reward function that would not include it. 1332 01:15:50,090 --> 01:15:52,510 So that's kind of cool. 1333 01:15:52,510 --> 01:15:54,930 And then the other thing they do is 1334 01:15:54,930 --> 01:15:59,490 to try to identify a reduced space of findings 1335 01:15:59,490 --> 01:16:02,310 by what they call feature rebuilding. 1336 01:16:02,310 --> 01:16:05,910 And this is essentially a dimension reduction technique 1337 01:16:05,910 --> 01:16:09,060 where they're co-training. 1338 01:16:09,060 --> 01:16:11,510 In this dual network architecture, 1339 01:16:11,510 --> 01:16:15,690 they're co-training the policy model. 1340 01:16:15,690 --> 01:16:17,940 It's, of course, a neural network model, 1341 01:16:17,940 --> 01:16:21,480 this being the 2010s. 1342 01:16:21,480 --> 01:16:26,640 And so they're generating a sequence, a deep layered set 1343 01:16:26,640 --> 01:16:30,750 of neural networks that generate an output, which 1344 01:16:30,750 --> 01:16:35,490 is the m questions and the n conclusions that can be made. 1345 01:16:35,490 --> 01:16:37,860 And I think there's a softmax over these 1346 01:16:37,860 --> 01:16:43,330 to come up with the right policy for any particular situation. 1347 01:16:43,330 --> 01:16:48,000 But at the same time, they co-train it in order to predict 1348 01:16:48,000 --> 01:16:51,870 a number of-- 1349 01:16:51,870 --> 01:16:56,610 all of the manifestations from what they've observed before. 1350 01:16:56,610 --> 01:17:00,390 So it's using-- it's learning a probabilistic model that 1351 01:17:00,390 --> 01:17:03,240 says if you've answered the following questions 1352 01:17:03,240 --> 01:17:07,050 in the following ways, here are the likely answers that you 1353 01:17:07,050 --> 01:17:10,560 would give to the remaining manifestations. 1354 01:17:10,560 --> 01:17:12,550 And the reason they can do that, of course, 1355 01:17:12,550 --> 01:17:14,850 is because they really are not independent. 1356 01:17:14,850 --> 01:17:18,570 They're very often co-varying. 1357 01:17:18,570 --> 01:17:20,700 And so they learn that covariance, 1358 01:17:20,700 --> 01:17:24,690 and therefore can predict 1359 01:17:24,690 --> 01:17:28,380 which questions are going to get yes answers. 1360 01:17:28,380 --> 01:17:33,330 And therefore, they can bias the learning toward doing that. 1361 01:17:33,330 --> 01:17:35,835 So last slide. 1362 01:17:38,400 --> 01:17:41,220 So this system is called REFUEL. 1363 01:17:41,220 --> 01:17:43,870 It's been tested on a simulated data 1364 01:17:43,870 --> 01:17:49,560 set of 650 diseases and 375 symptoms. 1365 01:17:49,560 --> 01:17:56,160 And what they show is that the red line is their algorithm. 1366 01:17:56,160 --> 01:18:01,220 The yellow line uses only this reward shaping.
1367 01:18:01,220 --> 01:18:04,590 And the blue line is just a straight reinforcement 1368 01:18:04,590 --> 01:18:05,910 learning approach. 1369 01:18:05,910 --> 01:18:07,980 And you can see that they're doing 1370 01:18:07,980 --> 01:18:11,780 much better after many fewer epochs of training 1371 01:18:11,780 --> 01:18:12,990 in doing this. 1372 01:18:12,990 --> 01:18:17,310 Now, take this with a grain of salt. This is all fake data. 1373 01:18:17,310 --> 01:18:20,790 So they didn't have real data sets to test this on. 1374 01:18:20,790 --> 01:18:25,770 They got statistics on what diseases are common 1375 01:18:25,770 --> 01:18:28,860 and what symptoms are common in those diseases. 1376 01:18:28,860 --> 01:18:31,830 And then they had a generative model 1377 01:18:31,830 --> 01:18:34,420 that generated this fake data. 1378 01:18:34,420 --> 01:18:37,350 And then they learned from that generative model. 1379 01:18:37,350 --> 01:18:40,680 So of course it would be really important to redo 1380 01:18:40,680 --> 01:18:44,370 the study with real data, but they've not done that. 1381 01:18:44,370 --> 01:18:47,110 This was just published a few months ago. 1382 01:18:47,110 --> 01:18:50,470 So that's sort of where we are at the moment in diagnosis 1383 01:18:50,470 --> 01:18:53,040 and in differential diagnosis. 1384 01:18:53,040 --> 01:18:56,850 And I wanted to start by introducing 1385 01:18:56,850 --> 01:19:00,360 these ideas in a kind of historical framework. 1386 01:19:00,360 --> 01:19:03,210 But it means that there are a tremendous number of papers, 1387 01:19:03,210 --> 01:19:08,160 as you can imagine, that have been written since the 1990s 1388 01:19:08,160 --> 01:19:11,220 and '80s that I was showing you that 1389 01:19:11,220 --> 01:19:15,090 are essentially elaborations on the same themes. 1390 01:19:15,090 --> 01:19:18,690 And it's only in the past decade, with the advent 1391 01:19:18,690 --> 01:19:21,690 of these neural network models, that people 1392 01:19:21,690 --> 01:19:25,440 have changed strategy, so that instead 1393 01:19:25,440 --> 01:19:28,440 of learning explicit probabilities, for example, 1394 01:19:28,440 --> 01:19:31,920 like you do in a Bayesian network, you just say, 1395 01:19:31,920 --> 01:19:36,030 well, this is simply a prediction task. 1396 01:19:36,030 --> 01:19:38,130 And so we'll predict the way we predict 1397 01:19:38,130 --> 01:19:40,980 everything else with neural network models, which 1398 01:19:40,980 --> 01:19:45,330 is we build a CNN, or an RNN, or some combination of things, 1399 01:19:45,330 --> 01:19:48,660 or some attention model, or something. 1400 01:19:48,660 --> 01:19:49,980 And we throw that at it. 1401 01:19:49,980 --> 01:19:52,380 And it typically does a slightly better job 1402 01:19:52,380 --> 01:19:54,420 than any of the previous learning methods 1403 01:19:54,420 --> 01:19:57,780 that we've used, but not always. 1404 01:19:57,780 --> 01:19:58,320 OK. 1405 01:19:58,320 --> 01:19:59,870 Peace.