Every conversation with a patient is an exercise in the analysis of “big data.” The patient’s appearance, changes in mood and expression, and eye contact are data points. The illness narrative is rich in semiotics: pacing, timing, nuances of speech, and dialect are influenced by context, background, and insight, which in turn reflect religion, education, literacy, numeracy, life experiences, and peer input. All this is tempered by personal philosophy and personality traits such as recalcitrance, resilience, and tolerance. Taking a history, by itself, generates a wealth of data, but that’s just the start.
Add into the mix physical findings of variable reliability, laboratory markers of variable specificity, and imaging bits and bytes, and you have “big data.” Then you mine this data for the probabilistic variance of the potential causes of a complaint, on the basis of which you begin to consider values for numerous options for care. So armed, the physician next needs to factor in the benefits and harms of multiple treatments, derived from populations that never perfectly reflect the situation of the individual in the chair next to us, our patient. This is the information necessary to empower our patient to make rational choices from the menu of options. That is clinical medicine. That is what we do many times a day to the best of our ability and to the limits of our stamina.
Take that, Watson. You need a lot more than 90 servers and megawatts of electricity to manage our bedside rounds. You need to contend with the gloriously complicated and idiosyncratic fabric of human existence. Poets might be a match, but Watson is not.
Watson is doomed not just by its limited technical sufficiency compared with our cognitive birthright. Even if Watson could grow its server brain to match ours, it won’t be able to find measurable quantities for the independent variables captured during a patient encounter or for the role of personal values that temper that patient’s choice. Life does not have independent and dependent variables; the things that matter to us are on both sides of a regression model. Watson needs rules to violate this statistic and there are none that generalize. Somehow, our brains have a measuring instrument that no data query can find or measure and that we innately understand but can’t fully communicate. Also, our brains seem to intuitively understand statistics; our brains know that the variations around the regression lines (residuals) mean more to us than the models themselves. Sure, if there is something discrete to know, a simple, measurable deterministic item, or an answer to a game show question, Watson will kick most, and maybe all, of our butts. But what if what is important to us is neither deterministic nor discrete? What if life is more importantly measured in “when” than “if”? And what if the “when, and how we feel about the when” are intertwined? What if medical life is not even measured in outcomes but, instead, in relationships that foster peaceful moments? In this reality, Watson will be lost.
Watson is doomed on yet another level beyond a dearth of “code friendly” meaningful measures of humanity. It is doomed in that it is capable of reading the “World’s Literature.” Our desires and motives to improve the care of individuals are being buried in reams of codependent, biased, unrestricted, marketed, and poorly studied information, rife with false-positive and false-negative associations, that sees the light of Watson’s day because Watson can read every report published in nearly 20,000 biomedical journals. A “60 Minutes” report on AI reveled in Watson’s prowess at searching the literature. We can’t substantiate one particular claim quoted in the report, that 8,000 research reports are published daily, and we bet the person quoted can’t either. But that is Watson’s problem. Watson fails to recognize that it is more important to know what we should not read than to be able to read it all. There is just too much precarious information being perpetrated on unsuspecting readers, whether the readers have eyes or algorithms.
Science is the glue that holds medical care together, but it is far from a perfect adhesive. We have both served long tenures on the editorial boards of leading general and specialty clinical journals. We have many an anecdote about the rocky relationship between medical care and the science that informs it. An anecdote from Dr. McNutt serves as a particularly disconcerting object lesson. He commented on a paper being considered for publication, arguing that it should be rejected because it was a Phase 2 study. The study was not fatally flawed by design, just premature, as many Phase 2 studies fail to be replicated after better-designed Phase 3 studies are performed. Science is about accuracy and redundancy and timelessness and process, not expediency. Despite his arguments, the paper was published and became highly cited. Sure enough, a better-designed Phase 3 study rejected the hypothesis supported by the Phase 2 study, vindicating Dr. McNutt on this occasion. But that is not the point. The point is that Watson knows of both studies. You only need to know one of them. How did Watson handle the irreproducible nature of the studies and their contrary insights? One might wonder if the negative study was cited as often as the positive, premature study. Watson would know.
Are we being too tough on AI? We are not writing about Watson’s specific program but, instead, using it as a metaphor for big data analytics and messy regression models. It is not clear if Watson has been tested in a range of clinical situations where inherent uncertainty prevails. No pertinent randomized trials are returned when “Watson artificial intelligence” is entered into “PubMed.” There are attempts to match patients to clinical studies, but no outcome studies. This is important since that “60 Minutes” episode told of a patient who was treated after a “recommendation” from Watson. We assume that the treatment met ethical standards for a Phase 1 study and that the patient was fully informed. We are left to assume, also, that the information found by AI was reliable and adequately tested. After all, this compliant-with-Watson, yet unfortunate, patient succumbed to an “infection” several months after receiving the treatment. We worry about the validity of the information spewed by the algorithm, and we wonder how on earth the researchers planned to learn anything about the efficacy of the proposed intervention from treating their patient. Science requires universal aims and adequate comparisons. In our view, any AI solution for any patient should be subjected to stringent, publicly available scientific testing. AI, to us, is in dire need of Phase 1 testing.
Science can be better. Watson will not advance science; scientific inquiry will. Better designs for clinical care and insights from scientific data need to be developed and implemented. We do not need massive amounts of data, just small amounts gathered in thoughtfully planned studies. And with better science, we will not need AI. Instead of banking, or breaking the bank, on AI, we should use our remarkable brains to learn by rigorous scientific inquiry and introduce valid scientific insights into the “big data” dialogue we call the patient’s “history” and do so in the service of what we call “patient care.” Watson and other systems may be able to do a wonderful job determining what books we buy, and, from a medical perspective, they might be able to pick a particular antibiotic given a known infection due to the deterministic nature of that task. But treating infection, as an example, is a small data part of what we do; we help sick people, and for that big data task Watson will, in our view, not be sufficiently insightful.