Hey Watson, Can I Sue You?

Currently, three South Korean medical institutions – Gachon University Gil Medical Center, Pusan National University Hospital and Konyang University Hospital – have implemented IBM’s Watson for Oncology artificial intelligence (AI) system. As IBM touts Watson for Oncology’s ability to “[i]dentify, evaluate and compare treatment options” by understanding the longitudinal medical record and applying its training to each unique patient, questions regarding the status and liability of these AI machines have arisen.

Given its ability to interpret data and present treatment options (along with relevant justifications), AI represents an interim step between a diagnostic tool and colleague in medical settings. Using philosophical and legal concepts, this article explores whether AI’s ability to adapt and learn means that it has the capacity to reason and whether this means that AI should be considered a legal person.

Through this exploration, the authors conclude that medical AI such as Watson for Oncology should be given a unique legal status akin to personhood to reflect its current and potential role in the medical decision-making process. They analogize the role of IBM’s AI to those of medical residents and argue that liability for wrongful diagnoses should be generally based on a medical malpractice basis rather than through products liability or vicarious liability. Finally, they differentiate medical AI from AI used in other products, such as self-driving cars.

Legal scholarship pertaining to artificial intelligence (AI), especially in light of its increasing sophistication and decision-making prowess, is still nascent. Courts have traditionally deemed it impossible for machines to have legal liability as they are not legal persons. Indeed, U.S. courts have previously outlined that “robots cannot be sued”. However, some modern commentators argue that a rethinking of legal systems is necessary to deal with AI considering its growing capabilities.

Controversy about how to treat AI from a legal standpoint reflects deeper disagreements about how humans should interact with such technologies, with debate about AI’s potential tending to gravitate towards the extremes. Decades of popular culture depictions reflect this dichotomy. Isaac Asimov envisions a world of peaceful co-existence with sentient robots, with the First Law of Robotics dictating that robots may not injure a human or allow one to come to harm. At the other end of the spectrum, we have dystopias such as Terminator’s Skynet or The Matrix, wherein sentient machines overtake humanity as the world’s dominant species.

AI has not yet achieved sentience and presently exists only in the form of useful tools rather than as a race capable of truly interacting with humans – let alone challenging us for global dominance. However, this does not mean that AI is incapable of interpretation or giving advice. AI employing cognitive learning is regularly used to parse through data and to offer options to human operators.

IBM Watson for Oncology (hereinafter “Watson”) represents the most compelling usage of such technology to date. Clinicians and analysts train Watson, a so-called cognitive computing system, to “interpret cancer patients’ clinical information and identify individualized, evidence-based treatment options.” But does this indicate a machine’s ability to reason? And, if so, should machines be held liable for exercising poor judgment?

This paper highlights Watson’s duties and abilities to argue that existing legal regimes are sufficient to deal with AI’s (including Watson’s) “humanity” at its current stage of technological advancement, and to adequately attribute blame and liability for its errors. By adopting a practical and responsibility-based approach that assesses Watson’s influence on treatment outcomes, legal regimes should enjoy a flexible and workable framework within which to regulate existing AI responsibility and liability while a broader debate about the rights and role of AI in human society takes place.

What is Artificial Intelligence Anyway?

Artificial intelligence is commonly understood as intelligence displayed by machines, the ability and development of machines to perform tasks that normally require human intelligence, or some such variation. AI applications are already commonplace in our daily activities, such as when we ask Siri if it’s raining, or request that Alexa play some bluegrass music. These types of AI applications, focused on the execution of a single task, are referred to alternatively as “weak” or “narrow” AI.

In contrast, many people associate AI with Artificial General Intelligence (AGI), or “strong” AI, which does not yet exist. This refers to the achievement of machine sentience or consciousness; AGI computers would possess or display intelligence equivalent to that of humans in every respect. Various tests have been developed that would purportedly detect the existence of true AGI, such as the Turing Test – in which a human would be unable to distinguish the responses of an AGI and a human to various posed questions. Ray Kurzweil, Google’s Director of Engineering and a noted futurist with a track record for accurate predictions (he claims an 86% success rate out of his 147 predictions since the 1990s), has consistently predicted 2029 as the year an AI will first pass a valid Turing Test.3 AGI would be capable of recursive self-improvement, leading to the rapid emergence of Artificial Superintelligence (ASI), the limits of which are unknown – but which many predict will result in the so-called “technological singularity,” perhaps entailing the obsolescence of human beings.

For now, we have expert systems, which have existed and evolved since the 1970s: computer programs that utilize AI methodologies to solve problems within a specialized subject area. Such domains may include anything from logic problems and games like chess to financial investing, legal research, and of course, medical decision-making. Expert systems that extend beyond binary logical inquiries (yes/no, true/false) rely on “fuzzy logic,” which must process linguistic terms in conditions of imprecise knowledge.4
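
To make the contrast with binary logic concrete, the sketch below shows one common way fuzzy logic is illustrated: a linguistic term such as "high fever" is assigned a degree of membership between 0 and 1 instead of a yes/no answer. The function names, the temperature thresholds, and the linear ramp are illustrative assumptions; they are not drawn from Watson or from any particular expert system.

```python
def high_fever_membership(temp_c: float, low: float = 37.5, high: float = 39.5) -> float:
    """Degree (0.0 to 1.0) to which a temperature counts as a "high fever".

    The thresholds are hypothetical and chosen only for illustration.
    """
    if temp_c <= low:
        return 0.0
    if temp_c >= high:
        return 1.0
    # Linear ramp between the two thresholds.
    return (temp_c - low) / (high - low)


def binary_high_fever(temp_c: float, cutoff: float = 38.5) -> bool:
    """Crisp (yes/no) version of the same question, for comparison."""
    return temp_c >= cutoff


if __name__ == "__main__":
    for t in (37.0, 38.0, 38.5, 39.0, 40.0):
        print(t, binary_high_fever(t), round(high_fever_membership(t), 2))
```

The binary version flips abruptly at its cutoff, while the fuzzy version reports how strongly the label applies, which is closer to how imprecise clinical language is actually used.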

These philosophical categorizations of AI inform the discussion herein.

What is Watson for Oncology?

IBM’s AI software or “expert system”, Watson (named after IBM’s first CEO, industrialist Thomas J. Watson, and not Sherlock Holmes’ sidekick, as is popularly misattributed), arrived on the popular radar when it beat two of Jeopardy’s all-time champions in 2011. The following year, IBM announced Watson’s first practical collaboration – with the Cleveland Clinic, in hopes that the system’s ability to synthesize huge amounts of data and produce evidence-based hypotheses could aid clinicians and students in more accurately diagnosing and treating patients.

IBM wanted to create a “supercharged Siri for business” and it essentially has. Watson now uses the same “Deep QA” software it used to achieve quiz show glory to tackle some of the healthcare system’s most complex problems – including properly diagnosing and treating various forms of cancer. Deep QA, in the simplest terms, is the software architecture that “analyzes, reasons about, and answers the content fed into Watson.” Watson’s abilities in the oncology space are ever expanding. In 2014, IBM announced that physicians could start using Watson to connect genomic and medical data to help drive more personalized treatments, and Business Insider has predicted that Watson will eventually allow oncologists to “upload the DNA fingerprint of a patient’s tumor,” identify which genes are mutated, and then sort through thousands of mutations to identify which is driving the tumor and can be targeted through drug therapies.

Expert systems – with Watson on the cutting edge – comprise “the most visible effects of research on artificial intelligence,” containing a knowledge base and inference tools that allow the user to both ask questions in natural language and receive a reply in the same language.7 Nonetheless, expert systems solve problems only in “a narrow and well-defined area.”

While Watson is therefore a far cry from AGI or the replacement of doctors with cyborgs, IBM frequently (though not always) shies away from specifically calling Watson “AI”, favoring alternate nomenclatures such as “augmented intelligence,” or as in recent TV commercials, “the platform for cognitive business.” Despite Watson representing one of the most sophisticated uses of narrow AI to date, this is perhaps unsurprising given the above-discussed confusion surrounding the definitional tiers of AI. Indeed, IBM’s fine distinction between “cognitive computing” and “artificial intelligence” appears designed more to placate those wary of the decline of human centrality and agency in the healthcare decision-making process than to accurately represent the intensive research, data gathering, and interpretation work carried out by Watson – work which is key to outlining the options for treatment of human subjects. Already, Watson is being used to score and rank medical literature and summarize patient records – key tasks in helping to contextualize and justify treatment options. Watson currently draws on more data than any reasonable human can be expected to consult, including 300 medical journals, 200 textbooks, and nearly 15 million pages of text in order to present treatment alternatives, drug options, and instructions for therapeutic administration.

Rising concordance rates suggest that Watson is on the cusp of being viewed as a viable, reliable option for vetting and applying such vast quantities of information. In June 2017, data presented at a meeting of the American Society of Clinical Oncology suggested Watson was highly likely to reach the same treatment conclusions as human doctors. At India’s Manipal Comprehensive Cancer Center, there was agreement between Watson and doctors on the appropriate course of treatment for 96.4% of 112 cases of lung cancer. For other forms of cancer, concordance rates ranged from 81% to 92.7%.
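
As a quick arithmetic check on what these percentages mean, concordance is simply the share of cases in which Watson's recommendation matched the doctors'. The sketch below assumes the count of concordant cases can be recovered by rounding the reported percentage; the article reports only the percentage, so the figure of 108 cases is an inference, and the function name is illustrative.

```python
def concordance_rate(concordant_cases: int, total_cases: int) -> float:
    """Percentage of cases where the two recommendations agree."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * concordant_cases / total_cases


# Assumption: 96.4% of 112 lung cancer cases corresponds to 108 concordant
# cases (108 / 112 is about 96.43%); only the percentage is reported.
lung_rate = concordance_rate(108, 112)
print(f"{lung_rate:.1f}%")
```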

Anecdotal cases have revealed stunning successes as well. University of Tokyo doctors, for example, reported that Watson saved a 60-year old woman’s life by identifying her rare form of leukemia: “The analytical machine took just 10 minutes to compare the patient’s genetic changes with a database of 20 million cancer research papers, delivering an accurate diagnosis and leading to proper treatment that had proven elusive.”

Results have not all been this favorable – for instance, concordance occurred in a mere 49% of 185 gastric cancer cases in South Korea.12 This discrepancy likely arose because Watson was trained by doctors at Sloan Kettering, whose treatment approaches did not translate to those used in Korea. With additional input from Korean doctors, however, Watson should be able to adapt and provide options more in line with regional orthodoxy. This exemplifies one of the complex, “fuzzy” ways in which Watson can and must learn to effectively problem-solve in a given population.

Watson’s abilities also go beyond research and analysis. It is already being used by some clinicians to provide “a natural language interface for the delivery of general and patient specific information” in the form of giving patients information and gaining feedback. Hence, even such traditionally human duties as patient interviews and education are being taught to Watson – much as a medical resident learns and takes over from his or her attending doctor.

The distinctions between Watson’s capabilities as a cognitive learning network, expert system, AI, etc. are more theoretical than practically significant, particularly from a legal standpoint. What is key is that Watson currently can investigate, learn, adapt and communicate with us. Until now, no other non-human being has been capable of as much.

But does this mean AI can make intelligent and responsible decisions?

How Do Humans Make ‘Good’ Decisions?

Watson is an incredible achievement of neural networking and heuristic learning. It can parse through unfathomable amounts of data, rank its quality, and integrate new information into its adaptive programming. It can do all this at a rate that no human can ever hope to match.

Yet, Watson is not sentient and, more crucially, cannot yet internalize or express moral values. This alienation of morality from the AI’s cognitive abilities lies at the heart of our difficulty in classifying Watson or attributing “human” intelligence to it. Traditionally, we have viewed morality as being central to the enjoyment of full rights and legal responsibilities of natural persons. Mankind has long attributed significant weight to emotional intelligence and intuition as necessary components of overall intelligence and the ability to make good decisions. This explains why doctors are judged not only on their raw analytical ability but also on their “bedside manner” and ability to weigh and explain difficult treatment options related to the welfare of the patient.

This may explain why AI in healthcare is viewed with some trepidation by the general public.  Watson, for all its ingenuity, cannot yet engage in such abstract thought and many seem to feel this means AI should have a narrow role and application as part of human medical systems. A common suspicion is that AI’s inability to intuit situations may lead to overly cold and harsh decisions that devalue human life.

Indeed, most Western concepts of morality naturally assume the primacy of human experience and decision-making. As it relates to the very nature of consciousness, René Descartes coined the phrase “je pense, donc je suis” to usher in the era of epistemological study. Through a first-person view, Descartes advanced the notion that interpreting information culled from human senses and either accepting or doubting it was the key to rational thought and existence.15 According to Descartes, the ability to apply methodological reasoning arose not as part of our corporeal existence but through an “essence” communicated through awareness and interpretation of the external world.16

To Descartes, this was a uniquely human ability. The animals with which we share the world were described as “animal machines” (“bête-machines”) – automata without self-consciousness, which could be exploited for human use.17 Similarly, regarding the possibility of “thinking machines,” Descartes argues further in his Discourse on Method that:

…they could never use words, or put together other signs, as we do in order to declare our thought to others. For we can certainly conceive of a machine so constructed that it utters words, and even utters words which correspond to bodily actions causing a change in its organs… But it is not conceivable that such a machine should produce different arrangements of words so as to give an appropriately meaningful answer to whatever is said in its presence, as the dullest of men do.

While Descartes was arguably the progenitor of attempting to outline a systematic method of rationality, even he could never conceive of a world in which machines could communicate with human beings and/or provide appropriate situational responses. After all, Descartes believed that thinking machines would never be able to “reason” through “all the contingencies of life.” According to Descartes, both the complexity of human language and decision-making were beyond mechanization.

Similarly, Immanuel Kant also believed that human beings were special, and he believed that this gave humans an intrinsic worth and dignity. All else, including animals and machines, were therefore but “things” – mere means to ends for humans.

Kantian notions of morality are therefore reserved for humans as “rational agents” capable of making their own decisions, setting goals, and applying reason to conduct. It is this capacity to reason that allows humans to make deontological moral decisions that respect the Categorical Imperative (CI). According to Kant, the CI is a rule that states:

Act only according to that maxim by which you can at the same time will that it should become a universal law.

Much like Descartes, Kant views humans as the only beings capable of comprehending, establishing and respecting CIs. The very concept of morality and responsible decision-making, according to Kant, is predicated on the conceit that the treatment of persons is an “end and never a means only.”21 In practical terms, that means difficult decisions can only arise from humans by humans as no other entity would have the necessary intrinsic qualities to filter human experience and values. That may be why Kant viewed human beings as “above all price.”

From a broader societal perspective, utilitarian philosophers such as Jeremy Bentham, John Stuart Mill, and David Hume advanced the notion that raising overall human happiness should be the primary goal of human decision-making. Their consequences-based approach and devotion to the “common good” is again a human-centric one and one that appears too nuanced for AI to understand. After all, even though AI excels at pure analytical calculations, how would it measure and compare intangible goods so that they can be weighed and pitted against one another?

These examples are not meant to be comprehensive but to demonstrate that morality is viewed as the key to principled decision-making among humans. Judging from the diversity and breadth of philosophical texts on morality, balancing individualistic and societal goals is an exceedingly difficult process involving a metaphysical arithmetic that is highly unsettled and characteristically ill-suited to rigid categorization and valuation.

For a decision to be viewed as good and principled, however, it will have to be consistent with at least some vision of human morality. And this proves to be the greatest difficulty for most when thinking about whether we should accept AI as a component of the healthcare system – how can a machine devoid of intrinsic human sensations, perceptions, and intuition be trusted to make the ‘right’ decisions? Even more crucially, how can AI be trusted to choose the correct option in cases where there are competing moral views and interests?

To be fair, this issue is not specific to “robots” and perplexes human decision makers as well. Even a cursory review of Western moral philosophy shows a wide disparity in what people believe makes humans “special” and how humans should interpret their universe and interact with others. This demonstrates the ambiguity of the supposed morality that underpins modern society.

After all, even the weightiest human expression of values, our system of laws expressed through judicial systems, generally allows for some degree of variability in penalties and sentencing. Even when we have succeeded in establishing laws consistent with certain maxims (e.g. against murder), cases which offend these most basic maxims can involve lengthy debates about appropriate actions and mitigating circumstances.

For example, the interplay between reality and maxims is underscored by contemporary debates on euthanasia, where the social value of human life is weighed against the right of the individual to determine his or her own fate. There is no simple answer to this question, and to even begin to view this as a methodological or calculable issue among humans, let alone how to program such issues into AI language, engenders debate on the foundations of how to tackle such issues.

Further, it is unclear how humans can “teach” AI morality when we have failed to address glaring holes in our own extant theories of morality. For instance, regarding utilitarianism, dealing with imprecise measurables in calculating the common good remains a challenge – not only for AI but for humans. As critiqued by political philosopher John Rawls, utilitarianism conflates “all systems of desires” into a singular conception of desirable social outcomes. Thus, the separateness of persons is sacrificed in favor of the utilitarian conceit that there is an impartial truth that maximizes human happiness.

This is problematic because ascertaining how to achieve the greatest net balance of satisfaction is not only difficult but can also obscure questions regarding the “source or quality” of such impetuses. After all, racism may make the majority happier overall but deprive a minority of their safety and liberty. Rawls notes that such a trade-off satisfies the central tenets of utilitarianism but still represents a lack of justice.24 Of course, this raises the question of what justice entails and who gets to define the concept. As one can imagine, this is a difficult conceptual conversation for humans – let alone for a programmer to code for the benefit of AI systems.

It is clear then that morality plays a key role in defining “appropriate” decisions among humans. As moral agents, we are not only called upon to make decisions but also pressured to be able to justify actions to both ourselves and others. Whether the influence is philosophical, religious, spiritual, or simply “intuitive,” human decision-making is assumed to be superior, paradoxically, because of our innate desire and ability to measure and compare incalculable moral considerations.

Can AI Make “Good” Decisions?

Morality may lie at the center of human decision-making but is it a necessity for AI such as Watson, particularly as it relates to current usage in medicine? While the answer to that question may become complex as AI capabilities evolve, with respect to contemporary technological usage and implications, the answer is no.

There are many prerequisite steps before humans can apply moral principles to a decision and not all decisions made in our day-to-day lives require the same level of complexity. For instance, the decision to drink water when thirsty is much more elemental than whether to prolong the lives of terminally ill patients in pain.

This reflects the fact that there is a hierarchy to thought and learning. The work of Benjamin Bloom and his colleagues in 1956 outlined learning theory that classified functional learning into three domains – knowledge (thinking) skills, psychomotor (motor) skills, and affective (behavioral) skills – with graduated categories of learning from the simplest (concrete) to most complex (abstract) in each domain.

Current AI technologies such as Watson exist to mimic human behavior in the knowledge, also referred to as cognitive, domain. For Watson to reach par with the decision-making capability of adult humans, it stands to reason that it would be able to learn in a manner consistent with Bloom’s knowledge taxonomy. To learn on par with humans, AI would have to demonstrate achievement of each of the following steps:

(1)  Knowledge: The ability to recall.

(2)  Comprehension: The ability to understand and interpret a surface-level issue.

(3)  Application: The ability to apply abstractions, general principles and methods to concrete problems.

(4)  Analysis: The ability to break a problem down into its component parts and examine how they relate.

(5)  Synthesis: The ability to combine parts into a new whole.

(6)  Evaluation: The ability to judge the value of material or methods against given criteria.

Bloom’s taxonomy is a practical one and still widely used in the educational realm. Its appeal in this context derives from the fact that it separates out the ability to gauge mastery over increasingly complex tasks and gives us a clear way to judge AI capabilities in a functional and outcome-driven, rather than theoretical, manner.

According to IBM, Watson mimics human problem-solving ability by respecting the following steps28:

Watson does this by employing Deep QA software for the purposes of:

(1)  storing and updating a “corpus of knowledge” in the form of medical texts,

(2)  curating the content (with human intervention) by culling through relevant medical texts and discarding those which are not relevant,

(3)  ingesting the content by creating indices and other metadata that make working with the data more efficient (including creating knowledge graphs),

(4)  learning from human experts who facilitate machine learning by uploading question/answer pairs for the purposes of exposing Watson to linguistic patterns,

(5)  continuing the learning process indefinitely with periodic review by human experts,

(6)  identifying parts of speech in a question or inquiry,

(7)  generating hypotheses,

(8)  searching for evidence to support or refute the hypotheses,

(9)  scoring each hypothesis based on statistical modeling for each piece of weighted evidence, and

(10) performing evidence scoring and rating.
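
Steps (7) through (10) can be pictured with a toy example. The sketch below is not IBM's Deep QA code and makes no claim about how Watson is implemented; the class names, the evidence weights, and the weighted-average scoring rule are all assumptions, chosen only to illustrate the general idea of scoring candidate answers against weighted evidence and ranking them.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str     # hypothetical label for where the evidence came from
    support: float  # -1.0 (refutes) to 1.0 (supports)
    weight: float   # how much this evidence counts; assumed values, not IBM's


@dataclass
class Hypothesis:
    name: str
    evidence: list[Evidence] = field(default_factory=list)

    def score(self) -> float:
        """Weighted average of support; 0.0 when there is no evidence."""
        total_weight = sum(e.weight for e in self.evidence)
        if total_weight == 0:
            return 0.0
        return sum(e.support * e.weight for e in self.evidence) / total_weight


def rank(hypotheses: list[Hypothesis]) -> list[tuple[str, float]]:
    """Return (name, score) pairs, best-supported first."""
    scored = [(h.name, h.score()) for h in hypotheses]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    options = [
        Hypothesis("Treatment A", [Evidence("journal", 0.8, 3.0), Evidence("textbook", 0.4, 1.0)]),
        Hypothesis("Treatment B", [Evidence("journal", -0.2, 3.0), Evidence("case report", 0.9, 1.0)]),
    ]
    for name, score in rank(options):
        print(f"{name}: {score:.2f}")
```

The output of this sketch is only a ranked list for a human reader; choosing among the options is left to the clinician.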

In applying Bloom’s taxonomy to Watson, we see that it excels in certain domains – to the point of far exceeding human capacity – but lags in others. Indeed, it has been argued that current AI machines “have very limited learning abilities” equivalent to only the third level of Bloom’s knowledge taxonomy.

With respect to Watson, this argument appears correct. Regarding the “knowledge” level of Bloom’s taxonomy, we have noted that Watson can store and recall information from a staggering number of medical texts. With regard to “comprehension” and “application”, Watson’s ability to understand inquiries as well as search through relevant data in order to organize information necessary to present logical treatment options is, at this point, indisputable given high concordance rates.

But in marketing Watson, IBM overhypes its capabilities. Unlike humans, Watson still does not and likely cannot perform a key function – to decide on conclusions or courses of action. While Watson excels at understanding human questions, the question and answer-based model of Watson’s programming means that it cannot independently “analyze,” “synthesize,” or “evaluate” medical issues, or independently conceive of and search for new and relevant information without human stewardship or guidance.

For true analysis, synthesis, or evaluation to take place, there must be the ability to choose between a range of competing options. While one may argue that Watson can do that to some extent, in that answers to hypotheses can be scored and ranked, Watson still cannot do anything with this information. Watson’s role is as a passive assistant that can only interpret and answer external inputs.  It does not have its own capacity or desire to formulate the question itself or to independently test and definitively prove or invalidate hypotheses.


3 replies »

  1. ^ a DIAGNOSIS ^ for a person’s Unstable HEALTH is the result of an iterative interchange between deductive and inductive reasoning for testing the possible alternatives of a specific hypothesis based on knowledge, resources and human dignity. The confounding factors are voluminous, especially the ownership of the lab and imaging equipment involved.
    My recent “ancestry DNA” test revealed a heritage of 94% Swedish, 4% European east, < 1% European west, 1% British Isles, and 1% European Jewish. It might be difficult to add these dimensions of the world's 7.5 billion humans to Watson to consider for a specific human's diagnostic needs, especially since the data set might be irrelevant without knowing the entire genome. I wonder how much total memory exists "in the cloud."
    Steve, only if there hasn't been a recent astrophysical revision of GMT.

  2. A teleologic discussion of AI implies, as does our preoccupation with EHR, that a discretely defined decision process, no matter how nuanced, could contribute to the level of TRUST, COOPERATION and RECIPROCITY underlying a caring relationship. An enhanced tool, maybe yes; elucidating a solution to the cost and quality problems of our nation’s healthcare, definitely not.

    Maybe I am just being ingenuous, but it seems as if the owners of the AI software are ultimately liable. Eventually, liability could be defined by legislative structure or appellate court case-law. In the meantime, it is possible that IBM has already planned a solution for liability. What say their C suite folks?