We entered the 21st century awash in “evidence” and determined to anchor the practice of medicine to an evidentiary basis for benefit. There is a sense of triumph: in one generation we displaced the dominance of theory, conviction, and hubris at the bedside. The task now is to make certain that evidence serves no agenda other than the best interests of the patient.
Evidence-based medicine is “the conscientious and judicious use of current best evidence from clinical care research in the management of individual patients” [1,2].
But what does “judicious” mean? What does “current best” mean? If the evidence is tenuous, should it hold sway because it is currently the best we have? Or should we consider the result “negative” pending a more robust demonstration of benefit? Ambiguity is intolerable when defining evidence because of the propensity of people to decide to do something rather than nothing. Can we and our patients make “informed” medical decisions on thin evidentiary ice? How thin? Does tenuous evidence mean that no one is benefited, that the occasional individual may be benefited, or that many may be benefited but only a very little bit?
The reason clinical medicine is not a science but a philosophy informed by science is that no two people are the same and no two patients present with identical dilemmas or value solutions to the same degree. No science can ever answer the query, “How certain are you that this will benefit me?” The science at best speaks to circumstances similar to “me”. The charge to the physician is to speak to the patient about the science as it pertains to “similar.” To meet this charge the physician must have a substantial degree of confidence that the intervention does or does not benefit most members of some defined population. This is the notion of efficacy and is the goal of a randomized controlled trial (RCT). If there is efficacy, the patient-physician dialogue can seek the relevance for “me” hiding in the uniformity and averaging that characterize high-quality RCTs. Defining the absolute value of the added benefit or harm to a particular patient that is associated with one medical service rather than another is the goal of evidence-based medicine. This is the first step toward informed medical decision making. Providing the support, empathy, caring and perspective so that a given patient can value this information is the goal of evidence-based medical care and the calling of the modern physician. It is a calling that is both reductionistic and humanistic.
The reductionistic exercise demands that one adhere to the refutationist principle that is central to scientific reasoning. Seldom does one design a clinical trial hoping that the null hypothesis will stand. It is human nature to hope that there is a demonstrable difference in outcome between those treated with an active intervention and the referent group. But logic dictates that one start with the assumption that there is no benefit. The charge to the statistician is stochastic: what is the likelihood that the null hypothesis can or cannot be rejected? The charge to the clinical investigator is not simply to frame the result as negative or positive, but as negative or positive enough to inform clinical decision making in the population studied. The moral dilemma for the treating physician is to seek, with the patient, the relevance of the result to that given patient. If an intervention cannot be shown to offer a meaningful benefit in any particular population, there is no reason to assume that it will benefit the patient, and its prescription is unconscionable. If the intervention is barely effective in a particular population, its prescription is unappealing even if the patient shares characteristics with the study population. Only when the patient learns that it “will likely benefit me” is the prescription appealing and compliance sensible.
There is nothing about this decision tree that yields certainty. For example, if a reasonably executed randomized controlled trial did not reject the null hypothesis, the physician is science-bound to advise the patient that this is an intervention with no demonstrated benefit. But no RCT is ever perfectly executed; there is always an element of imperfect compliance on the part of the subjects or the investigators. No RCT ever recruits all varieties of possible subjects. And no negative trial can be assumed to be absolutely negative; there may be effects that were too small to be detected. It is possible to calculate the “power” of the trial’s design to detect small effects, but these calculations can rapidly devolve into an exercise in reductio ad absurdum: yes, a small effect can be missed, and certainly the statisticians will be exercised, but from the clinician’s perspective, if the effect is that small, who cares? From the clinical perspective, a negative trial consigns the intervention to “useless” until proven otherwise.
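The arithmetic behind that reductio ad absurdum is easy to sketch. The Python snippet below is our illustration, not part of the essay; the function name and the conventional two-sided 5% significance and 80% power are assumptions. It uses the standard normal approximation for comparing two proportions to show how the required trial size balloons as the sought-after effect shrinks:

```python
from math import ceil
from statistics import NormalDist  # standard library, Python 3.8+

def sample_size_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate per-arm sample size to detect a difference between two
    event proportions (normal approximation for a two-sided test)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for two-sided significance
    z_beta = z(power)            # critical value for the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# A 2% absolute difference in a hard outcome (10% events vs. 8%):
print(sample_size_per_arm(0.10, 0.08))    # roughly 3,200 per arm
# A 0.5% absolute difference (10% vs. 9.5%):
print(sample_size_per_arm(0.10, 0.095))   # tens of thousands per arm
```

Chasing ever-smaller effects requires ever-larger, longer, costlier trials; at some point the effect being hunted is too small to matter to any patient.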
If there is a positive efficacy trial, there is evidence for benefit, and the statisticians and purveyors rejoice. But the patient and physician engaged in informed medical decision making are quick to ask, “Evidence for what?”
The following are some of the major guideposts for a conversation between a patient and a physician when considering whether the results of any positive efficacy trial are relevant to the patient’s need to know, “Will this benefit me?”
Small effects. We’ve already expressed our hesitancy to base decisions on small effects. The smaller the effect, the less reliable its measurement. Not only are there limitations consequent to the design of the trial, but there are limits consequent to biological variability. These are called randomization errors. When one does an RCT, one attempts to randomize all relevant variables and compensate mathematically for any variables that are unequally represented. But often there are relevant variables that are not measured and others not yet quantified (such as heritability). For example, in cardiovascular trials on well individuals, it would be unethical to subject all volunteers to cardiac catheterization. One has to assume that those with more or less subclinical disease randomize equally or that the inequities counterbalance, an assumption that, with some likelihood, will not hold. That is why a small effect may represent randomization error and not the benefit of the intervention. For trials with hard outcomes, such as death, heart attack, or stroke, our interest perks up at absolute differences of 2% or greater. If we have to treat 50 or more people to benefit one, we are unimpressed, unconvinced, and dissuasive in our discussions with our patients.
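The 2% threshold translates directly into the familiar “number needed to treat.” Here is a minimal sketch of that arithmetic (ours, not the essay’s; the function name is invented for illustration):

```python
from math import ceil

def number_needed_to_treat(risk_control, risk_treated):
    """NNT = 1 / absolute risk reduction, rounded up by convention:
    how many patients must be treated, on average, for one to benefit."""
    absolute_risk_reduction = risk_control - risk_treated
    if absolute_risk_reduction <= 0:
        raise ValueError("no absolute benefit demonstrated")
    return ceil(1 / absolute_risk_reduction)

# The 2% absolute difference discussed above (10% events vs. 8%):
print(number_needed_to_treat(0.10, 0.08))    # 50: treat 50 to benefit 1
# A 0.5% absolute difference:
print(number_needed_to_treat(0.10, 0.095))   # 200: far harder to justify
```

An NNT of 50 is the edge of our interest; an NNT of 200 means 199 of every 200 patients treated gain nothing from the prescription.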
The lottery mentality. But the patient says that if the intervention is reasonably benign and indemnified and benefits only 1 in 100, or 1 in 1000 treated, “What do I have to lose?” To which we respond, “An RCT is not a lottery. You are essentially as likely to benefit without the intervention as with it. It’s like winning a lottery without buying a ticket.”
Risk factors. Biochemical measures of risk are modern augury’s talisman. So many amongst us rise up and go to sleep aware that our future is marked by some number: PSA, cholesterol, blood pressure, HbA1c, BMI, and the like. These have become surrogate markers for the disease they portend. The desire to expunge the harbinger is inescapably human and scientifically seductive. We have learned better, but it is a lesson that is hard learned. Never again should anyone be screened for anything unless the test is accurate, the disease is important, and something can be done about it.
Secondary analysis. So many RCTs are powered for small effects. Even if the outcome studied is a surrogate outcome, large study populations are required. For the few trials that test for small, actual clinical outcomes, the studies are large, prolonged, and costly. There is much at stake and much pressure to eke a small effect out of the morass of data; so much pressure, in fact, that industry trials are generally more likely to do so than government-sponsored trials of the same agents. When all else fails, the trial is declared “negative,” but the data are seldom left fallow. They are poked and prodded in search of associations that were not quite those for which the trial was designed. No one should ever assume this exercise can generate anything more than a hypothesis. But many such hypotheses have a life of their own (coronary artery bypass grafting for multi-vessel disease is a classic).
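How easily the poking and prodding manufactures “findings” can be seen in a small simulation (our sketch, not from the essay; the arm size, event rate, and two-proportion z-test are illustrative assumptions). Both arms are drawn from the very same distribution, so the intervention truly does nothing and every “significant” subgroup difference is spurious:

```python
import random
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

random.seed(0)  # fixed seed so the run is reproducible

def null_comparison(n=400, rate=0.10, alpha=0.05):
    """Compare two arms drawn from the SAME outcome distribution using a
    pooled two-proportion z-test; True means a (false) 'significant' result."""
    events_a = sum(random.random() < rate for _ in range(n))
    events_b = sum(random.random() < rate for _ in range(n))
    pooled = (events_a + events_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return False  # degenerate case: no variability to test
    z = abs(events_a - events_b) / n / sqrt(pooled * (1 - pooled) * 2 / n)
    p_value = 2 * (1 - NormalDist().cdf(z))
    return p_value < alpha

# Dredge one "negative" trial for 20 unplanned subgroup effects:
hits = sum(null_comparison() for _ in range(20))
print(f"{hits} spurious 'significant' subgroup finding(s) out of 20 looks")
```

Across many such unplanned looks, roughly 1 in 20 comes up “positive” at the 5% level by chance alone, which is exactly why secondary analyses can generate only hypotheses, never conclusions.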
Composite outcomes. We would not leap to purchase an automobile if we were told it was better than the competition in terms of price, reliability, safety, or comfort. We’d like to know how much better and by which of the criteria. Yet trials abound touting as a positive outcome a reduction in all heart attacks, fatal heart attacks, congestive heart failure, or the “need” for a procedure. As a rule, the “positive” is only by the last criterion, which is in the eyes of the beholder, often eyes that have a vested interest in this outcome. Informed medical decision making smirks at such shenanigans.
Since the individual patient is to be the beneficiary of evidence-based medicine, deciding that an intervention has meaningful evidentiary support is not the end of the decision-making process. The positive randomized trial may not generalize to the particular circumstance. The criteria for admission to a trial are generally stringent and may exclude important attributes that define the illness of any given patient, from age or gender to comorbidities and more. Hence, using or not using the intervention tested in the trial may lead to either over- or undertreatment in the population at large. This is the humanistic side of informed medical decision making. At least it is known that the intervention benefits someone. Dealing with the uncertainty as to whether this patient is similar to that someone is a value-laden, qualitative, interactive process. It is medicine.
So the randomized trial may be decision-activating only if the absolute benefit is large and the absolute harm is low. Such a result provides evidence for action in patients with characteristics similar to the subjects in the trial. More importantly, this result becomes the prerequisite “gold standard” against which effectiveness can be measured when comparing outcomes in practice in the community, i.e., comparative effectiveness research. Without such a standard, one may be seeking the lesser of two evils. But observational data from large administrative data sets are far more likely to sink in the murk of unquantifiable, unmeasured variables than to answer the question of whether particular practices in particular populations advantage patients at least as much as the subjects in the “positive” RCT. Our science, hence, must be reformed to involve more people in studies to learn how specific attributes alter an individual’s benefits and harms. We are not asking for studies on subsets of patients within the present framework of clinical care research, which is lacking. We are talking about moving our studies closer to the public at large, with enough variation in clinical situations to find who does and who does not do better or worse.
It is time for a “state of the union” about the standards for medical evidence. Without a uniform standard of how to obtain evidence for individuals, we will not be able to justly adjudicate the inevitable constraints on the allocations of medical services. A new definition of evidence may be, “Evidence is information that reflects the added value of any medical service based on the unique characteristics of the person and, hence, can inform that person about the relative value of one treatment versus another”.
The gold standard of evidence for individual decision making, in our view, may start with the demonstration of efficacy in an RCT and go on to consecutive-patient trials for which only one service is available for a defined condition. These before-and-after or side-by-side large cohort studies, which limit the available options, would allow us to control prospectively for characteristics of patients and estimate the differences in outcomes across the continuum of individual vagaries.
The purchase of health care services is presently poorly planned because of our inability to study what works better than what for the individual decision maker. We can expand insurance coverage, but more importantly we should be talking about what we should cover, and why. We presently have no standards for measures of value, and we do not have enough studies that inform the individual decision maker. We do not need comparative effectiveness studies until we decide what those studies should be aimed at finding. We do not have a uniform definition of evidence and, hence, are left advising our patients about what is valuable to buy without knowing if anyone really is getting better, or if any individual would want what we are selling in the first place.
But no matter how well we respond to these contingencies, we will be faced with uncertainty. Who better to hold our hand when we make decisions about our health based on imperfect data than our trusted physician? Why else do we need physicians going forward?
1. ACP Journal Club. 2002 Mar-Apr;136:A11.
2. ACP Journal Club. 2003 Nov-Dec;139:A11.
4. McNutt RA, Livingston EH. Evidence-based medicine requires appropriate clinical context. JAMA. 2010 Feb 3;303(5):454-5.
5. McNutt RA. Shared medical decision making: problems, process, progress. JAMA. 2004 Nov 24;292(20):2516-8.
6. Hadler NM. Worried Sick: A Prescription for Health in an Overtreated America. Chapel Hill: UNC Press, 2008.
7. Hadler NM, Gillings DB. On the design of the phase III drug trial. Arthritis Rheum. 1983;26:1354-1361.
Nortin Hadler, MD, is a professor of Medicine and Microbiology at the University of North Carolina – Chapel Hill. Dr. Hadler is the author of numerous articles and essays and a series of popular books on the state of medicine today. His most recent book is “Stabbed in the Back: Confronting Back Pain in an Overtreated Society.” Robert McNutt, MD, is a professor of Medicine at Rush Medical College in Chicago.