The Evidentiary Basis for a Clinically Meaningful Benefit

We entered the 21st century awash in “evidence” and determined to anchor the practice of medicine on the evidentiary basis for benefit. There is the sense of triumph; in one generation we had displaced the dominance of theory, conviction and hubris at the bedside. The task now is to make certain that evidence serves no agenda other than the best interests of the patient.

“Evidence-based medicine is the conscientious and judicious use of current best evidence from clinical care research in the management of individual patients.” [1,2]

But, what does “judicious” mean? What does “current best” mean? If the evidence is tenuous, should it hold sway because it is currently the best we have? Or should we consider the result “negative” pending a more robust demonstration of benefit? Ambiguity is intolerable when defining evidence because of the propensity of people to decide to do something rather than nothing. [3] Can we and our patients make “informed” medical decisions on thin evidentiary ice? How thin? Does tenuous evidence mean that no one is benefited or that the occasional individual may be benefited or that many may be benefited but only a very little bit?

The reason clinical medicine is not a science but a philosophy informed by science is that no two people are the same and no two patients present with identical dilemmas or value solutions to the same degree. No science can ever answer the query, “How certain are you that this will benefit me?” The science at best speaks to circumstances similar to “me”. The charge to the physician is to speak to the patient about the science as it pertains to “similar.” To meet this charge the physician must have a substantial degree of confidence that the intervention does or does not benefit most members of some defined population. This is the notion of efficacy and is the goal of a randomized controlled trial (RCT). If there is efficacy, the patient-physician dialogue can seek the relevance for “me” hiding in the uniformity and averaging that characterize high-quality RCTs [4]. Defining the absolute value of the added benefit/harm to a particular patient that is associated with one medical service rather than another is the goal of evidence-based medicine [5]. This is the first step to informed medical decision making. Providing the support, empathy, caring and perspective so that a given patient can value this information is the goal of evidence-based medical care and the calling of the modern physician. It is a calling that is both reductionistic and humanistic.

The reductionistic exercise demands one adhere to the refutationist principle that is central to scientific reasoning. Seldom does one design a clinical trial hoping that the null hypothesis will stand. It is human nature to hope that there is a demonstrable difference in outcome between those treated with an active intervention and the referent group. But logic dictates that one start with the assumption that there is no benefit. The charge to the statistician is stochastic: what is the likelihood that the null hypothesis can or cannot be rejected? The charge to the clinical investigator is not simply to frame the result as negative or positive but as negative or positive enough to inform clinical decision making in the population studied. The moral dilemma for the treating physician is to seek, with the patient, the relevance of the result to that given patient. If an intervention cannot be shown to offer a meaningful benefit in any particular population, there is no reason to assume that it will benefit the patient and its prescription is unconscionable. If the intervention is barely effective in a particular population, its prescription is unappealing even if the patient shares characteristics with the study population. Only when the patient learns that it “will likely benefit me” is the prescribing appealing and compliance sensible.

There is nothing about this decision tree that yields certainty. For example, if there has been a reasonably executed randomized controlled trial that did not reject the null hypothesis, the physician is science-bound to advise the patient that this is an intervention with no demonstrated benefit. But no RCT is ever perfectly executed; there is always an element of imperfect compliance on the part of the subjects or the investigators. No RCT ever recruits all varieties of possible subjects. And no negative trial can be assumed to be absolutely negative; there may be effects that were too small to be detected. It is possible to calculate the “power” of the trial’s design to detect small effects, but these calculations can rapidly devolve into an exercise in reductio ad absurdum; sure, a small effect can be missed, and certainly the statisticians will be exercised, but to a clinician, if it’s that small, who cares? From the clinical perspective, a negative trial consigns the intervention to “useless” till proven otherwise.
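For readers who want to see what a “power” calculation looks like, here is a minimal sketch. It is not drawn from any particular trial: the event rates and sample sizes are hypothetical, the function name is ours, and the formula is the common normal approximation for comparing two proportions (unpooled variance, ignoring the negligible chance of a significant result in the wrong direction).

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Assumes equal arm sizes and a normal approximation; a rough guide,
    not a substitute for a statistician's design calculation.
    """
    std_normal = NormalDist()
    standard_error = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    z_effect = abs(p_control - p_treated) / standard_error
    return std_normal.cdf(z_effect - z_critical)


# Hypothetical event rates: 10% in the referent group, 8% with treatment.
for n in (1000, 5000):
    print(n, round(approximate_power(0.10, 0.08, n), 2))
```

Under these assumed rates, a trial with 1,000 subjects per arm would detect the two-point difference only about a third of the time, while 5,000 per arm would detect it more than nine times in ten. The smaller the effect one insists on being able to detect, the larger the trial must be.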

If there is a positive efficacy trial, there is evidence for benefit and the statisticians and purveyors rejoice. But the patient and physician engaged in informed medical decision making are quick to ask, “Evidence for what?” [6]

The following are some of the major guideposts for a conversation between a patient and a physician when considering whether the results of any positive efficacy trial are relevant to the patient’s need to know, “Will this benefit me?”

Small effects. We’ve already expressed our hesitancy to base decisions on small effects. The smaller the effect, the less reliable its measurement. Not only are there limitations consequent to the design of the trial, but there are limits consequent to biological variability. These are called randomization errors. When one does an RCT one attempts to randomize all relevant variables and compensate mathematically for any variables that are unequally represented. But often there are relevant variables that are not measured and others not yet quantified (such as heritability). For example, in cardiovascular trials on well individuals, it would be unethical to subject all volunteers to cardiac catheterization. One has to assume that those with less or more subclinical disease randomize equally or counterbalance inequities, an assumption that will not hold with some likelihood. That’s why a small effect may represent randomization error and not the benefit of the intervention. For trials with hard outcomes, such as death, heart attack, or stroke, our interest perks up at absolute differences of 2% or greater. If we have to treat more than 50 people to benefit one, we are unimpressed, unconvinced and dissuasive in our discussions with our patients.
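The link between an absolute difference of 2% and “treating 50 to benefit one” is simple arithmetic, sketched below. The event counts are hypothetical and the function names are ours; the definitions (absolute risk reduction as the difference in event rates, number needed to treat as its reciprocal) are the standard ones.

```python
from fractions import Fraction


def absolute_risk_reduction(events_control, n_control, events_treated, n_treated):
    """Difference in event rates between the referent and treated arms."""
    return Fraction(events_control, n_control) - Fraction(events_treated, n_treated)


def number_needed_to_treat(arr):
    """Reciprocal of the absolute risk reduction; undefined without benefit."""
    if arr <= 0:
        raise ValueError("no observed benefit, so NNT is undefined")
    return 1 / arr


# Hypothetical trial: 1,000 subjects per arm, 100 events untreated, 80 treated.
arr = absolute_risk_reduction(100, 1000, 80, 1000)
nnt = number_needed_to_treat(arr)
relative_reduction = arr / Fraction(100, 1000)

print(f"absolute risk reduction: {float(arr):.1%}")
print(f"number needed to treat: {float(nnt):.0f}")
print(f"relative risk reduction: {float(relative_reduction):.0%}")
```

In this made-up example the same result can be reported as a 20% relative reduction or as a 2% absolute reduction with 50 people treated for one to benefit. Only the absolute figures speak to the question “Will this benefit me?”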

The lottery mentality. But the patient says that if the intervention is reasonably benign and indemnified and benefits only 1 in 100, or 1 in 1000 treated, “What do I have to lose?” To which we respond, “An RCT is not a lottery. You are essentially as likely to benefit without the intervention as with it. It’s like winning a lottery without buying a ticket.”

Risk factors. Biochemical measures of risk are modern augury’s talisman. So many amongst us rise up and go to sleep aware that our future is marked by some number: PSA, cholesterol, blood pressure, HbA1c, BMI, and the like. These have become surrogate markers for the disease they portend. The desire to expunge the harbinger is inescapably human and scientifically seductive. We have learned better, but it is a lesson that is hard learned. Never again should anyone be screened for anything unless the test is accurate, the disease is important, and something can be done about it.

Secondary analysis. So many RCTs are powered for small effects. Even if the outcome studied is a surrogate outcome, large study populations are required. For the few trials that test for small, actual clinical outcomes the studies are large, prolonged and costly. There is much at stake and much pressure to eke a small effect out of the morass of data, so much pressure that industry trials are generally more likely to do so than government-sponsored trials on the same agents. When all else fails, the trial is declared “negative” but the data are seldom left fallow. They are poked and prodded to look for associations that were not quite those for which the trial was designed. No one should ever assume this exercise can generate anything more than a hypothesis. But many such hypotheses have a life of their own (coronary artery bypass grafting for multi-vessel disease is a classic).

Composite outcomes. We would not leap to purchase an automobile if we were told it was better than the competition in terms of price, reliability, safety or comfort. We’d like to know how much better and by which of the criteria. Yet trials abound touting as a positive outcome a reduction in all heart attacks, fatal heart attacks, congestive heart failure, or the “need” for a procedure. As a rule, the “positive” is only by the last criterion, which is in the eyes of the beholder – often eyes that have a vested interest in this outcome. Informed medical decision making smirks at such shenanigans.
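A small sketch shows how a composite can look positive while only its softest component moves. Every number here is invented for illustration, the component names are ours, and the tally is simplified by assuming no subject is counted under more than one component.

```python
# Hypothetical event counts in each arm of 1,000 subjects; illustrative only.
N_PER_ARM = 1000
control = {
    "fatal heart attack": 20,
    "nonfatal heart attack": 40,
    "congestive heart failure": 30,
    "procedure judged necessary": 110,
}
treated = {
    "fatal heart attack": 20,
    "nonfatal heart attack": 39,
    "congestive heart failure": 30,
    "procedure judged necessary": 80,
}


def events_avoided_by_component(control_counts, treated_counts):
    """Events avoided in the treated arm, listed component by component."""
    return {name: control_counts[name] - treated_counts[name] for name in control_counts}


avoided = events_avoided_by_component(control, treated)
composite_avoided = sum(avoided.values())

for name, count in avoided.items():
    print(f"{name}: {count} fewer per {N_PER_ARM}")
print(f"composite: {composite_avoided} fewer per {N_PER_ARM}")
```

In this invented example nearly all of the composite “benefit” comes from fewer procedures being judged necessary, while deaths are unchanged. Asking for the breakdown by component is the equivalent of asking by which criterion the automobile is better.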

Since the individual patient is to be the beneficiary of evidence-based medicine, deciding an intervention has meaningful evidentiary support is not the end of the decision making process. The positive randomized trial may not generalize to the particular circumstance. The criteria for admission to trial are generally stringent and may exclude important attributes that define the illness of any given patient, from age or gender to comorbidities and more. Hence, using or not using the intervention tested in the trial may lead to either overtreatment or undertreatment in the population at large. This is the humanistic side of informed medical decision making. At least it is known that the intervention benefits someone. Dealing with the uncertainty as to whether this patient is similar to that someone is a value-laden qualitative interactive process. It is medicine.

So, the randomized trial may be decision activating only if the absolute benefit is large and absolute harm is low. Such a result provides evidence for action in patients with characteristics similar to the subjects in the trial. More importantly, this result becomes the prerequisite “gold standard” against which effectiveness can be measured in comparing outcomes in practice in the community, i.e. comparative effectiveness research. Without such, one may be seeking the lesser of two evils. But observational data from large administrative data sets are far more likely to sink in the murk of unquantifiable, unmeasured variables than to answer the question as to whether particular practices in particular populations advantage patients at least as much as the subjects in the “positive” RCT. Our science, hence, must be reformed to involve more people in studies to learn how specific attributes alter an individual’s benefits and harms. We are not asking for studies on subsets of patients within the present framework of clinical care research, which is lacking. We are talking about moving our studies closer to the public at large with enough variation in clinical situations to find who does and who does not do better or worse.

It is time for a “state of the union” about the standards for medical evidence. Without a uniform standard of how to obtain evidence for individuals, we will not be able to justly adjudicate the inevitable constraints on the allocations of medical services. A new definition of evidence may be, “Evidence is information that reflects the added value of any medical service based on the unique characteristics of the person and, hence, can inform that person about the relative value of one treatment versus another”.

The gold standard of evidence for individual decision making, in our view, may start with the demonstration of efficacy in an RCT and go on to consecutive patient trials for which only one service is available for a defined condition. These before-and-after or side-by-side large cohort studies that limit the available options would allow us to control prospectively for characteristics of patients and estimate the differences in outcomes across the continuum of individual vagaries [7].

Buying health care services is presently poorly planned due to our inability to study what works better than something else for individual decision makers. We can expand insurance coverage, but more importantly we should be talking about what we should cover, and why. We presently do not have standards for measures of value and we do not have enough studies that inform the individual decision maker. We do not need comparative effectiveness studies until we decide what those studies should be aimed at finding.  We do not have a uniform definition of evidence, and hence, we are left advising our patients about what is valuable to buy without knowing if anyone really is getting better, or if any individual would want what we are selling in the first place.

But no matter how well we respond to these contingencies, we will be faced with uncertainty. Who better to hold our hand when we make decisions about our health based on imperfect data than our trusted physician? Why else do we need physicians going forward?

1. ACP Journal Club. 2002 Mar-Apr;136:A11.

2. ACP Journal Club. 2003 Nov-Dec;139:A11.

3. Ayanian JZ, Berwick DM. Do physicians have a bias toward action? A classic study revisited. Med Decis Making. 1991 Jul-Sep;11(3):154-8.

4. McNutt RA, Livingston EH. Evidence-based medicine requires appropriate clinical context. JAMA. 2010 Feb 3;303(5):454-5. http://www.ncbi.nlm.nih.gov/pubmed/20124544

5. McNutt RA. Shared medical decision making: problems, process, progress. JAMA. 2004 Nov 24;292(20):2516-8. http://www.ncbi.nlm.nih.gov/pubmed/15562133

6. Hadler NM. Worried Sick: A Prescription for Health in an Overtreated America. Chapel Hill, UNC Press, 2008.

7. Hadler NM, Gillings DB. On the design of the phase III drug trial. Arthritis Rheum. 1983;26:1354-1361.

Nortin Hadler, MD, is a professor of Medicine and Microbiology at the University of North Carolina – Chapel Hill. Dr. Hadler is the author of numerous articles and essays and a series of popular books on the state of medicine today. His most recent book is “Stabbed in the Back: Confronting Back Pain in an Overtreated Society.” Robert McNutt, MD, is a professor of Medicine at Rush Medical College in Chicago.

10 replies »

  1. Great post.
    I am reminded of Pascal’s remark: “All of man’s troubles stem from his inability to sit quietly in a room alone . . . ”
    Our medical schools teach doctors: “Don’t just sit there, Do something.” But often, patients would be better off if a doctor just sat there–listened, observed and talked, honestly and humbly, about the uncertainties of medicine. . . I don’t think those ambiguities will ever be vanquished. After all, we have only the human mind with which to study the human mind/body. To further complicate things, each of those bodies/minds is unique. At the very best we’re looking at a stand-off.
    We want to believe that doing something will help. As Hadler puts it: “It is human nature to hope that there is a demonstrable difference in outcome between those treated with an active intervention and the referent group.” But this can lead us to exaggerate a small effect.
    Meanwhile, “So many RCTs are powered for small effects.” Whoever is running the trial wants a positive outcome. If you aim for a small effect, you are more likely to get a positive result. The profit motive plays a major role here: “There is much at stake and much pressure to eke a small effect out of the morass of data, so much pressure that industry trials are generally more likely to do so than government-sponsored trials on the same agents.”
    Karen asks “How can you deny a patient their long shot . . ?”
    If that long shot came with no risk (and no cost) I might agree. But the fact is that every treatment and every test carries some risk. So on one side of the equation you have the long-shot possibility of a benefit; on the other side, the likelihood of side effects or unintended consequences (for example, the test leads to unnecessary treatment.)
    Moreover, in a world of finite resources it makes no sense to squander health care dollars on “interventions that cannot be shown to yield a meaningful benefit in any particular population.” Those dollars could be spent on a child’s education or in some other way where we can be pretty certain of a benefit.
    My favorite line in the post: “Never again should anyone be screened for anything unless the test is accurate, the disease is important, and something can be done about it.” Amen.

  2. Thought-provoking post. The other obstacle to tackle is an efficient means of getting the right information into the hands of all providers at the time they need it to help with informed decision-making. How many physicians feel that they have the suitable level of knowledge of absolute and relative risk, number needed to treat, and other mainstays of discerning whether evidence is solid enough to utilize in daily practice? All due respect to the medical profession, but numeracy is an incompletely understood phenomenon that merits attention in the context of this discussion.

  3. Every older large city probably has a building that was the Medical Arts building. It was called that for a reason. The art of medicine is still powerful and underestimated for its value and effectiveness; maybe not for the ultrasophisticates on this blog, but for normal happy people it goes a long way.

  4. “The reason clinical medicine is not a science but a philosophy informed by science is that no two people are the same and no two patients present with identical dilemmas or value solutions to the same degree.”
    There are two parts to this statement and only one has to do with philosophy. The fact that no two people are physically the same is a temporary limitation for medicine. In this respect, medicine is still a science, albeit an immature science. I believe the day when science can address this variability in subjects is not too far off. Unfortunately, until then medicine is forced to rely on statistical methods with all the inherent inaccuracy and irrelevance to a particular subject.
    As to the value system of each individual, that is pure philosophy. However, medicine should not assume responsibility for an individual decision. Medicine need only present scientifically accurate, and deterministic individualized alternatives. It is up to the individual subject to make the philosophical decision based on undefinable and unmeasurable value systems. To be (in pain), or not to be…..
    In the meantime I do agree that we must evaluate exactly how we employ the rather low resolution “evidence” that we can currently collect.

  5. “If an intervention can not be shown to offer a meaningful benefit in any particular population, there is no reason to assume that it will benefit the patient and its prescription is unconscionable….”
    Unconscionable is a very strong stance for an industry whose complexities and lack of uniform standards are outlined so well herein. How can a prescription be unconscionable when the foundation from which it was derived is inconclusive? The basis for your stance to not prescribe a particular prescription therefore can be easily flipped. How can you deny a patient their long shot based on a statistic from which you cannot identify exactly which RCT they fit into and which attributes within that RCT carry the most weight in regards to ensuring the expected outcome?
    Physicians that move forward will absolutely not be mere hand holders; the sought-after physicians will be broad, independent thinkers who use statistics to gauge the general area they should be in, not to determine concrete boundaries. Yes, those physicians that stand out will be brilliant enough to use the outcome of control groups and then apply those results to the possibilities for their specific patients, keeping in mind the individual characteristics the patient may have unique from the set that most likely represents them. Life is precious. Every patient deserves personalized possibilities, not just limiting probabilities.

  6. Not clear what your point is here but it seems to boil down to the fear of becoming obsolete. I don’t think doctors are in any danger of becoming obsolete but attitudes such as these should be:
    Yes, medicine is complex so we shouldn’t even try to figure out what works best. Just trust the good old country doctor with his “bias of experience” and limited decision making ability to muddle through to a hopefully correct diagnosis and appropriate treatment.
    RCTs to determine best practice are complicated and may not work so let’s not even try.
    “The reason clinical medicine is not a science but a philosophy informed by science…” Yes, I would rather be treated by a philosopher than a scientist.
    (Meanwhile, most doctors fail even simple measures of practice quality such as eye exams in diabetics and the measurement and control of hypertension.)