“Universal Health Care”, “Single Payer”, “National Health Insurance”, and “Socialized Medicine” are all signifiers of the subjugation of physician and patient autonomy to government control for the sake of the common good. This is not sophistry. Max Weber was a German political philosopher who laid the foundation for modern sociology with such works as The Theory of Social and Economic Organization (published posthumously in 1922; English translation 1947), in which he proclaimed, “Bureaucratic administration means fundamentally the exercise of control on the basis of knowledge. This is the feature of it which makes it specifically rational” (p. 339).
However, Weber knew that the goal of a rational bureaucracy was more often elusive than realized, if it is ever realized for long. As Karl Marx observed in a mid-19th-century critique of Hegelian political philosophy, “The bureaucracy takes itself to be the ultimate purpose of the state.” That observation is mirrored in a televised speech delivered by Ronald Reagan on October 27, 1964: “No government ever voluntarily reduces itself in size. Government programs, once launched, never disappear. Actually, a government bureau is the nearest thing to eternal life we’ll ever see on this earth!”
The upshot is that society has an inherent love-hate relationship with bureaucracy. As I discuss in Citizen Patient, until the end of the 20th century American medicine contained and controlled its own bureaucracy and was willing to downplay such shortcomings as inequities in the delivery and quality of care. Even the legislating of a Medicare bureaucracy had little impact on the medical hegemony; the legislation delegated clinical indications, fees, and much else that is costly to medicine’s own bureaucracy and to the marketplace. American medicine remained secure in its autonomy. Meanwhile, in nearly all similarly “advanced” countries, the will to tackle inequities in the distribution and quality of care, and in cost-effectiveness, had come to predominate early in the 20th century. These countries sought a solution in the evolution of governmental bureaucracies. Some of these national health insurance schemes remain examples of Weber’s rationality; others, such as those in the United Kingdom and France, are fraying.
The blog and its comments point, we think, to a confusing set of principles being considered out of context.
Those comments range from the claim that ACOs will lead to better figuring out of what is best (impossible) to concerns about mismatched information regarding a specific clinical case (reasonable). What is striking is that we have medical students worrying about the costs of care.
Instead, shouldn’t we be teaching them to understand the value of information for decision-making? Shouldn’t we be teaching them that co-dependent testing renders all tests less useful than we think?
Shouldn’t we be teaching students the concepts of decision analysis, thresholds, and patients being involved in the decisions? Shouldn’t we be teaching that it is better to know than to think we know? Shouldn’t we be doing studies rather than scratching at the “tragedy of the commons” (so many physicians feasting on the grassy fields of a sick patient)?
With breathtaking speed, atrial fibrillation has gone from “Huh?” to common parlance.
“A-fib”, a common cardiac cause of palpitations, is now in the front ranks of evils lurking to smite our well-being. There is no mystery to this transformation. In recent years, the Food and Drug Administration licensed three new drugs to prevent stroke, the fearful complication of A-fib: apixaban (Eliquis), rivaroxaban (Xarelto) and dabigatran (Pradaxa).
This unleashed the full might of pharmaceutical marketing: the scientific data for efficacy that convinced the FDA were tortured until they convinced “thought leaders”, whose opinions convinced influential journalists. Sales pitches populate print, broadcast and social media. A-fib is now more than a worrisome disease; it is a product line.
Nonetheless, A-fib can be scary for those afflicted. There are many choices to make and a lot of confusing, conflicting and counter-intuitive advice to deal with. Troubled by this situation, Mr. X recently sought me out for a dialogue about his situation.
Mr. X is a 70-year-old business executive who has enjoyed robust good health and is on no prescription drugs. He exercises vigorously and is a consumer of various over-the-counter treatments purveyed as salutary. Like many “health-wise” Americans, he also takes 80 mg of aspirin a day, unaware that the benefit is minimal at best and is counterbalanced by a similar magnitude of risk.
A month earlier he had the sudden onset of palpitations, a fluttering in his chest that made him exceedingly anxious and somewhat breathless. He waited an hour before asking his wife to drive him to the local emergency room. By the time he was first seen, another hour had passed. The diagnosis of A-fib followed. Another hour passed before the consulting cardiologists began debating whether to convert the A-fib to a normal rhythm using drugs or an electrical shock.
Before they could decide, Mr. X’s heart decided to behave again; he was back in a normal rhythm. It was a frightening experience for Mr. and Mrs. X. He left the ED shaken and shaky.
He also left with a follow-up appointment with a cardiologist specializing in rhythm disorders, and a prescription for a drug that slows the conduction of electrical impulses originating in one heart chamber, the right atrium, and traversing the heart. The normal “pacemaker” is a specialized cluster of muscle cells in the right atrium that discharges at regular intervals, initiating a current that causes the heart to contract in the synchronized fashion termed Normal Sinus Rhythm.
In A-fib, for reasons that are poorly understood, multiple pacemakers form in the atria, leading to chaotic discharging and circular currents. How many of these impulses manage to exit the atria to traverse the heart depends on the capacity of the conducting tissues; most stay confined to the atria, causing them to quiver ineffectively.
Any confusion over the recent news of cholesterol guidelines in the U.S. is perfectly understandable. On the one hand, the guidelines suggest that nearly half the population should use statins to stave off heart attacks and strokes. On the other, use of the drugs is not without potential side effects and, for many, will offer no substantive benefit. The controversy highlights a problem mired in an outdated way of thinking about health care and the doctor-patient relationship.
Guidelines came about after generations of physicians wanted to bring something more than “opinion and experience” to the patient’s bedside. In 1962, the legislation governing the U.S. Food and Drug Administration was amended to call for a demonstration of efficacy and an assessment of benefits and risks as a prerequisite to the licensing of any pharmaceutical. Modern clinical science resulted, first slowly and now as an avalanche of clinical trials, each pouring forth outcome data galore.
The Burden of Clinical Data
Clinicians are expected to stay current with this wealth of information. The modern medical curriculum instructs all budding physicians on how to evaluate the quality and the clinical relevance of all such contributions to the body of clinical science. Because some (or perhaps many) find this exercise overwhelming, there are organizations—many academic and some without any discernible relationships with purveyors that could pose a conflict of interest—that attempt to bundle the information in a fashion that might be relevant to particular physicians or physicians in particular specialties. Some of this bundling is quite systematic, some quite helter-skelter.
Occasionally there is a contribution to the literature that offers an unequivocal advantage for a particular patient group. More often, the bundlers are faced with a heterogeneous literature that demonstrates little, if any, efficacy. Faced with these circumstances, biostatistics has offered up many a method to impute more value to the literature than is apparent at first blush. The result is that all this bundling adds to an enormous and ever-expanding secondary literature.
In a previous blog we demonstrated how guidelines can compromise the care of individual patients when designed to serve the health care system.
Why, we asked, should treating physicians defer to guideline committees at all? For decades medical students have been taught to read and understand information from published papers.
We are all trained in critical appraisal and can keep up with the clinically meaningful literature, the literature that is relevant and accurate enough to present to patients. Just because there are nearly 20,000 biomedical journals does not mean that any, let alone all, are replete with meaningful information. We can discern the valuable from the not valuable; why do we need others to tell us?
In fact, we even argued in our last post that patients can and should judge the value of medical information. After all, they face the consequences of misinterpreting the likelihoods of benefit and of harm associated with various options for care.
No one remembers the numbers that describe the chances for benefit and harm better, or asks more questions about the veracity of information, than a patient who must choose. The smartest information managers we have ever encountered are our patients; when informed, they quickly determine the validity of the information and apply their personal values to the estimations of the chances for benefit and harm.
Take the example of a patient who recently entered into a therapeutic dialogue with one of us, RAM. This was not the traditional clinical interview. This patient had been diagnosed with prostate cancer and was scheduled for an approach to treatment that the diagnosing physician had offered as the most sensible. However, the decision did not rest easily.
The appointment with RAM was scheduled because the patient sought a dialogue that might offer a chance to reflect on the rationale for the approach he was about to initiate. Two hours into the dialogue, the patient, a 40-ish African-American man, and his wife were mulling over the marginal benefits and harms of the options for treating an early-stage prostate cancer.
The wife asked how many African-Americans were in the study under discussion. “None.” The husband perked up and then asked, “How many people in the study were my age?” “None.” They then asked whether the difference in benefit was a certain, fixed amount. Examining the descriptive statistics: “No, it varies over this range.”
They then asked when the study was started and whether it pertained to the present day. It had started over 15 years earlier, and the stage of disease of the men in the study was generally more aggressive than in this particular case.
The comments posted on THCB in response to the essay, and those the editors and I have directly received, have been most gratifying. The essay is an exercise in informing medical decisions, which is my creed as a clinician and perspective as a clinical investigator.
I use the recent British national guideline document as my object lesson. This Guideline examines the science that speaks to the efficacy of the last consensus indication for angioplasty, the setting of an acute ST-elevation myocardial infarction (STEMI). Clinical science has rendered all other indications, by consensus, relative at best. But in the case of STEMI, the British guideline panel supports the consensus and concludes that angioplasty should be “offered” in a timely fashion.
I will not repeat my original essay here since it is only a click away. The exercise I display is how I would take this last consensus statement into a trusting, empathic patient-physician discourse. This is a hypothetical exercise to the extent that little in the way of clear thinking can be expected of a patient in the throes of a STEMI, and not much more of the patient’s caring community.
So all of us, we the people regardless of our credentials, need to consider and value the putative efficacy of angioplasty (with or without stenting) a priori. For me, personally, there is no value to be had in rushing me from the “door to the balloon”, regardless of the speed. You may not share this value for yourself, but my essay speaks to the upper limit of benefit you are seeking in the race to the putative cure by dissecting and displaying the data upon which the British guideline is based.
There is an informative science, most of which cannot deduce any benefit; the science that does deduce benefit finds the likelihood too remote for me to consider it worth the attempt. A hundred or more patients with STEMI would have to be rushed to the catheterization lab to perhaps benefit one (and to harm more than one).
In a single generation, the evidentiary basis for the practice of medicine has grown from a dream to a massif. No longer need physicians rely solely on experience and opinion in formulating diagnostic and therapeutic approaches to the care of the patient.
However, for any given clinical challenge, the available science is never flawless, monolithic or comprehensive, nor is it likely to be durable in the face of newer studies.
The international medical community has mounted two approaches to sorting the wheat from the chaff. One targets the doctor, convening committees to formulate guidelines for patient care. The other targets the patient, supporting the evaluation of options in so-called informed medical decision making. Both approaches are now sizable undertakings clothed in organizational imprimaturs and girded by self-promotion.
But they are largely parallel undertakings with work products that can cause considerable cognitive dissonance on the part of the patient and the physician. In a recent article in the British Medical Journal, the Guideline Development Group convened by the National Institute for Health and Clinical Excellence (NICE) summarized the thinking behind the guidance it was offering regarding the management of STEMI. It is an object lesson in such cognitive dissonance.
The insurance industry had a rocky start a century ago. It was clear that there were untoward events that could befall any of us with catastrophic results, from the incineration of a home to the loss of the ability to maintain gainful employment from injury or death.
Insurance offers a mechanism to share this risk. The stumbling block was the possibility that the insured might burn down their home to collect. Once it was realized that this “moral hazard” could be held at bay by investigating for fraud, there was little to hinder the growth of an industry designed to serve our risk-averse proclivities. Almost every adult has some experience valuing the expense of sharing risk for a variety of hazards. After all, automobile insurance is generally compulsory, and most of us are familiar with notions of deductibles and riders when it comes to homeowners’ policies. The possibilities are not an abstraction; we can envision the house or its contents damaged, destroyed, or stolen, leaving us bereft. What would reducing that prospect be worth to us? As is true for many value-based decisions, the answer brings a mix of reason and intuition (1) that can produce surprising outcomes (2).
Health insurance is even more complex, and has always been so. The industrial revolution saw the development of “Friendly Societies” in Britain and the Prussian “Krankenkassen”. These were trade-based institutions that allowed advantaged workers to purchase insurance providing “sick pay”, but there was little else. The sea change was the Prussian “welfare monarchy” (3), an extensive insurance scheme that encompassed universal health care and a complex approach to disability insurance (4). Modifications of the Prussian scheme spread across the industrial world. It made landfall in the United States in time for the presidential election of 1912. Only one component took root in America, Workers’ Compensation Insurance, and not as a national insurance scheme. It fell to each state to regulate an insurance scheme to compensate injured workers for lost income and medical expenses.
This set the stage for state-based regulation of employer-sponsored private health insurance schemes going forward. But forward momentum has been anything but swift or linear in a country that trusted physicians to charge “commensurate with the services rendered and the patient’s ability to pay” (AMA Code of Medical Ethics, 1957). Health insurance, as both an industry and a product, has become a frustrating web of inefficiency and confusion.
We entered the 21st century awash in “evidence” and determined to anchor the practice of medicine on the evidentiary basis for benefit. There is a sense of triumph; in one generation we have displaced the dominance of theory, conviction and hubris at the bedside. The task now is to make certain that evidence serves no agenda other than the best interests of the patient.
“Evidence-based medicine is the conscientious and judicious use of current best evidence from clinical care research in the management of individual patients.” [1,2]
But, what does “judicious” mean? What does “current best” mean? If the evidence is tenuous, should it hold sway because it is currently the best we have? Or should we consider the result “negative” pending a more robust demonstration of benefit? Ambiguity is intolerable when defining evidence because of the propensity of people to decide to do something rather than nothing. Can we and our patients make “informed” medical decisions on thin evidentiary ice? How thin? Does tenuous evidence mean that no one is benefited, that the occasional individual may be benefited, or that many may be benefited but only a very little bit?
In early 2009, President Obama signed the American Recovery and Reinvestment Act into law. Tucked into the legislation was $1.1 billion to support comparative effectiveness research (CER). The legislation charged the Institute of Medicine with defining CER. Its Committee on Comparative Effectiveness Research Prioritization rapidly came up with,
…the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat and monitor a clinical condition, or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.
The Committee then elicited over 2500 opinions from 1500 stakeholders and produced a list of the 100 highest-ranked topics for CER (www.iom.edu/cerpriorities). Proposals to undertake CER are pouring forth from investigators across the land. There is no doubt that an enormous amount of data will be generated by 2015. But there is every reason to doubt whether many inferences can be teased out of these data that will actually advantage patients, consumers, or the health of the nation.
I am no Luddite. For me, “evidence-based medicine” is not a shibboleth; it’s an axiom. Furthermore, having trained as a physical biochemist, I am comfortable with the most rigorous of the quantitative sciences, biostatistics included. However, you can’t compare treatments for effectiveness unless you are quite certain that one of the comparators is truly efficacious. There must be a group of patients for whom one treatment has unequivocal and important efficacy. Otherwise, the comparison may merely discern differences in relative ineffectiveness.
The academic epidemiologists who spearheaded the CER agenda are aware of the analytic challenges but are convinced these can be overcome. I would argue that CER can never succeed as the primary mechanism to assure the provision of rational health care. It has a role as a secondary mechanism, a surveillance method to fine tune the provision of rational health care, once such is established.
The difference between efficacy and effectiveness
My assertion may seem counter-intuitive. After all, we hear every day about pharmaceuticals that are licensed by the FDA because of a science that supports the assertion of benefit. In epidemiology-speak, the science that the FDA reviews does not speak to the effectiveness of the drug, but to its efficacy. The science of efficacy tests the hypothesis that a particular drug or other intervention works in a particular group of similar patients. CER asks whether an intervention works better than other interventions in practice, where the patients and the doctors are heterogeneous. The rationale for the CER movement is the perceived limitations of efficacy research. I argue that the limitations of efficacy research are much more readily overcome than the limitations of CER.
The gold standard of efficacy research is the randomized controlled trial (RCT). In a RCT, patients with a particular disease are randomly assigned to receive either a study intervention or a comparator (often a placebo). After a pre-determined interval, the previously defined clinical outcome is compared in the active and control limbs of the trial. If there is no difference, one can argue that the intervention offers no demonstrable clinical benefit to patients such as those in the study. If there is a difference, the contrary argument is tenable.
This elegant approach to establishing clinical utility has its roots in antiquity, at least as far back as Avicenna. The modern era commences after World War II and escalates dramatically after 1962 when the Kefauver-Harris Amendment to the laws regulating the US Food and Drug Administration mandated demonstration of efficacy before pharmaceuticals could be licensed. Modern biostatistics has probed every nuance of the RCT paradigm. The result is a highly sophisticated understanding of the limitations of the RCT, an understanding that has fueled the call for CER:
The more homogeneous the study population, the more likely any efficacy will be demonstrated and the more compelling any assertion as to its absence. However, the homogeneity compromises the ability to assume that the result generalizes to different kinds of patients.
Many important clinical outcomes are either infrequent or occur late in the course of disease. It is difficult to maintain and fund RCTs that require years or decades before one can hope to see a difference between the active and control limbs. The compromise is to study “surrogate” outcomes, measures that in theory reflect the disease process, but are not themselves clinically important outcomes. Thus we have thousands of studies of blood pressure, cholesterol, blood sugar, PSA and the like but comparatively few studies that use heart attacks, death from prostate cancer, or other untoward clinical outcomes as the end-point.
How big a difference between the active and control limbs is important? Biostatistics has dictated that we should pay attention to any difference that is unlikely to happen by chance too often. “Too often” traditionally is considered no more than 5% of the time, but that’s a matter of risk-taking philosophy. What are we to make of a difference that is clinically very small, even if it is unlikely to happen by chance more than 5% of the time? Is it possible that the small effect will be important, perhaps less small, when the constraints of homogeneity are removed in practice? In practice, drugs licensed for one disease are even tried for other “off-label” indications where effectiveness may emerge.
The corollary limitation relates to the negative trial. If there is no demonstrable difference, does that mean that there is no effect? Or could the effect have been too small to detect because of the duration of the trial or the size or homogeneity of the population studied? Even a very small effect, advantaging only the occasional patient, can translate into many benefited people when tens of thousands are treated.
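The scale-up arithmetic behind that last point can be made concrete with a minimal Python sketch; the event rates and population size below are hypothetical, chosen only for illustration:

```python
# Hypothetical illustration: a tiny absolute effect, scaled up.
# Suppose a treatment lowers the event rate from 2.0% to 1.8%, an
# absolute risk reduction (ARR) most trials are too small to detect.
control_rate = 0.020
treated_rate = 0.018
arr = control_rate - treated_rate  # 0.002, i.e. 0.2%

# Treat 100,000 people and that sliver still adds up:
treated_population = 100_000
people_benefited = arr * treated_population
print(f"ARR = {arr:.1%}; expected beneficiaries = {people_benefited:.0f}")
# -> ARR = 0.2%; expected beneficiaries = 200
```

The point of the sketch is only the multiplication: an effect invisible in a modest trial still translates into hundreds of benefited people once tens of thousands are treated.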
Devices and surgical procedures are used in practice; rigorous testing as to efficacy is not a statutory requirement. Maybe in the “real world” a treatment that was never studied, or studied only in a limited fashion, turns out to really advantage patients in practice, or to advantage some patients – or not.
CER to the rescue?
The methodology employed for CER is not the RCT. CER is an exercise in “observational research”. CER examines real-world data sets to deduce benefit or the lack thereof. This entails the development of large-scale clinical and administrative networks to provide the observational data. Then biostatistics must come to grips with issues that make defining the heterogeneity of populations recruited into RCTs seem trivial. In the RCT, the volunteers can be examined and questioned individually and in detail, and the criteria for admission into the trial are defined a priori. Nothing about the validity of diagnosis, clinical course, interventions, coincident diseases, personal characteristics or outcomes can be assumed in observational data sets. There must be efforts at validating all such crucial variables. No matter how compulsively this is done, CER demands judgments about the importance of each of these variables.

It is argued that some of these limitations are overcome because CER is not attempting to ask whether a particular intervention works in practice, but whether it works better than another option also in practice. It is even suggested that encouraging or introducing particular interventions or practice styles into some practice communities and not others would facilitate CER. Perhaps.
The object lesson of interventional cardiology
Interventional cardiology for coronary artery disease is the engine of the American “health care” enterprise. Angioplasties, stents of various kinds, and coronary artery bypass grafting (CABG) have attained “entitlement” status. There are thousands of RCTs comparing one with another, generally leading to much ado about very small differences, usually in surrogate measures such as costliness or patency of the stent. But there are very few RCTs comparing the invasive intervention with non-invasive best medical care of the day: 3 for CABG and 4 for angioplasty with or without stenting. In these large and largely elegant RCTs, the likelihood of death or a heart attack if treated invasively is no different from the likelihood if treated medically. Whether anyone might be spared some degree of chest pain by submitting to an invasive treatment is arguable since the results are neither compelling nor consistent. Yet, interventional cardiology remains the engine of the American “health care” enterprise. It carries on despite the RCTs because its advocates launch such arguments as “We do it differently” or “The RCTs were keenly focused on particular populations of patients and we reserve these interventions for others we deem appropriate.” These arguments walk a fine line between hubris and quackery.
So many invasive procedures are done to the coronary arteries of the young and the elderly that interventional cardiology has long lent itself to CER. We know from observational studies that it does not seem to matter much whether the heart attack patient has an invasive intervention quickly, has it delayed, or has it not at all. We know from observational studies, and even from trials rewarding some but not all hospitals for getting doctors to adhere to the “guidelines” for managing heart disease, that adherence does not make much of a difference. Do the results of this CER mean that we need to further improve the efficiency and quality of the performance of invasive treatments, as many would argue? Or can we hope that more exacting CER can parse out some meaningful indication from large data sets, some compelling inference that only particular people with particular conditions are advantaged and therefore are the only candidates for interventional cardiology?
Or are we using the promise of CER to postpone calling a halt to the ineffective and inefficacious engine of American “health care”? The available science is consistent with the argument that interventional cardiology is not contributing to the health of the patient. I would argue that interventional cardiology should be halted until someone can demonstrate substantial efficacy and a meaningful benefit-to-risk ratio in some subset. Then CER can ask whether the benefit demonstrated in the efficacy trial translates to benefit in common practice.
Efficacy research is the horse; CER is the cart
Interventional cardiology for coronary artery disease is but one of many object lessons. There is much in common practice that has never been shown to be efficacious in any subset of patients. Some practices take up residence in the common sense despite having never been studied. Some practices, like interventional cardiology, persist because intellectual and fiscal interests are vested in the entrenchment despite the results of efficacy trials. CER cannot inform efficacy, and CER cannot inform effectiveness unless there is an example of efficacious therapy against which practices are compared. Otherwise, CER risks merely comparing degrees of ineffectiveness.
The way forward is to design efficacy trials that are more efficient in providing gold standards for comparison, and just as efficient in flagging false starts so that they are not allowed into common practice until superseded by an approach of demonstrated efficacy. This is not all that difficult to do. Let’s return to the limitations of efficacy trials listed above:
Homogeneity of study populations is not a limitation for the quest for a meaningful standard of efficacy. At least we will know the intervention is good for someone.
Surrogate measures are useful to bolster the hypothesis that something might work. They have a dismal track record for testing the hypothesis that something does work. Clinically important outcomes must be invoked for such a test. If that is not feasible because the clinical outcome is too slow to develop or too infrequent, compromise is not an option. The intervention cannot be studied at all, or it cannot be studied until an appropriate subpopulation can be identified, or one must bite the bullet and undertake a lengthy RCT.
Surrogate outcomes are not the only way that RCT results can lead to spurious clinical assumptions. “Composite outcomes” are even worse. RCTs in cardiology are notorious for an outcome such as “death from heart disease or heart attack or the need for another procedure.” When these studies are closely read, one learns that any difference detected is almost exclusively in “the need for another procedure” which is a highly subjective and interactive outcome that can speak to preconceptions on the part of the doctor or the patient rather than the efficacy of the intervention.
Modern epidemiology is so wedded to the notion of statistical significance that concern about the statistical significance of “What?” is overwhelmed. “What?” is the clinical significance. Just because the difference observed between the active and control limbs of the RCT wouldn’t have happened by chance too often does not mean that the difference is clinically important, even in the occasional patient. I’ll illustrate this by touching the Third Rail that the debate over the clinical utility of mammography has become. Malmö is a city in Sweden where women were invited to volunteer for a RCT; half would be offered routine screening mammography for a decade and the other half encouraged to see their physicians whenever they had concern about the health of their breasts. That’s the difference between screening and diagnostic protocols: in screening, one agrees to a test simply as a matter of course; in diagnostics, one agrees to testing in response to a clinical complaint. Back to the Malmö RCT. Over 40,000 women between the ages of 40 and 60 volunteered. Invasive cancer was detected in statistically significantly more women in the screened group than in the diagnostic group. Impressed? How about if I told you that 7 of 2,000 women screened for a year were found to have invasive breast cancer, versus 5 of 2,000 women in the diagnostic group for a year? Was all the screening worth this difference in the absolute number of additional cancers detected? I could have told you that screening detected 40% more cancers, but you won’t be swayed by the relative increase now that you know the absolute increase was 0.1%, will you?
Would you consider the screening valuable if I told you that for every woman whose invasive breast cancer was treated so that she lived long enough to die from something else at a ripe old age, another two were treated unnecessarily, since they died from something else before their breast cancer could be their reaper? How about all the false-positive mammograms and false-positive biopsies? There is a debate about mammography because it is a very marginal test that clearly is not doing as well as common sense assumes.
How small an effect can we detect in an RCT? Theoretically, a very small one, even smaller than the Malmö result. In order to do so, you need to randomize a large, homogeneous population whose size is determined by the level of statistical significance you choose and the nature of the health effect you seek; death, for example, is the least equivocal outcome. The quest for the small effect is the mantra of modern epidemiology. However, I consider such “small effectology” a sophism. No human population is homogeneous; we differ one from another in obvious, often measurable ways, but also in less obvious, immeasurable ways. When we randomize the individuals of any supposedly homogeneous population into a treatment group and a control group, we assume that all the immeasurable differences split 50:50, or, if they do not, that the randomization errors counterbalance. The smaller the effect we are seeking, the more likely we are to be fooled by randomization errors that account for the difference rather than by the treatment. That is why so many small effects that emerge from RCTs do not reproduce.
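The dependence of trial size on the effect being sought can be sketched with the standard normal-approximation formula for comparing two proportions. This is a textbook approximation, not anything from the text itself; the significance level (two-sided α = 0.05) and power (80%) are conventional choices I am assuming, and the function name is mine:

```python
import math
from statistics import NormalDist

def n_per_arm(p_control: float, p_treated: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per arm to detect a difference between
    two proportions (normal approximation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_beta = NormalDist().inv_cdf(power)           # ~0.84
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_control - p_treated) ** 2)

# Halving a 10% event rate takes a modest trial:
print(n_per_arm(0.10, 0.05))    # 432 per arm
# Halving a 1% event rate, the same relative effect but a tenfold
# smaller absolute one, takes roughly ten times as many volunteers:
print(n_per_arm(0.010, 0.005))  # 4671 per arm
```

The required size grows roughly as the inverse square of the absolute difference, which is why hunting ever-smaller effects demands ever-larger populations, and why randomization errors loom larger as the sought-after effect shrinks.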
Evidence Based Medicine can be more than a Shibboleth
The philosophical challenge in the design of efficacy trials relates to the notion of “clinically significant.” How high should we set the bar for the absolute difference in outcome between the treated and control groups in an RCT to be considered compelling? One way to get one’s mind around this question is to convert the absolute difference into a more intuitively appealing measure, the Number Needed to Treat (NNT). If the outcome is readily measured and unequivocal, such as death or stroke or heart attack, I would find the intervention valuable if I had to treat 20 patients to spare 1. Few students of efficacy would be persuaded if we had to treat more than 50 to spare 1. An NNT between 20 and 50 delineates the communitarian ethic; smaller effects are ephemeral. For an outcome that is more difficult to measure than death or the like, an outcome that relates to symptoms or quality of life, I would argue for a more stringent bar.
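The NNT conversion described above is simply the reciprocal of the absolute risk reduction. A minimal sketch, with illustrative event rates chosen to land on the 20 and 50 thresholds the paragraph proposes (the function name and rates are mine):

```python
def number_needed_to_treat(control_event_rate: float,
                           treated_event_rate: float) -> float:
    """NNT = 1 / absolute risk reduction (ARR)."""
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("the treatment shows no absolute benefit")
    return 1 / arr

# A 10% event rate cut to 5% is an ARR of 5 percentage points:
print(round(number_needed_to_treat(0.10, 0.05)))  # 20 -- compelling
# A 10% event rate cut to 8% is an ARR of 2 percentage points:
print(round(number_needed_to_treat(0.10, 0.08)))  # 50 -- the outer limit
```

For comparison, an absolute difference of 0.1%, as in the Malmö figures, would yield a figure of 1,000 by the same reciprocal arithmetic, far outside the 20-to-50 band.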
If we applied this logic to RCTs, the trials would be far more efficient (in investigator/volunteer time, materiel, and cost) and the results far more reliable. If we applied this logic to RCTs, we would eliminate trials designed only to license agents no better than those already licensed (“me too” trials) and trials designed only for marketing purposes (“seed” trials). If we only licensed clinically efficacious interventions going forward, we could turn to CER to understand their effectiveness in practice. If we applied this logic retrospectively, to the trials that have already accumulated, we would soon realize how much of what is common practice is on the thinnest of evidentiary ice, how much has fallen through and how much supports an enterprise that is known to be inefficacious. It would take great transparency and political will to apply this razor retrospectively. We, the people, deserve no less.
Nortin M. Hadler, MD, MACP, FACR, FACOEM (AB Yale University, MD Harvard Medical School) trained at the Massachusetts General Hospital, the National Institutes of Health in Bethesda, and the Clinical Research Centre in London. He joined the faculty of the University of North Carolina in 1973 and was promoted to Professor of Medicine and Microbiology/Immunology in 1985. He serves as Attending Rheumatologist at the University of North Carolina Hospitals.
For 30 years he has been a student of “the illness of work incapacity”; over 200 papers and 12 books bear witness to this interest. He has lectured widely, garnered multiple awards, and served lengthy Visiting Professorships in England, France, Israel and Japan. He has been elected to membership in the American Society for Clinical Investigation and the National Academy of Social Insurance. He is a student of the approach taken by many nations to the challenges of applying disability and compensation insurance schemes to such predicaments as back pain and arm pain in the workplace. He has dissected the fashion in which medicine turns disputative and thereby iatrogenic in the process of disability determination, whether for back or arm pain or a more global illness narrative such as is labeled fibromyalgia. He is widely regarded for his critical assessment of the limitations of certainty regarding medical and surgical management of the regional musculoskeletal disorders. Furthermore, he has applied his critical razor to much that is considered contemporary medicine at its finest.