
The Case For Real World Evidence (RWE)

Randomized controlled trials – RCTs – rose to prominence in the twentieth century as physicians and regulators sought to rigorously evaluate the performance of new medical therapies; by century’s end, RCTs had become, as medical historian Laura Bothwell has noted, “the gold standard of medical knowledge,” occupying the top position of the “methodologic hierarch[y].”

The value of RCTs lies in the random, generally blinded, allocation of patients to treatment or control group. When properly executed, this approach minimizes confounding (on the presumption that any significant confounder will be distributed randomly as well), enabling researchers to discern the efficacy of the intervention (does it work better – or worse – than the control?) and to begin to evaluate its safety and side effects.
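
To make this concrete, here is a minimal simulation sketch (Python, with purely illustrative numbers and a hypothetical “frailty” confounder) showing how random allocation tends to balance even an unmeasured confounder across arms:

    # Minimal sketch: random allocation tends to balance an unmeasured confounder.
    # All numbers are illustrative assumptions, not data from any actual trial.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000                                    # hypothetical trial size
    frail = rng.random(n) < 0.3                 # unmeasured confounder, 30% prevalence
    arm = rng.permutation(np.repeat(["treatment", "control"], n // 2))

    for a in ("treatment", "control"):
        print(f"{a}: frailty prevalence = {frail[arm == a].mean():.3f}")

    # Both arms come out close to 0.30, so an outcome difference between arms
    # can be attributed to the intervention rather than to frailty.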

The power and value of RCTs can be seen with particular clarity in the case of proposed interventions that made so much intuitive sense (at the time) that it seemed questionable, perhaps even immoral, to conduct a study. Examples include use of a particular antiarrhythmic after heart attacks (seemed sensible, but actually caused harm); and use of bone marrow transplants for metastatic breast cancer (study viewed by many as unethical yet revealed no benefit to a procedure associated with significant morbidity).

In these and many other examples, a well-conducted RCT changed clinical practice by delivering a more robust assessment of an emerging technology than instinct and intuition could provide.

RCTs: Golden But Not Perfect

Yet, as Bothwell has eloquently highlighted, RCTs aren’t perfect. For one, not all interventions lend themselves equally well to this approach. While drug studies generally work well (because it’s relatively easy to provide a consistent intervention in a blinded fashion), this can be more difficult, Bothwell observes, in areas such as surgery and psychotherapy.

Another challenge associated with many RCTs is the lengthy cycle time. It can take so long to conduct a study that by the time the results are reported out, science and practice may have moved on; Bothwell notes that by the time the much-anticipated COURAGE study of bare metal stents was published, many practitioners were already enthusing about the next big thing, drug-eluting stents.

In addition, the subjects who enroll in clinical trials – as highlighted in a recent Health Affairs article published by authors from Flatiron, Foundation Medicine, and the FDA – may not be representative either of the larger population or of the patients who are likely to receive the intervention currently under study; groups underrepresented in clinical trials include the elderly, minorities, and those with poor performance status (the most debilitated).

This begins to get at what may be the most significant limitation of clinical trials: the generalizability of results. The issue is that clinical trials, by design, are experiments – often high-stakes experiments from the perspective of the subjects (most importantly, it’s their health and often their lives at stake!) as well as the sponsors, who often invest considerable time and capital in the trial. Clinical trial subjects tend to be showered with attention and followed with exceptional care, and study investigators generally do everything in their power to make sure subjects receive their therapy (whether experimental or control) and show up for their follow-up evaluations. Study personnel strive to be extremely responsive to questions and concerns raised by subjects.

But in real practice, YMMV, as they say on the interwebs – your mileage may vary; adherence is less certain, evaluation can be less systematic, and follow-up more sporadic. Conversely, an astute clinician may have figured out a way to make a medicine work better, perhaps implementing a helpful tweak based on a new paper or an empiric observation. Thus a clinical trial may rigorously, scientifically benchmark the potential of a new therapy relative to a control, but not necessarily predict its actual real-world performance.

The Challenge Of Assessing Real World Performance

In fact, assessing a product’s real world performance can be surprisingly difficult; in contrast to a clinical trial, which is designed explicitly to follow each patient’s journey and to methodically, conscientiously observe and compulsively track a number of pre-specified parameters, the data available for real world patients are, while perhaps more plentiful, captured in a far less systematic fashion.

The principal vehicle of real world clinical data capture, the electronic health record, was designed to support billing (primarily) as well as the provision of clinical care (e.g., affording providers access to test results and previous visit notes). Additional contemporary sources of real world data – as nicely summarized in a recent McKinsey review – include administrative/claims data (organized around billable services) and, increasingly, patient-generated data (such as from wearables like Fitbits).

Organizing and analyzing data – hmm that seems like just the sort of thing at which today’s tech companies excel, at least outside healthcare. Google’s stated mission is to organize all the world’s information; Amazon leverages sophisticated analytics to optimize the consumer’s purchasing experience. But healthcare, as we all know, can be a troublesome beast. Healthcare data are notoriously fragmented; quality is uneven at best; and the approach to health information privacy doesn’t lend itself to a value system based on asking forgiveness instead of permission.
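
To give a flavor of what that organizing work involves, here is a toy sketch (Python/pandas; the patients, field names, and values are entirely hypothetical) of linking EHR and claims records and flagging the gaps that fragmentation leaves behind:

    # Toy sketch of real-world-data wrangling: link EHR and claims records,
    # then flag missing linkages and lapsed follow-up. All data are made up.
    import pandas as pd

    ehr = pd.DataFrame({
        "patient_id": [1, 2, 3],
        "diagnosis":  ["NSCLC", "NSCLC", "NSCLC"],
        "last_visit": pd.to_datetime(["2018-01-10", "2017-06-02", "2018-02-20"]),
    })
    claims = pd.DataFrame({
        "patient_id": [1, 3],
        "drug_fills": [6, 1],                    # prescription fills billed
    })

    rwd = ehr.merge(claims, on="patient_id", how="left")
    rwd["no_claims_data"] = rwd["drug_fills"].isna()            # fragmentation
    rwd["lapsed_followup"] = rwd["last_visit"] < "2017-12-31"   # sporadic follow-up
    print(rwd)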

Even so, tech companies big and small are pouring into the real world evidence (RWE) space; what do they hope to accomplish?

Why Tech Is Embracing RWE

The recent (earlier-cited) Health Affairs paper led by authors at one of the most advanced and successful companies in this space, Flatiron, offers a roadmap of sorts. Using a combination of technology and manual data extraction and classification, Flatiron attempts to generate near-clinical-research-grade data from oncology EHR records, supplemented with other data – “most notably, mortality data from national and commercial sources,” according to the authors. (For more on Flatiron, and its recent acquisition by Roche for $2.2B, see here; for discussion of two other oncology data companies, see here.)

In addition to enabling broader patient representation – affording greater visibility into patient outcomes in groups traditionally underrepresented in clinical trials – robust RWE can potentially change the approach to at least some clinical studies by offering the possibility of what Flatiron calls a “contemporary” control arm, and what others, like Medidata’s Glen de Vries (who is also keenly interested in this concept), describe as a “synthetic control arm.” The idea is that under some circumstances, it might make sense from a pragmatic and/or ethical perspective not to randomize patients to a control arm – for example, if the disease is uniformly fatal and without known treatment. Under select circumstances, perhaps a clinical trial could be conducted comparing patients receiving a new treatment to the RWE data of patients receiving best available treatment – especially since there are 2017 data from Pfizer and Flatiron showing, at least in the example studied, that their RWE data line up exceptionally well with recent data from the control arm of an RCT. The implication is that if such RWE could be used in place of a control arm, an RCT could be performed much faster and more cheaply – and it might be extremely attractive to participants, because everyone would receive the active treatment and no one would be randomized to the control.
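
Neither Flatiron nor Medidata has published the code behind these offerings, so the following is only a generic sketch of one common approach – weighting an external RWE cohort so its baseline characteristics resemble the single-arm trial population before comparing outcomes – using simulated data and propensity-score odds weighting:

    # Generic sketch of an external ("synthetic") control comparison using
    # propensity-score odds weighting. Simulated data; not any company's method.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    # Simulated baseline covariates (age, performance status) and responses.
    trial_X = rng.normal([62, 1.0], [8, 0.5], size=(300, 2))    # single-arm trial
    rwe_X   = rng.normal([68, 1.4], [10, 0.6], size=(2000, 2))  # external RWE cohort
    trial_y = rng.binomial(1, 0.45, 300)                        # response on new drug
    rwe_y   = rng.binomial(1, 0.30, 2000)                       # response on usual care

    # Model the probability of being a trial patient given baseline covariates.
    X = np.vstack([trial_X, rwe_X])
    in_trial = np.r_[np.ones(300), np.zeros(2000)]
    ps = LogisticRegression().fit(X, in_trial).predict_proba(rwe_X)[:, 1]

    # Odds weights reshape the RWE cohort to resemble the trial population.
    w = ps / (1 - ps)
    print(f"trial response rate:           {trial_y.mean():.3f}")
    print(f"weighted RWE control response: {np.average(rwe_y, weights=w):.3f}")

The obvious caveat is that weighting can only balance the covariates you have actually measured – exactly the unknown-confounder problem that randomization, as discussed below, is designed to protect against.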

There’s at least one example of this playing out; according to a recent article by Justin Petrone in Nature Biotechnology, Flatiron’s RWE “greased Roche’s path to regulatory approval.” Petrone continues,

“The company relied on Flatiron data to expand the label for Alecensa (alectinib), a treatment for people with non-small-cell lung cancer, to 20 countries. Regulators outside the US wanted more information on controls, and it might have taken Roche a year to satisfy those requirements through another route.”

While the potential of RWE is clearly starting to capture the imagination, many physicians caution that interest in RWE shouldn’t occur at the expense of the RCT gold standard. “As a clinician I would not treat without RCT,” tweeted MGH cardiologist Chris Newton-Cheh, noting “Random, blinded allocation is best protection against biased treatment allocation and confounding.” Or, as Farzad Mostashari tweeted, with characteristic charm, “You will never know the unknown confounders that randomization protected you against. It’s like Batman.”

Even so, RWE – as described so nicely in the McKinsey review – offers an opportunity to “evaluate new treatments when randomization to placebo for clinical trials may be impossible, impractical, or unethical.” The FDA, for its part, notes,

“In some cases, a “traditional” clinical trial may be impractical or excessively challenging to conduct. Ethical issues regarding treatment assignment, and other similar challenges, may present themselves when developing and attempting to execute a high quality clinical trial. Analyses of RWD [real world data], using appropriate methods, may in some cases provide similar information with comparable or even superior characteristics to information collected and analyzed through a traditional clinical trial.”

While robust RWE isn’t likely to displace the RCT, it may lower the threshold for provisionally embracing an early positive RCT result, knowing that the treatment’s real world performance would be reliably and rapidly evaluable.

Robust RWE also affords the opportunity to better understand other aspects of a product’s performance, including its cost (versus other treatments) and the populations in which it seems most effective (enabling the sort of work that led to the approval of pembrolizumab (Keytruda) for patients with high microsatellite instability – for more see here and here), as well as an understanding of the populations in which it isn’t working, which could lead to additional labeling restrictions from regulators and/or additional prescribing limitations by payors. Pharma companies would be well served by seeking to substratify in advance, rather than discovering soon after launch that robust RWE suggests a more restricted addressable population than the product team had anticipated.
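
As a purely hypothetical illustration of what such substratification might look like in RWE – simulated data, with a made-up biomarker split in the spirit of the MSI-high example above:

    # Hypothetical sketch: stratify real-world response by a biomarker to reveal
    # a restricted population in which the drug actually works. Simulated data.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    n = 5000
    biomarker_high = rng.random(n) < 0.15              # e.g., 15% biomarker-high
    response = np.where(biomarker_high,
                        rng.binomial(1, 0.40, n),      # responds if biomarker-high
                        rng.binomial(1, 0.05, n))      # rarely responds otherwise

    rwd = pd.DataFrame({"biomarker_high": biomarker_high, "response": response})
    print(rwd.groupby("biomarker_high")["response"].agg(["mean", "count"]))
    # A split like this, surfaced in advance, is the kind of finding that can
    # shape labeling and payor coverage before, rather than after, launch.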

RWE: Less Prestigious Than RCT, But More Useful?

Thus far, we’ve considered robust RWE as a somewhat imperfect alternative to the scientific gold standard, the RCT. But if you take a step back, you might ask why this benchmark takes priority over what may be a more relevant standard – how a product performs in the real world. (As I joked on Twitter, this reminds me a bit of a comment biochemists used to make to us yeast geneticists when we saw something in cells not recapitulated in their highly reduced system – a phenomenon they termed, tongue in cheek, an “in vivo artifact.”)

Consider this example: if I wanted to develop a new drug that could reverse type 2 diabetes, I’d need to prove it worked in two robust RCTs, vetted by the FDA, before I could market it to a single patient. Yet Virta, a behavioral health company (see here), has a supervised low-carb program it reports reverses type 2 diabetes, and it has presented data from a non-randomized trial involving self-selected patients. Unquestionably, a much lower standard.

On the other hand, Virta, and tech-enabled service companies like it, are likely reimbursed (at least in part) only if they deliver particular results – only if they reverse diabetes (and thus reduce costs) in a certain number of patients. Isn’t this, at some level, a higher standard? A pill just has to show it could work to merit reimbursement; Virta actually has to deliver results.

Chris Hogg, a former pharma strategist who is now COO of the digital health company Propeller Health, pointed this out on Twitter, noting that service offerings tend to require from payors “proof of use/retention” as well as “proof the solution works in their specific environment.” He adds, “You could flip it to say pharma _only_ needs to show efficacy in highly controlled settings, but services require that and proof of efficacy in real-world settings. Bar might actually be higher for new services.” (Note: tweets quoted in this post lightly edited for clarity.)

Health consultant Andrew Matzkin (of Health Advances) then chimed in, “But pharma also has to prove safety to a degree that is just not relevant for most digital health. For digital health, that’s a feature, not a bug, with advantages for patients and for businesses/investors. Flipside is lack of established pathways for payment and distribution.” He added, “And I think that soon, drugs will be held to account for real world efficacy too as RWE becomes better and more widely accepted. Once that happens, the advantages of digital health (with real efficacy data) will be even starker.”

Matzkin continued, “I think the model will be provisional approval and coverage based on less RCT data, followed by RWE monitoring that could result in restricted $ and/or rescinded label indications. So a little different. But real world outcomes will matter. For everything.”

As Matzkin says, it’s hard to imagine that the ability to measure the performance of an intervention in a trusted, near-real-time fashion wouldn’t profoundly disrupt both pharma and healthcare. Moreover, while the opportunity for potentially faster approvals would seem like a real win for pharma companies (and inevitably of concern to industry critics), it’s likely that such a capability – a real world, real time dashboard of exactly how each medicine and treatment is performing in real patients – could also threaten pharma, for all the right reasons.

Drugs that fail to make a difference in the real world would be rapidly surfaced, as would some subtle safety issues. Performance-based drug contracting, which historically was always discussed but seldom implemented due to the challenge of refereeing, could potentially be meaningfully enabled by trusted real-world reporting. Conversely, new medicines that don’t seem meaningfully different in an RCT setting could turn out to be much more effective in a real world setting if they are actually better tolerated and embraced by actual patients. In addition, approaches (digital or not) that impact real world performance would likely be prioritized as well, and optimization could occur continuously, informed by reliable RWE.
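
To illustrate how trusted real-world reporting could do the refereeing for an outcomes-based contract, here is a toy sketch in which the guarantee, spend, and rebate terms are entirely hypothetical:

    # Toy sketch of an outcomes-based contract refereed by real-world reporting.
    # The guaranteed rate, spend, and rebate schedule are hypothetical.
    def performance_rebate(observed_response_rate: float,
                           guaranteed_rate: float = 0.35,
                           net_spend: float = 10_000_000,
                           rebate_per_point: float = 0.02) -> float:
        """Rebate owed by the manufacturer when real-world response falls short."""
        shortfall_points = max(0.0, guaranteed_rate - observed_response_rate) * 100
        return net_spend * rebate_per_point * shortfall_points

    # Example: the real-world dashboard reports 30% response against a 35% guarantee.
    print(f"rebate owed: ${performance_rebate(0.30):,.0f}")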

In short, at its best, real world evidence provides an opportunity to evaluate medical interventions on what arguably matters most – real world performance; and evidence delivered reliably and in near real time would provide a meaningful incentive to optimize on this foundational measure, potentially bringing patient, provider, payor, and manufacturer into better alignment, while also sharpening what are likely to be real remaining differences, such as determining the value of a particular increase in performance.

A Dynamic Balance

Which brings us to my last question: what is the optimal role of RCTs in a world with quality, trustworthy RWE (not that we’re quite there yet…)? The reflexive answer, of course, is RCTs for everything, followed by RWE. Perhaps this is right, though I wonder if there are circumstances when it’s preferable, or at least permissible (perhaps because the risk is low, as Matzkin suggested in the context of digital health interventions), to develop a less robust clinical trial evidence base and focus on delivering, and optimizing, real world performance.

My hope is that, unlike our tribal politics, we approach this question with humility and nuance, sensitive to the idea that the “right” approach might vary by circumstance. Figuring out the right balance here promises to be a difficult and dynamic process, a challenging problem with which we should all be grateful for the opportunity to wrestle.

David Shaywitz is a Senior Partner with Takeda Ventures and a Visiting Scientist at Harvard Medical School.


3 replies

  1. I am reminded that the research for the initial introduction of the Shingles vaccine was based on an RCT lasting 5 years that showed only 60% efficacy. I always wondered whether a 10-year trial would have shown even less efficacy, in which case the effect of the vaccine was only to delay the appearance of shingles. So, delaying an episode of Shingles for 5-10 years, or more, might actually represent an adverse effect. No correlation data for the recently introduced new vaccine.

  2. Aren’t you trying to have, ideally, only one variable as the difference between the control arm and the experimental arm? Isn’t the epitome the situation where, say, in a drug RCT, everything about the subjects in the control arm is exactly like everything in the experimental arm except for the new drug that you are adding to the experimental arm? In other words, wouldn’t it be most propitious to have everyone in both arms be identical siblings or clones – same age, same diet, same upbringing, same temperature, same sed rate, same environmental challenges, and same epigenetic additions to DNA and histone proteins in the nucleosomes? But, of course, this is far-fetched and impossible… except perhaps in an in vitro experimental proteomics laboratory where you have created an artificial environment for your drug testing. Or perhaps this could be approximated using hybrid laboratory animal–human creatures that you have assembled genetically.

    But, short of this, and using real world experience, aren’t you always doing less exacting science? Aren’t you always compromising, being lazy, and fudging?

  3. RWE would, I guess, actually be superior for things like determining which protocol for diagnosis or treatment is best; RCTs are probably better for individual drugs. That said, assuming we can really drag these data out of our EHRs, I think you could make the case for less extensive drug testing prior to release, with RWE used for follow-up. At present we probably spend too much time, IMHO, on pre-release testing and not enough on follow-up.

    Steve