The great promise of wearables for medicine includes the opportunity for health measurement to participate more naturally in the flow of our lives, and provide a richer and more nuanced assessment of phenotype than that offered by the traditional labs and blood pressure assessments now found in our medical record. Health, as we appreciate, exists outside the four walls of a clinic or hospital, and wearables (as now championed by Apple, Google, and others) would seem to offer an obvious vehicle to mediate our increasingly expansive perspective.
The big data vision here, of course, would be to develop an integrated database that includes genomic data, traditional EMR/clinical data, and wearable data, with the idea that these should provide the basis for more precise understanding of patients and disease, and provide more granular insight into effective interventions. This has been one of the ambitions of the MIT/MGH CATCH program, among others (disclosure: I’m a co-founder).
One of the challenges, however, is trying to understand the quality and value of the wearable data now captured. To this end, it might be useful to consider an evaluation framework that’s been developed for thinking about genomic testing, and which I’ve become increasingly familiar with through my new role at a genetic data management company. (As I’ve previously written, there are many parallels between our efforts to understand the value of genomic data and our efforts to understand the value of digital health data.)
The evaluation framework, called ACCE, seems to have been first published by Brown University researchers James Haddow and Glenn Palomaki in 2004, and focuses on four key components: Analytic validity, Clinical validity, Clinical utility, and Ethical, Legal, and Social Implications (ELSI). The framework continues to inform the way many geneticists think about testing today – for instance, it’s highlighted on the Centers for Disease Control and Prevention’s website (and CDC geneticist Muin Khoury was one of the editors of the book in which ACCE was first published).
Analytic validity refers to how well a test measures the parameter it’s supposed to measure; in the context of digital health, this might mean whether a pedometer accurately measures the number of steps, for example, or whether a heart rate monitor accurately and consistently captures the true number of heartbeats.
Clinical validity describes how well a test measures the outcome of interest – for example, if a test suggests you have a sleep problem, how likely is this to be true (essentially, the positive predictive value), and how likely is a negative result to be correct (the negative predictive value)?
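To make the predictive-value idea concrete, here is a minimal sketch of how the two numbers fall out of a test's sensitivity, specificity, and the prevalence of the condition. The function name and all of the figures (a 90% sensitive, 90% specific sleep-problem flag in a population where 10% truly have a problem) are hypothetical, chosen only for illustration, and do not describe any real device.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a yes/no test, using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(condition | positive result)
    npv = true_neg / (true_neg + false_neg)   # P(no condition | negative result)
    return ppv, npv

# Hypothetical inputs: 90% sensitivity, 90% specificity, 10% prevalence.
ppv, npv = predictive_values(0.90, 0.90, 0.10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # prints "PPV = 0.50, NPV = 0.99"
```

Under these assumed numbers, a seemingly accurate test is right only half the time when it flags a problem, simply because the condition is uncommon.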
Clinical utility, as Haddow and Palomaki write, “defines the risks and benefits associated with a test’s introduction into practice.” In other words, what’s the impact of using a particular assessment – how does it benefit patients, how might it adversely impact them? This may be easiest to think about in the context of consumer genetic tests suggesting you may be at slightly elevated risk for condition A, or slightly reduced risk for condition B: is this information (even if accurate) of any real value?
Ethical, Legal, and Social Implications remind us of the potential unintended consequences of testing or monitoring (there are obvious concerns about being perpetually monitored, for example).
This framework helps us understand why many healthcare professionals may be reluctant to welcome wearable data into the electronic medical record. Let’s consider an example such as activity monitoring – the kind of information you might get from devices such as a Fitbit, although this example is entirely hypothetical and explicitly does not refer to any specific brand or product.
First, you might reasonably wonder how accurate an activity monitor is – how well does it measure the number of steps, and how consistently? If you take 1000 steps, will the device always record about the same number, or might it be far off?
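One way to frame this question is to separate accuracy (how far off the device is on average) from consistency (how much its readings scatter from walk to walk). The sketch below does this for repeated 1000-step walks; the readings for "device A" and "device B" are invented for illustration, and the function name is my own rather than anything drawn from a real product or standard.

```python
from statistics import mean, stdev

def step_count_summary(recorded, true_steps):
    """Return (bias %, coefficient of variation %) for repeated trials."""
    avg = mean(recorded)
    bias_pct = 100 * (avg - true_steps) / true_steps  # accuracy: average error
    cv_pct = 100 * stdev(recorded) / avg              # consistency: relative spread
    return bias_pct, cv_pct

# Hypothetical readings from five walks of exactly 1000 steps each.
device_a = [940, 950, 945, 955, 960]    # reads consistently a little low
device_b = [800, 1200, 950, 1100, 950]  # right on average, but erratic

bias_a, cv_a = step_count_summary(device_a, 1000)
bias_b, cv_b = step_count_summary(device_b, 1000)
print(f"A: bias {bias_a:+.1f}%, spread {cv_a:.1f}%")
print(f"B: bias {bias_b:+.1f}%, spread {cv_b:.1f}%")
```

In this made-up comparison, device A undercounts by about 5% but does so reliably, while device B averages out to the right answer yet can be off by 20% on any given walk.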
Second, as a physician, you might ask how well it measures the parameter you’re interested in – say activity. Do high measurements always correspond with high levels of activity, and do low measurements always mean you’ve been idle? (You can think of this as asking how confident you should be that a patient with low activity should be diagnosed as “indolent,” or one with high activity should be diagnosed as “active.”)
Third, what is a healthcare professional supposed to do with this information? Operationally, what do you do with months of activity data?
Fourth, what are the unanticipated consequences of including activity monitoring in the medical record? Can a doctor be sued because a patient’s exercise pattern changed and the physician never acted upon this? Would patients who didn’t want their activity to be constantly monitored be subject to higher insurance rates?
One of the most interesting questions likely to emerge from the discussion of wearable data is how much data quality matters. One view – which I’ve heard expressed with particular eloquence by MGH clinical investigator Bill Crowley (disclosure: a former colleague and long-time friend) – is that high quality phenotypic data are absolutely essential for clinical care and especially medical discovery. Without obsessive attention to the way data are collected, you just have garbage in, and get garbage out. In this view, the key to success is rigorously training clinical investigators, and using carefully validated measurements.
The other extreme, which Stanford geneticist Atul Butte is perhaps best known for advocating, is what might be called the data volume perspective: collect as much data as you possibly can, the reasoning goes, and even if any individual aspect of it is sketchy or unreliable, these issues can be overcome with volume. If you examine enough parameters, interesting relationships are likely to emerge, and the goal is to not let the perfect be the enemy of the good enough. Create a database with all the information you can find, the logic goes, and something will emerge. Explains Butte, via email:
“For me as a data scientist, and with my style of research, I would rather take 10 data sets of ‘mediocre’ quality than 1 data set that someone says is perfect. Those data sets of ‘mediocre’ quality aren’t usually that bad, they are just perceived as being bad…. Data quality is always improving. Next year’s data is going to be better than last years. But we will always find some way to criticize data. So it never makes sense to me to wait for perfection with data. The idea instead is to make the most of the measurements you have in front of you, right now.”
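The tension between the two views can be put in rough statistical terms. The sketch below is my own simplification, not a method either investigator has endorsed: it assumes each data set estimates the same quantity, that random noise is independent across data sets (so it shrinks with the square root of their number when averaged), and that any shared systematic bias does not shrink at all. The function name and every number are hypothetical.

```python
import math

def pooled_error(noise_sd, shared_bias, n_datasets):
    """Root-mean-square error of the average of n equally weighted data sets."""
    random_part = noise_sd / math.sqrt(n_datasets)  # independent noise averages down
    return math.sqrt(random_part ** 2 + shared_bias ** 2)  # shared bias does not

one_careful = pooled_error(noise_sd=1.0, shared_bias=0.0, n_datasets=1)
ten_mediocre = pooled_error(noise_sd=3.0, shared_bias=0.0, n_datasets=10)
ten_biased = pooled_error(noise_sd=3.0, shared_bias=2.0, n_datasets=10)

print(f"one careful data set:       {one_careful:.2f}")
print(f"ten noisy, unbiased sets:   {ten_mediocre:.2f}")
print(f"ten noisy sets, shared bias: {ten_biased:.2f}")
```

Under these assumptions, ten data sets that are each three times noisier can edge out one careful data set, which is the volume argument; but if they all share the same systematic error, no amount of volume removes it, which is the garbage-in, garbage-out argument.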
I suspect that the routine use of wearable data by the medical establishment will closely parallel that of genomic data: everyone will agree that it’s interesting, and represents an area that should be followed closely, but relatively few pioneers will actually jump in, and really start collecting data and figuring out how all this works; the return on investment will be hard to define, the uncertainty viewed as too high.
It wouldn’t surprise me if many of the same innovators that are early adopters of genomics (e.g. pursuing whole genome sequencing on an ambitious scale) will also be the earliest adopters of data from wearables, with the idea that the combination of rich genotype plus rich phenotype is likely to be an important source of insight (again, keep in mind that I work at a genomic data company). Within pharma, I’d suspect many of the largest companies (playing not to lose) will pursue lightly-resourced exploratory projects in this area, while companies I’ve called mid-size disruptors are more likely to take a real run at this, as part of a more confident and aggressive strategy of playing to win.
I’m obviously a passionate and long-time believer in the value of collecting and colliding large volumes of data, but I also recognize that this remains largely an unproven proposition, and I can understand why anxious administrators, prudent physicians, cautious corporations, and sensible investigators might prefer to place their bets elsewhere at the moment, deciding it’s still too early to jump in.
The skeptics may be right – but they may also arrive late to an amazing scientific party.