Everyone, including this blog writer, has been touting the virtues of the vast troves of data already or soon to be available in the electronic health record (EHR), which will usher in the learning healthcare system [1, 2]. There is sometimes unbridled enthusiasm that the data captured in clinical systems, perhaps combined with research data such as gene sequencing, will effortlessly provide us knowledge of what works in healthcare and how new treatments can be developed [3, 4]. The data is unstructured? No problem, just apply natural language processing [5].
I honestly share in this enthusiasm, but I also realize that it needs to be tempered, or at least given a dose of reality. In particular, we must remember that our great data analytics and algorithms will only get us so far. If we have poor underlying data, the analyses may end up misleading us. We must be careful for problems of data incompleteness and incorrectness.
There are all sorts of reasons for inadequate data in EHR systems. Probably the main one is that those who enter data, i.e., physicians and other clinicians, are usually doing so for reasons other than data analysis. I have often said that clinical documentation can be what stands between a busy clinician and going home for dinner, i.e., he or she has to finish charting before ending the work day.
I also know of many clinicians whose enthusiasm for entering correct and complete data is tempered by their view of the entry of it as a data blackhole. That is, they enter data in but never derive out its benefits. I like to think that most clinicians would relish the opportunity to look at aggregate views of their patients in their practices and/or be able to identify patients who are outliers in one measure or another. Yet a common complaint I hear from clinicians is that data capture priorities are more driven by the hospital or clinic trying to maximize their reimbursement than to aid clinicians in providing better patient care.