Maybe it is just the shock of being post-Labor Day and realizing that summer is fading into the rearview mirror, or maybe it was something I ate for breakfast that spurred new hope. But I think this is the year that the patient-centric approach to data in life sciences finally takes off. And along with that launch will come a massive, rapid migration to cloud and data lake architectures for pharma data.
Really? Why now, you may ask?
Yeah – that’s right. Every group I have been talking to is worried that they are sitting atop a jigsaw puzzle of siloed data resources that can’t be assembled fast enough to meet the needs of business and scientific users. Organizations worry that they can’t answer questions about why drugs work in some patients and not others if they can’t link phenotype and genotype data. Groups can’t look across clinical trials, let alone between clinical trials and EMR data. Progressive safety groups are considering automation and cognitive computing to lower the cost of processing adverse events, freeing them in parallel to expand signal detection into real-world data sets 10X the volume they handle today.
So life sciences companies that can act are likely to consolidate toward something integrated, agile, and at scale. They are being dragged into the cloud by the need to move genomics from in-house systems into usable storage and retrieval, whether through SaaS models or by shifting workloads from aging high-performance computing (HPC) cluster pipelines into elastic compute. Leases for on-premises MPP database appliances are coming up for renewal, and last year’s servers are already showing their age, forcing a fundamental rethinking of where data will live and be combined.
While many Data Lakes have yet to be filled, or exist only as distributed data puddles and ponds, the data is flowing. Groups are scrambling to establish platform approaches to data catalogs that can reach down into those Data Lakes to enable data exploration. These catalogs won’t look into just a single area like Real World Evidence; instead, a single underlying catalog will span functions, with storefronts for commercial, discovery, clinical, translational medicine, and safety. This mega-catalog will be skinned with views that give each functional area a relevant perspective.
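To make the "one catalog, many storefronts" idea concrete, here is a minimal sketch. All names, paths, and the tagging scheme are illustrative assumptions, not any vendor's schema:

```python
# Hypothetical sketch: a single shared catalog of data lake assets,
# "skinned" with a per-function storefront view.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    location: str                                 # path in the data lake
    functions: set = field(default_factory=set)   # areas that may see it

# One underlying catalog across all functions (entries are made up)
catalog = [
    CatalogEntry("ae_events", "s3://lake/safety/ae", {"safety"}),
    CatalogEntry("trial_labs", "s3://lake/clinical/labs",
                 {"clinical", "translational"}),
    CatalogEntry("rx_claims", "s3://lake/rwe/claims",
                 {"commercial", "safety"}),
]

def storefront(function):
    """A functional view over the single shared catalog."""
    return [e.name for e in catalog if function in e.functions]

safety_view = storefront("safety")       # ["ae_events", "rx_claims"]
clinical_view = storefront("clinical")   # ["trial_labs"]
```

The point of the design is that each storefront is just a filter over one source of truth, so adding an asset once makes it discoverable everywhere it is permitted.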
Machine learning is popping up in dialogs on data curation, data catalog maintenance, and robotic process automation. We can now use tools that learn how humans clean up messy data sets collected without standardization in mind. The hope is that these tools transform manual curation into a training activity for augmented curation, amplifying the curator’s work so they can focus on the hard decisions.
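A toy sketch of that augmented-curation loop might look like the following: learn a mapping from past human corrections, auto-apply it to known values, and escalate only the unfamiliar ones. The data and functions are hypothetical, standing in for a real learned model:

```python
# Hypothetical sketch: learn normalization mappings from past human
# corrections, auto-apply them, and route unseen values to a curator.
from collections import Counter

def learn_mappings(raw_values, curated_values):
    """Build a lookup from raw value -> most common human correction."""
    votes = {}
    for raw, clean in zip(raw_values, curated_values):
        votes.setdefault(raw.strip().lower(), Counter())[clean] += 1
    return {raw: counts.most_common(1)[0][0] for raw, counts in votes.items()}

def curate(values, mapping):
    """Auto-apply learned fixes; queue unknowns for human review."""
    auto, review = [], []
    for v in values:
        key = v.strip().lower()
        if key in mapping:
            auto.append(mapping[key])
        else:
            review.append(v)   # the hard decision: escalate to the curator
    return auto, review

# Past curation work: messy site-entered units mapped to a standard form
history_raw = ["mg/dL", "MG/DL ", "mmol/l", "mg per dl"]
history_clean = ["mg/dL", "mg/dL", "mmol/L", "mg/dL"]

mapping = learn_mappings(history_raw, history_clean)
auto, review = curate(["MG/DL", "mmol/L", "IU/mL"], mapping)
# auto -> ["mg/dL", "mmol/L"]; review -> ["IU/mL"]
```

A production system would replace the exact-match lookup with a learned model, but the workflow is the same: the curator's past decisions become training data, and only genuinely novel cases come back to a human.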
It is a very exciting first couple of weeks post-Labor Day, and as the year rounds out I think it is inevitable that a perfect storm is gathering, one that drags Data First and Cloud First platforms to the top of the strategic heap. More to come, I am sure.
Dan Housman is chief technology officer at ConvergeHealth by Deloitte