Thinking ‘oat’ of the box: Technology to resolve the ‘Goldilocks Data Dilemma’

Marielle Gross
Robert Miller


This piece is part of the series “The Health Data Goldilocks Dilemma: Sharing? Privacy? Both?” which explores whether it’s possible to advance interoperability while maintaining privacy. Check out other pieces in the series here.

The problem with porridge

Today, we regularly hear stories of research teams using artificial intelligence to detect and diagnose diseases earlier, with more accuracy and speed than a human would have ever dreamed of. Increasingly, we are called to contribute to these efforts by sharing our data with the teams crafting these algorithms, sometimes by healthcare organizations relying on altruistic motivations. A crop of startups has even appeared to let you monetize your data to that end. But given the sensitivity of your health data, you might be skeptical of this, doubly so when you take into account tech’s privacy track record. We have begun to recognize the flaws in our current privacy-protecting paradigm, which relies on thin notions of “notice and consent” that inappropriately place the responsibility for data stewardship on individuals who remain extremely limited in their ability to exercise meaningful control over their own data.

Emblematic of a broader trend, the “Health Data Goldilocks Dilemma” series calls attention to the tension and necessary tradeoffs between privacy and the goals of our modern healthcare technology systems. Not sharing our data at all would be “too cold,” but sharing freely would be “too hot.” We have been looking for policies that are “just right,” striking the balance between protecting individuals’ rights and interests and making it easier to learn from data to advance the rights and interests of society at large.

What if there were a way for you to allow others to learn from your data without compromising your privacy?

To date, a major strategy for striking this balance has involved the practice of sharing and learning from deidentified data, based on the belief that the only risks individuals face from sharing their data stem directly from that data’s ability to identify them. However, artificial intelligence is rendering genuine deidentification obsolete, and we are increasingly recognizing a problematic lack of accountability to individuals whose deidentified data is being used for learning across various academic and commercial settings. In its present form, deidentification is little more than a sleight of hand to make us feel more comfortable about the unrestricted use of our data without truly protecting our interests. More of a wolf in sheep’s clothing, deidentification is not solving the Goldilocks dilemma.

Tech to the rescue!

Fortunately, there are a handful of exciting new technologies that may let us escape the Goldilocks Dilemma entirely by enabling us to gain the benefits of our collective data without giving up our privacy. This sounds too good to be true, so let us explain the three most revolutionary ones: zero knowledge proofs, federated learning, and blockchain technology.

  1. Zero Knowledge Proofs (ZKP)

Zero knowledge proofs use cutting-edge mathematics to allow one party (the “prover”) to prove the validity of a statement to another party (the “verifier”) without disclosing the data underlying that statement. Put another way, zero knowledge proofs let us prove things about our data without giving up our privacy. This could be an extremely valuable strategy in research since we could learn, for example, which treatments worked best for which people without needing to know which people received which treatments or what their individual outcomes were. Zero knowledge proofs are already being used in healthcare today: pharmaceutical manufacturers in the MediLedger project are deploying them to keep our drug supply chains both private and secure.
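To make the prover and verifier roles concrete, here is a minimal sketch of one classic zero knowledge protocol, Schnorr identification, in which the prover shows that they know a secret number x without revealing it. This is an illustration only, not a description of how MediLedger or any health data system works. The group parameters and all function names are our own assumptions, and the numbers are deliberately tiny: with a challenge space of only 11 values, a cheating prover could guess the challenge about one time in eleven, so real deployments use very large parameters.

```python
import secrets

# Toy group parameters (assumption: chosen for readability, far too small
# for real use). P is prime, Q is a prime dividing P - 1, and G has order Q
# modulo P, since 2**11 % 23 == 1.
P, Q, G = 23, 11, 2

def keygen():
    """Prover picks a secret x and publishes y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1  # secret in the range 1..Q-1
    return x, pow(G, x, P)

def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)

def challenge():
    """Verifier replies with a random challenge c."""
    return secrets.randbelow(Q)

def respond(x, r, c):
    """Prover answers with s = r + c*x mod Q; r masks the secret x."""
    return (r + c * x) % Q

def verify(y, t, c, s):
    """Verifier checks G^s == t * y^c mod P, never seeing x or r."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P

# One run of the protocol between a prover and a verifier.
x, y = keygen()
r, t = commit()
c = challenge()
s = respond(x, r, c)
accepted = verify(y, t, c, s)
```

The verifier only ever sees y, t, c, and s. The check succeeds because G^s = G^(r + c*x) = t * y^c, yet the random nonce r keeps s from revealing x.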

  2. Federated Learning

Another privacy-enabling innovation is federated learning, which enables a network of computers to collaboratively train one algorithm while keeping their data on their own devices. Instead of sending their data to a central computer to train an algorithm, federated learning sends the algorithm to the data, trains it on that data locally, and shares only the updated algorithm with other parties. By decoupling the training of algorithms from the need to centralize data, federated learning limits the exposure of an individual’s data to privacy risks. With federated learning, several of the world’s largest drug makers, usually fierce competitors, are collaborating in the MELLODDY project to advance drug discovery. Federated learning lets these companies collectively train a single shared algorithm on their highly proprietary data without compromising their privacy to their competitors. Collectively these companies benefit as they are effectively creating the world’s largest distributed database of molecular data, which they hope to use to find new cures and treatments, a process that promises to benefit us all.
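The “send the algorithm to the data” idea can be sketched in a few lines. The example below is a minimal illustration of federated averaging on a one-parameter model, and makes no claim about how MELLODDY is built. The site data is synthetic, and the function names, learning rate, and round counts are all our own assumptions.

```python
# Synthetic, made-up data held at two hypothetical sites. Each pair is
# (x, y) with y = 3 * x, so the "true" model weight is 3.
SITES = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one site's private data.

    The raw (x, y) pairs never leave this function; only the updated
    weight is returned.
    """
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, sites):
    """One round: every site trains locally, then weights are averaged.

    The average is weighted by how many records each site holds.
    """
    total = sum(len(data) for data in sites)
    updates = [(local_update(w_global, data), len(data)) for data in sites]
    return sum(w * n for w, n in updates) / total

def train(sites, rounds=50):
    """Repeat federated rounds starting from an untrained weight of 0."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, sites)
    return w

shared_weight = train(SITES)
```

Only the single number w travels between the sites and the coordinator. A real system would exchange many model parameters and would typically add further protections around the updates themselves, which this sketch leaves out.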

  3. Blockchain

Blockchain technology also has a critical role to play in creating a secure network for data sharing. The much-hyped “blockchain” stems from its first implementation in Bitcoin but has much broader applicability. Blockchains combine cryptography and game theory so that a network of computers reaches consensus on a single state; you can think of them as analogous to a network of computers joining together to create one giant virtual computer. This virtual computer maintains a shared ledger of “the truth,” a sort of database the contents of which are continuously verified by all the computers in the network, and runs autonomous programs called “smart contracts.” These aspects of blockchains provide uniquely strong assurances of trust in data security and use: they execute the rules of the network consistently and objectively, and the whole process is transparent and universally auditable on the shared ledger. When applied to health data, these properties could empower individuals with an unprecedented ability to supervise and control the use of their own data, and a thriving market of startups has emerged for exactly this use case.
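The auditability of a shared ledger rests on a simple idea that can be sketched on its own: each entry stores a cryptographic hash of the entry before it, so altering any past record breaks every later link. The example below illustrates only that hash-linking and auditing step. It does not implement consensus among many computers or smart contracts, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, prev_hash, record):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "record": record},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(ledger, record):
    """Add a record, linking it to the hash of the current last block."""
    index = len(ledger)
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "index": index,
        "prev": prev_hash,
        "record": record,
        "hash": block_hash(index, prev_hash, record),
    })

def audit(ledger):
    """Recompute every hash and link; any edit to history is detected."""
    prev_hash = GENESIS
    for i, block in enumerate(ledger):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev"], block["record"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

# Hypothetical data-use events recorded on the ledger.
ledger = []
append(ledger, {"dataset": "example-cohort", "action": "access granted"})
append(ledger, {"dataset": "example-cohort", "action": "model trained"})
ledger_is_valid = audit(ledger)
```

Anyone holding a copy of the ledger can rerun audit and reach the same answer. On an actual blockchain, many independent computers perform this kind of check continuously and must also agree on which entries get appended.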

The way forward

The cumulative significance of these paradigm-shifting technologies is their potential to eliminate the Goldilocks Dilemma between privacy and learning, individuals and the collective, once and for all. Their emergence forces us to rethink not only our national health IT policy, but our underlying ethical and legal frameworks as well. Because they create the potential to build a future in which our treatment of data simultaneously respects individual and collective rights and interests, we believe there is an obligation to further develop and scale the core privacy-protecting functions of these technologies. Our aim is to spread awareness of the possibility of resolving a fundamental 21st century ethical dilemma with a technological solution. In this case, “can” implies “ought”: we must advocate for and demand that these and similar innovations be embedded into the future of our data and our health.

Robert Miller is building privacy solutions at ConsenSys Health and manages a blockchain and healthcare newsletter at https://bert.substack.com/.

Marielle S. Gross, MD, MBE is an OB/GYN and fellow at the Johns Hopkins Berman Institute of Bioethics where her work focuses on application of technology and elimination of bias as means of promoting evidence-basis, equity and efficiency in women’s healthcare (@GYNOBioethicist). 

3 replies »

  1. Still, I do not follow your logic for how Ethereum will contribute to the solution. I’m happy to believe that it “could” contribute to the solution but I don’t think that discussion is happening anywhere in the open. ConsenSys, given its resources and the role it claims for the future, can do better than that.

  2. Thanks for your reply! In our opinion, the fundamental principles underlying ZKP (using cryptography to allow learning from data without revealing the underlying content) and federated learning (store data locally, train algorithms “in house,” and export only new-and-improved algorithms) are what give them the potential to resolve a major patient-facing dilemma of health data use: that you must share your data, giving up your privacy and control in the process, in order for society to benefit from learning from your data.

    However, as you note, these technologies alone are not sufficient to protect patients’ rights and interests. This is precisely where blockchain, and re-imagining of legal and ethical frameworks for data treatment, comes in.

    Blockchain technology, in principle, has the potential to redefine the standard for data security, but also how we structure our digital services. Blockchains enable shared infrastructure where no single party has unilateral control, and instead it is the community that governs, and thus “owns,” the network. Similarly, in a federated learning network, algorithms are collaboratively trained and can also be thought of as shared resources underpinned by their communities. These stand in striking contrast to the closed infrastructures that exist today. The point is not to use these technologies as a way to keep trade secrets, but instead to enable collaboration where it previously wasn’t possible for competitive, ethical, or privacy reasons. The auditability of a shared ledger and the “trustlessness” of smart contract architecture can help promote honest collaboration in which disparate parties work together to create proprietary solutions which leverage individual data without compromising privacy and ensure equitable distribution of value to contributors, including future patients.

    Meanwhile, new ethical and legal conceptualizations are needed. We have proposed elsewhere (https://www.researchgate.net/publication/333951332_The_immortal_life_of_us_Health_data_dignity_and_labor and https://www.tandfonline.com/doi/abs/10.1080/15265161.2019.1630500) that health data exemplifies the “data as labor” model, proposed by Jaron Lanier in 2013 (https://www.goodreads.com/book/show/15802693-who-owns-the-future) and expanded upon considerably by Posner and Weyl in 2018 (https://press.princeton.edu/titles/11222.html). Viewing and treating health data as labor, in conjunction with widespread implementation of privacy-preserving technology, re-asserts rather than distracts from the moral obligations we have to individuals whose data we use. Individuals will be able to benefit from their health data contributions to research, whether via recognition for helping others, improved healthcare for themselves, or appropriate financial compensation. We just need to make sure that the smart contracts we embrace are also ethically sound.

    You correctly note that public blockchains have not yet hit their stride for protecting health data privacy. However, it is their unique capacities, combined with the parallel innovations of ZKP and federated learning, and likely reinforced by an appropriate blend of on-and-off-chain features, that generate the potential to completely disrupt the landscape, challenging our basic assumptions about the nature of data and learning that may no longer be apt in the age of artificial intelligence, IoT, etc. Medicine, unlike math and physics, is both an art and a science; we envision the path forward as one which syncs cutting-edge data science up with privacy and other central humanitarian values.

  3. As CTO of the non-profit Patient Privacy Rights foundation, I feel qualified to take issue with the privacy premise of this post. ZKPs and federated learning are valuable additions to our quiver of tech tools, but they do not solve the major issue with health data use: turning open medicine into trade secrets that will add to the cost for future patients and make a mockery of medical science.

    Medicine, like math and physics, is a science and, like other sciences, requires open teaching, open improvements, and limited corporate regulation. The use of “privacy preserving” tech as a distraction from informed consent about the economic impact of trade-secret algorithms is fraught.

    I say this with all due respect to ConsenSys, whose excellent blockchain tech we have used to secure the physician-patient relationship for years. However, I am not aware of any applications of public blockchains, like Ethereum, that have a role in patient privacy. Yes, there are potential blockchain-based enhancements to security, but privacy is different. I would love to know more about ConsenSys Health.