Thinking ‘oat’ of the box: Technology to resolve the ‘Goldilocks Data Dilemma’

Marielle Gross
Robert Miller


This piece is part of the series “The Health Data Goldilocks Dilemma: Sharing? Privacy? Both?” which explores whether it’s possible to advance interoperability while maintaining privacy.

The problem with porridge

Today, we regularly hear stories of research teams using artificial intelligence to detect and diagnose diseases earlier, and with more accuracy and speed, than a human could ever have dreamed of. Increasingly, we are called on to contribute to these efforts by sharing our data with the teams crafting these algorithms, sometimes by healthcare organizations appealing to altruistic motivations. A crop of startups has even appeared to let you monetize your data to that end. But given the sensitivity of your health data, you might be skeptical of this, doubly so when you take into account tech’s privacy track record. We have begun to recognize the flaws in our current privacy-protecting paradigm, which relies on thin notions of “notice and consent” that inappropriately place the responsibility for data stewardship on individuals who remain extremely limited in their ability to exercise meaningful control over their own data.

Emblematic of a broader trend, the “Health Data Goldilocks Dilemma” series calls attention to the tension and necessary tradeoffs between privacy and the goals of our modern healthcare technology systems. Not sharing our data at all would be “too cold,” but sharing freely would be “too hot.” We have been looking for policies that are “just right”: policies that protect individuals’ rights and interests while making it easier to learn from data to advance the rights and interests of society at large.

What if there were a way for you to allow others to learn from your data without compromising your privacy?

To date, a major strategy for striking this balance has involved the practice of sharing and learning from deidentified data—by virtue of the belief that individuals’ only risks from sharing their data are a direct consequence of that data’s ability to identify them. However, artificial intelligence is rendering genuine deidentification obsolete, and we are increasingly recognizing a problematic lack of accountability to individuals whose deidentified data is being used for learning across various academic and commercial settings. In its present form, deidentification is little more than a sleight of hand to make us feel more comfortable about the unrestricted use of our data without truly protecting our interests. More of a wolf in sheep’s clothing, deidentification is not solving the Goldilocks dilemma.

Tech to the rescue!

Fortunately, there are a handful of exciting new technologies that may let us escape the Goldilocks Dilemma entirely by enabling us to gain the benefits of our collective data without giving up our privacy. This sounds too good to be true, so let us explain the three most revolutionary ones: zero-knowledge proofs, federated learning, and blockchain technology.

  • Zero Knowledge Proofs (ZKP)

Zero-knowledge proofs use cutting-edge mathematics to allow one party (the “prover”) to prove the validity of a statement to another party (the “verifier”) without disclosing the data underlying that statement. Put another way, zero-knowledge proofs let us prove things about our data without giving up our privacy. This could be an extremely valuable strategy in research, since we could learn, for example, which treatments worked best for which kinds of people without needing to know which individuals received which treatments or what their individual outcomes were. Zero-knowledge proofs are already being used in healthcare today: pharmaceutical manufacturers in the MediLedger project are deploying them to keep our drug supply chains both private and secure.
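To make the prover and verifier roles concrete, here is a minimal sketch of one classic zero-knowledge-style protocol, Schnorr identification, in which the prover shows that they know a secret number x behind a public value y without ever sending x. This is only an illustration under loud assumptions: the group parameters P, Q, and G are deliberately tiny toy values, the function names are our own, and the sketch is not the scheme used by MediLedger or by any production system. Proving richer statements about health data requires general-purpose proof systems well beyond this example.

```python
import secrets

# Toy parameters (assumption, insecure on purpose): G generates a subgroup
# of prime order Q in the integers modulo the prime P, because 2**11 % 23 == 1.
P, Q, G = 23, 11, 2


def keygen():
    """Prover's setup: secret x stays private, y = G^x mod P is published."""
    x = secrets.randbelow(Q - 1) + 1  # secret in 1..Q-1
    return x, pow(G, x, P)


def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(r, c, x):
    """Prover answers challenge c; the random r masks the secret x."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier checks G^s == t * y^c (mod P) using only public values."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_round(x, y):
    """One full round: commit, random challenge, response, verification."""
    r, t = commit()
    c = secrets.randbelow(Q)  # verifier's random challenge
    s = respond(r, c, x)
    return verify(y, t, c, s)


if __name__ == "__main__":
    x, y = keygen()
    # With such a small Q a cheater could guess the challenge 1 time in 11,
    # so the round is repeated to shrink that chance.
    print(all(run_round(x, y) for _ in range(20)))
```

The verifier only ever sees y, t, c, and s. Because r is fresh and random in every round, the response s reveals nothing useful about x, yet the check can only be passed reliably by someone who actually knows x.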

  • Federated Learning

Another privacy-enabling innovation is federated learning, which lets a network of computers collaboratively train one algorithm while each keeps its data on its own device. Instead of sending data to a central computer to train an algorithm, federated learning sends the algorithm to the data, trains it locally, and shares only the updated algorithm with other parties. By decoupling the training of algorithms from the need to centralize data, federated learning limits the exposure of an individual’s data to privacy risks. Using federated learning, several of the world’s largest drug makers, usually fierce competitors, are collaborating in the MELLODDY project to advance drug discovery. Federated learning lets these companies collectively train a single shared algorithm on their highly proprietary data without exposing that data to their competitors. Collectively, these companies benefit because they are effectively creating the world’s largest distributed database of molecular data, which they hope to use to find new cures and treatments, a process that promises to benefit us all.
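The loop described above (send the algorithm to the data, train locally, share only the update) can be sketched in a few lines. What follows is a toy version of federated averaging on a one-variable linear model, not the MELLODDY system: the three "sites," their synthetic data, and every function name are assumptions made purely for illustration. Real deployments typically add protections such as secure aggregation on top of this basic pattern.

```python
import random


def local_update(weights, data, lr=0.1, epochs=20):
    """Run gradient descent for y = w*x + b on one site's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Combine the sites' models, weighting each by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_site(n, seed):
    """Hypothetical private dataset: noisy samples of y = 3x + 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0, 0.01)))
    return data


def train(sites, rounds=50):
    """Each round, only model weights travel; raw records never leave a site."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        sizes = [len(data) for data in sites]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    sites = [make_site(40, seed) for seed in (1, 2, 3)]
    print(train(sites))
```

Note that the coordinator in `train` touches only the pairs of numbers returned by `local_update`. The shared model ends up close to the underlying relationship even though no site ever reveals a single record.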

  • Blockchain

Blockchain technology also has a critical role to play in creating a secure network for data sharing. The much-hyped “blockchain” stems from its first implementation in Bitcoin but has much broader applicability. Blockchains combine cryptography and game theory so that a network of computers reaches consensus on a single state. You can think of them as analogous to a network of computers joining together to create one giant virtual computer. This virtual computer maintains a shared ledger of “the truth,” a sort of database whose contents are continuously verified by all the computers in the network, and runs autonomous programs called “smart contracts.” These aspects of blockchains provide uniquely strong assurances of trust in data security and use: they execute the rules of the network consistently and objectively, and the whole process is transparent and universally auditable on the shared ledger. When applied to health data, these properties could empower individuals with an unprecedented ability to supervise and control the use of their own data, and a thriving market of startups has emerged for exactly this use case.
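One ingredient of that auditability, the tamper-evident ledger, is simple enough to sketch. The example below chains records together with SHA-256 hashes so that any later change to an earlier entry is detectable by anyone who re-checks the chain. It is a single-machine illustration only: consensus among many computers and smart contracts are left out entirely, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the hash of the block before it."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Add a record that commits to the entire history before it."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def is_valid(chain):
    """Re-derive every hash; any edited or reordered block breaks the chain."""
    prev_hash = GENESIS
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    # Hypothetical data-use events; a real system would store far less detail.
    append_block(ledger, {"actor": "research-team-a", "action": "query-approved"})
    append_block(ledger, {"actor": "patient-123", "action": "consent-revoked"})
    print(is_valid(ledger))
    ledger[0]["payload"]["action"] = "something-else"  # tamper with history
    print(is_valid(ledger))
```

In a real blockchain, many independent computers each hold a copy of such a chain and must agree before a block is added, which is what makes quietly rewriting the record impractical.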

The way forward

The cumulative significance of these paradigm-shifting technologies is their potential to eliminate the Goldilocks Dilemma between privacy and learning, individuals and the collective, once and for all. Their emergence forces us to rethink not only our national health IT policy, but our underlying ethical and legal frameworks as well. Because these technologies create the potential to build a future in which our treatment of data simultaneously respects individual and collective rights and interests, we believe there is an obligation to further develop and scale their core privacy-protecting functions. Our aim is to spread awareness of the possibility of resolving a fundamental 21st century ethical dilemma with a technological solution. In this case, “can” implies “ought”: we must advocate for and demand that these and similar innovations be embedded into the future of our data and our health.

Robert Miller is building privacy solutions at ConsenSys Health and manages a blockchain and healthcare newsletter at

Marielle S. Gross, MD, MBE is an OB/GYN and fellow at the Johns Hopkins Berman Institute of Bioethics where her work focuses on application of technology and elimination of bias as means of promoting evidence-basis, equity and efficiency in women’s healthcare (@GYNOBioethicist). 

Adrian Gropper, MD

As CTO of the non-profit Patient Privacy Rights foundation, I feel qualified to take issue with the privacy premise of this post. ZKPs and federated learning are valuable additions to our tech tools quiver but they do not solve the major issue with health data use: Turning open medicine into trade secrets that will add to the cost for future patients and make a mockery of medical science. Medicine, like math and physics, is a science and, like other sciences, requires open teaching, open improvements, and limited corporate regulation. The use of “privacy preserving” tech as a distraction to informed consent…


MarielleSGross

Thanks for your reply! In our opinion, the fundamental principles underlying ZKP (using cryptography to allow learning from data without revealing the underlying content) and federated learning (store data locally, train algorithms “in house,” and export only new-and-improved algorithms) are what give them the potential to resolve a major patient-facing dilemma of health data use: that you must share your data, giving up your privacy and control in the process, in order for society to benefit from learning from your data. However, as you note, these technologies alone are not sufficient to protect patients’ rights and interests. This is precisely…

Adrian Gropper, MD

Still, I do not follow your logic for how Ethereum will contribute to the solution. I’m happy to believe that it “could” contribute to the solution but I don’t think that discussion is happening anywhere in the open. ConsenSys, given its resources and the role it claims for the future, can do better than that.