Data Socialism

In an unusually candid editorial in the NEJM, Longo and Drazen say that data sharing may be problematic because some researchers fear that the data could be used by others to disprove their results. The editors predicted a new class of researchers who use data created by other researchers without ever taking the trouble to generate data themselves – research parasites.

With this editorial, the NEJM has firmly established itself as descriptive (the way the world is), rather than normative (the way the world ought to be). I, for one, find this move rather refreshing. I have been pumped to a diabetic state by the saccharine naivety of the hopey-changey, “we need this and that” brigade. The editors merely said what some researchers secretly think, and how many actually behave.

Once, I asked the PI of an RCT a specific question about outcomes. I received a reply within seconds. The PI sent me a pdf of the data. The email ended with that banal academic signature “Best, MD.”

I was flattered by the promptness of her response – many PIs who publish in high impact journals don’t bother replying. Then I discovered she sent me the supplementary appendix, which was also available online. Unsurprisingly, my question was not answered. But it was not supposed to be answered. The unpublished data, which included the answer to my question, was going to be used by the PI for more papers in high impact journals, as it should be.

Another time I asked an economist to share an economic model of a technology, which I did not believe was as cost-effective as he said it was. After a few evasive responses, when it became apparent that I was not getting the message through my thick skull, he replied, “sorry I can’t show you my model. I spent my PhD developing it.” If he thought that I was a data parasite gagging to prove him wrong he was, to put it plainly, spot on.


Karl Popper, the philosopher of science, said that what tells science apart from astrology is that science is falsifiable – if it can’t be disproven it’s not science. Replication, the key to progress in physical sciences, is medical science’s Achilles’ heel.

We should dwell a bit longer on the hard sciences because they are instructive. Nobody had to do a meta-analysis of experimental evidence for the presence of ether (the mysterious substance once believed to carry light waves). Why? Because ether either exists or doesn’t exist, and it doesn’t exist. Since Michelson and Morley’s famous failed attempt to show the presence of ether, several physicists have tried, and failed.

There is no meta-analysis of experiments on the possible curvature of space-time, with an I² statistic to measure heterogeneity between studies. Arthur Eddington showed, during a solar eclipse, that gravity bends space-time, and several others have since verified Einstein’s theories. Publication bias is not an issue in physics.

While physicists seek truth which uncovers the mysteries of the universe, the objective truth in medical sciences is settling petty quibbles probabilistically, such as:

Should we give 150 mg or 75 mg of aspirin after a myocardial infarction?

Do patient-centered ward rounds improve outcomes?

Is cardiac CT superior to SPECT in the diagnosis of obstructive coronary artery disease?

A bunch of studies root for cardiac CT. Another set root for SPECT. Then we have a meta-analysis. Then an RCT. Then more RCTs. Then a meta-analysis of RCTs. Then finally an analysis of an administrative database with dodgy risk adjustment renders all previous research obsolete.

To which one is tempted to yell – FFS it doesn’t matter, use either or neither. But it does matter. It matters because we’re rational optimizers. We cannot ever, ever, not be doing the best we can, even if the best is like adding a gnome on top of Everest and celebrating the total height.

Optimization is endless quibbling because the differences are so small. Optimization is like a train which departs on a lengthy journey but never leaves the platform. Optimization makes careers. Optimization leads to lots of publications. Lots. There is an infinite number of ethers and space-times to prove and disprove in healthcare sciences.

But optimization is a methodological nightmare. Because when you’re dealing with such small differences, to make sure those Lilliputian differences are real, the measuring instrument has to be precise. And one cause of an imprecise instrument, other than inherent imprecision, is sloppy research.

If you tell physicists that facts have changed they’ll say “welcome to science and have a nice day.” If you tell physicians that facts change they’ll scream “research fraud.” If Einstein and Newton were doctors, Einstein would have asked Newton to retract his theory of gravity.

One reason why there is no culture of replication in the biomedical sciences is that falsification is suffused with moral outrage. Retraction should be a normal clearing house for biomedical sciences – like shedding hair or clipping nails. It has become a consumer watchdog.

I wonder if the illiteracy most urgently in need of redress in doctors is philosophy of science, rather than evidence-based medicine or statistics. Science is a provisional assumption. Facts are supposed to change. Changing facts doesn’t mean that science is broken. It means that science is happening.

Much of the angst against the NEJM editorial, mostly hilarious on Twitter, reflects a common misunderstanding of the is-ought problem – confusing a description of the way the world is with an endorsement of the way the world is.

I stand with the detractors though. Data ought to be shared. But, as I have learnt painfully, there is a way to ask for it. You must endear yourself to the researchers – a few complimentary emails followed by a chance encounter at a national meeting where you prostrate yourself in reverence before the PI. Parasites must be charming.

It takes a lot of effort to generate data in biomedical sciences. To expect researchers to surrender the data for the greater good is fuzzy, and lamentably boring, adolescent naivety. If we do not recognize the self-interest of researchers, data socialism, like other forms of socialism, is condemned to failure. This is what I think Longo and Drazen are warning us about in their editorial.

Saurabh Jha is a radiologist based in Philadelphia and a contributing editor for THCB.

4 replies »

  1. I do not fault Longo and Drazen for giving voice to the reality of the situation. But while telling it like it is may be a desirable goal for journalism, for an editorial in a scientific publication, some degree of advocacy for a preferred state of affairs is to be expected. To the extent that this was done, it was advocacy for data-fiefdoms where the original investigators are royalty.

    I agree that there needs to be a reasonable tolerance for “facts” changing, in acknowledgment of the reality that all knowledge is tentative. Which things are regarded as likely to be true changes all the time in the biological sciences, and mostly it happens without a fuss. Progress and new results, rather than retraction, are the mechanism. Retraction is reserved for the case where something is woefully wrong, and that _shouldn’t_ happen terribly often.

    I reject the idea that the only people who ought to get a look at the primary data are those who have endeared themselves to the researchers who collected it. Good pragmatic advice, but I think the responsibility for endearment should be in exactly the other direction. If a group of researchers undertakes a massive data collection project without adequate ability to analyze it, and still gets funding to do it (one wonders if they should, but let’s agree that it does and will continue to happen), they should quickly endear themselves to some data scientists who can help them understand what they’ve got. Should they fail to do so, the data should quickly become publicly available so people who are interested and know how to work with data are enabled to do so.

    Yes, data collection is hard. So is data analysis. So is making a decent living and paying taxes to support science. Researchers who receive funding because their work is in the public interest but who will not allow their data to be used for the public interest in an efficient way should simply _not be funded_. It is indeed naive to expect it to just happen (cf. open access publication). But society, in its pursuit of expansion of human knowledge and the improvement of the human condition that this can bring, has every right to demand it.

  2. Fun post. Logic doesn’t pervade science completely. Without ether, what is being frame-dragged? Or curved by gravity? Or, how can we divide by an infinitesimal – as in a derivative – which is as close to zero as we desire, but never quite reaches zero = nothing? Dividing by zero is verboten. But dividing by as close to zero as you – and god – want is OK? … as long as it isn’t exactly zero.

    Our brains, I guess, aren’t good enough and can’t surround these thoughts.

  3. OK, where’s the dad-gumbed “Like” button? I also nominate this post for some kind of award. Kudos.

    Not that I agree unreflectively with the blanket diss on “Socialism.”