A couple of weeks ago, President Obama launched a new open data policy (pdf) for the federal government. Declaring that, “…information is a valuable asset that is multiplied when it is shared,” the Administration’s new policy empowers federal agencies to promote an environment in which shareable data are maximally and responsibly accessible. The policy supports broad access to government data in order to promote entrepreneurship, innovation, and scientific discovery.
If the White House needed an example of the power of data sharing, it could point to the Psychiatric Genomics Consortium (PGC). The PGC began in 2007 and now boasts 123,000 samples from people with a diagnosis of schizophrenia, bipolar disorder, ADHD, or autism and 80,000 controls collected by over 300 scientists from 80 institutions in 20 countries. This consortium is the largest collaboration in the history of psychiatry.
More important than the size of this mega-consortium is its success. There are perhaps three million common variants in the human genome. Amidst so much variation, it takes a large sample to find a statistically significant genetic signal associated with disease. Showing a kind of “selfish altruism,” scientists began to realize that by pooling data, combining computing efforts, and sharing ideas, they could detect the signals that had been obscured because of lack of statistical power. In 2011, with 9,000 cases, the PGC was able to identify 5 genetic variants associated with schizophrenia. In 2012, with 14,000 cases, they discovered 22 significant genetic variants. Today, with over 30,000 cases, over 100 genetic variants are significant. None of these alone are likely to be genetic causes for schizophrenia, but they define the architecture of risk and collectively could be useful for identifying the biological pathways that contribute to the illness.
We are seeing a similar culture change in neuroimaging. The Human Connectome Project is scanning 1,200 healthy volunteers with state of the art technology to define variation in the brain’s wiring. The imaging data, cognitive data, and de-identified demographic data on each volunteer are available, along with a workbench of web-based analytical tools, so that qualified researchers can obtain access and interrogate one of the largest imaging data sets anywhere. How exciting to think that a curious scientist with a good question can now explore a treasure trove of human brain imaging data—and possibly uncover an important aspect of brain organization—without ever doing a scan.