I am happy to announce the release of the doctor “referral” social graph. This dataset, which I obtained using a Freedom of Information Act request against the Medicare claims database, details how most doctors, hospitals and other providers team together to deliver care in the United States. This graph is nothing less than a map of how healthcare is delivered in this country.
For the time being, the only way to get a copy of this data set is to support the Medstartr crowdfunding campaign for either $100 (for the viral “open source eventually” version of the data) or $1000 (for the proprietary-friendly version of the data, which any business can freely “merge” with other data). If you need consulting around this data, you can buy in at the $5k or $10k levels. Also, we are going to have really awesome t-shirts.
I will be writing a more in-depth technical article about this dataset over on the brand-new O’Reilly Strata blog (which focuses specifically on Big Data), so I will gloss over most of the technical details here, with a few important exceptions.
First, when I say a “graph” I am not talking about a diagram. I am talking about a mathematical model that supports nodes and connections between those nodes. These are visualized as diagrams, but it is not possible to really analyze large graphs without a database. In this case, the nodes are doctors, hospitals and other providers and the connections between those nodes represent the degree to which they collaborate on specific patients.
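To make the node-and-connection idea concrete, here is a minimal Python sketch of a weighted graph. It is illustrative only: the class name `ProviderGraph`, the provider identifiers, and the weights are all hypothetical, and treating connections as undirected is an assumption rather than a description of how the released data is laid out.

```python
from collections import defaultdict


class ProviderGraph:
    """Hypothetical weighted graph: nodes are provider IDs, and each edge
    weight stands for how strongly two providers collaborate."""

    def __init__(self):
        # node -> {neighbor -> weight}
        self._adj = defaultdict(dict)

    def add_edge(self, a, b, weight):
        # Assumption: connections are undirected, so store both directions.
        self._adj[a][b] = weight
        self._adj[b][a] = weight

    def neighbors(self, node):
        # Return a copy so callers cannot mutate the graph by accident.
        return dict(self._adj.get(node, {}))

    def strongest_partner(self, node):
        # The neighbor with the highest edge weight, or None if isolated.
        nbrs = self._adj.get(node)
        if not nbrs:
            return None
        return max(nbrs, key=nbrs.get)


# Made-up example data, not taken from the real dataset.
graph = ProviderGraph()
graph.add_edge("dr_adams", "general_hospital", 42)
graph.add_edge("dr_adams", "dr_baker", 7)
graph.add_edge("dr_baker", "general_hospital", 15)

print(graph.strongest_partner("dr_adams"))
```

A structure like this works for a toy example; for the full graph, a database is the more realistic home for it, as noted above.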
Also, despite my branding to the contrary, this is not strictly a “referral” data set, although a fairly large portion of the data do represent referral relationships. Instead, it depicts the degree to which any healthcare provider “works” on a patient in the same time frame as some other provider. This means, for instance, that many primary care doctors are linked to emergency rooms. But this just means that a patient they were seeing was also seen by the emergency room in the same time period. Referral relationships can be inferred from this data, but not presumed.
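One way to picture a “same time frame” connection is the rough Python sketch below. It is not how the dataset was actually produced: the function name `shared_patient_counts`, the 30-day window, the record layout, and every identifier are assumptions chosen purely for illustration. It counts, for each pair of providers, how many distinct patients both saw within the window.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def shared_patient_counts(visits, window_days=30):
    """visits: iterable of (patient_id, provider_id, visit_date) tuples.

    Returns {frozenset({provider_a, provider_b}): number of distinct
    patients both providers saw within window_days of each other}.
    The 30-day default is an arbitrary illustrative choice.
    """
    by_patient = defaultdict(list)
    for patient, provider, day in visits:
        by_patient[patient].append((provider, day))

    patients_per_pair = defaultdict(set)
    for patient, seen in by_patient.items():
        for (p1, d1), (p2, d2) in combinations(seen, 2):
            if p1 != p2 and abs((d1 - d2).days) <= window_days:
                patients_per_pair[frozenset((p1, p2))].add(patient)

    return {pair: len(pats) for pair, pats in patients_per_pair.items()}


# Made-up visits: a primary care doctor and an emergency room.
visits = [
    ("patient_1", "dr_adams", date(2012, 1, 5)),
    ("patient_1", "city_er", date(2012, 1, 20)),
    ("patient_2", "dr_adams", date(2012, 2, 1)),
    ("patient_2", "city_er", date(2012, 2, 10)),
    ("patient_3", "dr_adams", date(2012, 3, 1)),
    ("patient_3", "city_er", date(2012, 9, 1)),  # outside the window
]
counts = shared_patient_counts(visits)
print(counts)
```

Note that the sketch links the doctor and the emergency room without knowing whether either one sent the patient to the other, which is exactly why a connection cannot be presumed to be a referral.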