Joyce Lee, MD is a pediatrician, diabetes specialist, and Associate Professor at the University of Michigan. She blogs about design and healthcare at joycelee.tumblr.com.
GitHub: How an Open Source Programming Tool With a Funny Name Could Help Revolutionize Medical Research
Most people I work with in medicine have never heard of GitHub.
For the unfamiliar, GitHub is an online repository that computer programmers use to store their code. It has a number of virtues, including the ability to track multiple versions of that code (sort of like remembering every tracked change you ever made to a Word document). But its value goes beyond version tracking: the site facilitates open source collaboration through “social” features similar to those of Facebook or Twitter, in which you follow the content of others or others follow you.
The most amazing thing about GitHub is that many users post their code (their work, their blood, sweat, and tears) publicly on their GitHub profile. Individuals will comment on others’ code, providing valuable input that the owner of the code can use to improve their work. In addition, users can “fork” another person’s code repository and work directly on the code in their own GitHub profile to make changes or improvements, sort of like a tag team collaboration. GitHub is the tool that helps facilitate large-scale open source collaboration in the software/web programming world (such as that which led to the Linux revolution).
By early 2012 there were apparently 1.2 million users hosting over 3.6 million repositories. Now that’s collaboration to scale!
So again, you may ask, why should physicians or medical researchers care about GitHub? Because it has broader application beyond the software/web programming world, as shown by its use among non-programmers, who are currently repurposing GitHub to advance collaboration in their own fields. They are posting book projects and transcripts of talks on the site to encourage conversation and collaboration. One user even published his personal DNA information to encourage development of open-source DNA analysis. It has been suggested that GitHub could even be used by US citizens to “fork” the law so that they can propose their own amendments to their elected officials.
How might we use GitHub to democratize the world of medical research?
As researchers, we perform many different activities in isolation, which forces us to “reinvent the wheel” constantly, from drafting ethics board applications, to creating research protocols, to writing snippets of statistical code or code for web programs.
I love interactive data visualization (#dataviz). It is one of the things that I definitely wanted to explore when I came out to the Bay Area on sabbatical, because I believe that it has great potential for helping both patients and clinicians with diabetes management. The sheer volume of numbers available for this disease is overwhelming; we need #dataviz tools that can help us achieve greater understanding and make actionable clinical decisions to improve health.
This is what we usually see in clinic: numbers written down on a piece of paper.
Yes, there are computer systems that link to blood glucose meters, but there are a number of complexities with downloading blood sugar numbers in clinic (which deserve an entire blog post sometime in the future).
You can see there is some visual analysis and annotation that we do perform, albeit primitive. The circles represent high blood sugars (>150 mg/dl) and the triangles represent low blood sugars (<70 mg/dl). This is almost better than the cave painters, don’t you think?
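For readers who think in code, the circle-and-triangle rule above can be sketched in a few lines of Python. This is only a minimal illustration, not clinical software: the function names and the sample readings are made up for the example, and the only things taken from the paper log are the two thresholds (>150 mg/dl and <70 mg/dl).

```python
from typing import List, Tuple

# Thresholds taken from the paper-log annotation described above.
HIGH_THRESHOLD_MG_DL = 150  # readings above this get a circle
LOW_THRESHOLD_MG_DL = 70    # readings below this get a triangle


def annotate_reading(value_mg_dl: float) -> str:
    """Return the hand-drawn mark a reading would get: 'circle', 'triangle', or 'none'.

    Hypothetical helper for illustration only.
    """
    if value_mg_dl > HIGH_THRESHOLD_MG_DL:
        return "circle"
    if value_mg_dl < LOW_THRESHOLD_MG_DL:
        return "triangle"
    return "none"


def annotate_log(readings: List[float]) -> List[Tuple[float, str]]:
    """Pair every reading in a log with its mark, preserving the original order."""
    return [(value, annotate_reading(value)) for value in readings]


if __name__ == "__main__":
    # Made-up sample values, not patient data.
    sample_log = [65, 110, 182, 150, 70]
    for value, mark in annotate_log(sample_log):
        print(f"{value} mg/dl: {mark}")
```

Note that readings of exactly 150 or 70 mg/dl get no mark here, because the rule on the paper log uses strict greater-than and less-than.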
Pie charts, need I say more? I can extract some useful insights from these charts, which improve over the previous one I showed, but a few things strike me: (1) some of the scatter plots overlay weeks of data, which I don’t find helpful because you can’t tell how blood sugars on a given day are responding or relate them to life events; (2) some visualizations show a lot of numbers in many of the sections, and it just becomes onerous to go through them and find trends; (3) many provide statistics (area under the curve, MAD%) which I think only a minority of families and children really understand; (4) although some of the software programs do provide interactivity and let you see the data at different time scales (day, week, month), if you change to a different view, you are stuck trying to remember in your head what you saw on a previous screen because you can’t see the multiple levels at once; (5) finally, I find that the user interface and design could use major improvement.