“It’s such a fine line between stupid and…uh, clever.”
This is all too true when it comes to science. You can design a breathtakingly clever experiment, using state-of-the-art methods to address a really interesting and important question. And then at the end you realize that you forgot to type one word when writing the 1,000 lines of software code that runs this whole thing, and as a result, the whole thing’s a bust.
It happens all too often. It has happened to me, let me think, three times in my scientific career. I know of several colleagues who have had similar problems, and I’m currently struggling to deal with the consequences of someone else’s stupid mistake.
Here’s my cautionary tale. I once ran an experiment involving giving people a drug or placebo and when I crunched the numbers I found, or thought I’d found, a really interesting effect which was consistent with a lot of previous work giving this drug to animals. How cool is that?
So I set about writing it up and told my supervisor and all my colleagues. Awesome.
About two or three months later, for some reason I decided to reopen the data file, which was in Microsoft Excel, to look something up. I happened to notice something rather odd – one of the experimental subjects, who I remembered by name, was listed with a date-of-birth which seemed wrong: they weren’t nearly that old.
Slightly confused – but not worried yet – I looked at all the other names and dates of birth and, oh dear, they were all wrong. But why?
Then it dawned on me and now I was worried: the dates were all correct but they were lined up with the wrong names. In an instant I saw the horrible possibility: mixed-up names would be harmless in themselves, but what if the group assignments (1 = drug, 0 = placebo) were lined up with the wrong results? That would render the whole analysis invalid… and oh dear. They were.
As the temperature of my blood plummeted I got up and lurched over to my filing cabinet where the raw data was stored on paper. It was deceptively easy to correct the mix-up and put the data back together. I re-ran the analysis.
No drug effect.
I checked it over and over. Everything was completely watertight – now. I went home. I didn’t eat and I didn’t sleep much. The next morning I broke the news to my supervisor. Writing that email was one of the hardest things I’ve ever done.
What happened? As mentioned I had been doing all the analysis in Excel. Excel is not a bad stats package and it’s very easy to use but the problem is that it’s too easy: it just does whatever you tell it to do, even if this is stupid.
In my data as in most people’s, each row was one sample (i.e. a person) and each column was a piece of info. What happened was that I’d tried to take all the data, which was in no particular order, and reorder the rows alphabetically by subject name to make it easier to read.
How could I screw that up? Well, by trying to select “all the data” but actually only selecting a few of the columns. Then I reordered them, but not the others, so all the rows became mixed up. And the crucial column, drug=1 placebo=0, was one of the ones I reordered.
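For anyone who wants to see the mechanics, here is a minimal sketch of that mix-up in Python rather than Excel. The names, scores and both function names are made up for illustration; the only thing taken from my story is the shape of the mistake: the name and group columns get sorted together while the results column stays where it was.

```python
# Hypothetical toy data: each row is (name, group, score); 1 = drug, 0 = placebo.
rows = [
    ("Carol", 1, 7.2),
    ("Alice", 0, 3.1),
    ("Bob", 1, 6.8),
]


def sort_whole_rows(rows):
    """Sort complete rows by name, so every value stays with its subject."""
    return sorted(rows, key=lambda row: row[0])


def sort_some_columns(rows):
    """Reproduce the mistake: sort name and group, leave score where it was."""
    selected = sorted((name, group) for name, group, _ in rows)
    untouched = [score for _, _, score in rows]
    return [(name, group, score)
            for (name, group), score in zip(selected, untouched)]


good = sort_whole_rows(rows)
bad = sort_some_columns(rows)

# Compare the partially sorted table against the original, subject by subject.
original = {name: (group, score) for name, group, score in rows}
mismatched = [name for name, group, score in bad
              if original[name] != (group, score)]

print("Whole-row sort:   ", good)
print("Partial sort:     ", bad)
print("Scrambled subjects:", mismatched)
```

Nothing crashes and nothing warns you. The partially sorted table looks just as tidy as the correct one, which is exactly why this kind of error can sit there for months.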
The immediate lesson I learned from this was: don’t use Excel, use SPSS, which simply does not allow you to reorder only some of the data. Actually, I still use Excel for making graphs and figures but every time I use it, I think back to that terrible day.
The broader lesson though is that if you’re doing something which involves 100 steps, it only takes 1 mistake to render the other 99 irrelevant. This is true in all fields but I think it’s especially bad in science, because mistakes can so easily go unnoticed due to the complexity of the data, and the consequences are severe because of the long time-scale of scientific projects.
Here’s what I’ve learned: Look at your data, every step of the way, and look at your methods, every time you use them. If you’re doing a neuroimaging study, the first thing you do after you collect the brain scans is to open them up and just look at them. Do they look sensible?
Analyze your data as you go along. Every time some new results come in, put them into your data table and just look at them. Make a graph which just shows absolutely every number all on one massive, meaningless line from Age to Cigarettes Smoked Per Week to EEG Alpha Frequency At Time 58. For every subject. Get to know the data. That way if something weird happens to it, you’ll know. Don’t wait until the end of the study to do the analysis. And don’t rely on just your own judgement – show your data to other experts.
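Looking at the data by eye is the main thing, but a small automated check can back it up. The sketch below is my own suggestion, not something I used at the time: it takes a short hash "fingerprint" of each subject's row so that two snapshots of the table can be compared later. The subject IDs, column names and the two function names are all hypothetical.

```python
import hashlib


def row_fingerprints(table):
    """Map each subject ID to a short hash of that subject's full row."""
    prints = {}
    for subject_id, values in table.items():
        # Sort the columns so the hash does not depend on column order.
        text = repr(sorted(values.items()))
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        prints[subject_id] = digest[:12]
    return prints


def changed_subjects(before, after):
    """Return subject IDs that were added, removed, or whose rows differ."""
    ids = set(before) | set(after)
    return sorted(s for s in ids if before.get(s) != after.get(s))


# Hypothetical snapshot taken when the data were first entered.
snapshot = {
    "S01": {"age": 24, "group": 1, "score": 7.2},
    "S02": {"age": 31, "group": 0, "score": 3.1},
}

# The same table some weeks later; S01's group label has silently changed.
later = {
    "S01": {"age": 24, "group": 0, "score": 7.2},
    "S02": {"age": 31, "group": 0, "score": 3.1},
}

changed = changed_subjects(row_fingerprints(snapshot), row_fingerprints(later))
print("Rows that no longer match the snapshot:", changed)
```

A check like this only tells you that a row changed, not whether the change was legitimate, so it complements getting to know the data rather than replacing it.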
Check and recheck your methods as you go along. If you’re running, say, a psychological experiment involving showing people pictures and getting them to push buttons, put yourself in the hot seat and try it on yourself. Not just once, but over and over. Some of the most insidious problems with these kinds of studies will go unnoticed if you only look at the task once – such as the old “randomized”-stimuli-that-aren’t-random issue.
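To give one concrete version of the “randomized” problem (this particular bug is my assumption; there are other ways to get it wrong): if the task code re-seeds its random number generator with the same value for every subject, everyone sees the identical “random” order. A rough sketch of how you might catch that, with made-up stimulus names and hypothetical helper functions:

```python
import random


def make_order(stimuli, rng):
    """Return a shuffled copy of the stimulus list using the given generator."""
    order = list(stimuli)
    rng.shuffle(order)
    return order


def distinct_orders(orders):
    """Count how many different stimulus orders appear across subjects."""
    return len({tuple(order) for order in orders})


stimuli = ["face", "house", "tool", "animal"]

# Buggy version: a fresh generator with the same seed for every subject,
# so every subject gets exactly the same order.
buggy = [make_order(stimuli, random.Random(0)) for _ in range(20)]

# Fixed version: one generator shared across subjects. It is seeded here
# only so that this example is repeatable.
rng = random.Random(0)
fixed = [make_order(stimuli, rng) for _ in range(20)]

print("Distinct orders, buggy:", distinct_orders(buggy))
print("Distinct orders, fixed:", distinct_orders(fixed))
```

Running the task once would never reveal this, because a single order always looks random. You only see it when you compare several runs side by side.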
Trust no-one. This sounds bad, but it’s not. Don’t rely on anyone else’s work, in experimental design or data analysis, until you’ve checked it yourself. This doesn’t mean you’re assuming they’re stupid, because everyone makes these mistakes. It just means you’re assuming they’re human like you.
Finally, if the worst happens and you discover a stupid mistake in your own work: admit it. It feels like the end of the world when this happens, but it’s not. However, if you don’t admit it, or even worse, start fiddling other results to cover it up – that’s misconduct, and if you get caught doing that, it is the end of the world, or your career, at any rate.