Health Care in the Cloud: A ‘Case Study of What Not To Do’

Amazon Web Services (AWS), which for many is “the cloud,” suffered a serious service interruption beginning on April 21st. The problem lingered for at least six days. Many websites that relied on Amazon services went down or saw their performance degrade during the event.

The AWS failure disproportionately affected startups like Foursquare, Quora and Reddit, companies that are “focused on moving fast in pursuit of growth, and less apt to pay for extensive backup and recovery services.”

One of the affected companies was a health care startup. What follows is a transcription (including typos) of an AWS Discussion Forum thread that this company started 24 hours after the outage began. The company’s contributions are in italics.

Life of our patients is at stake—I am desperately asking you to contact

Sorry I could not get through in any other way. We are a monitoring company and are monitoring hundreds of cardiac patients at home. We are unable to see their ECG signals since 21st of April. Can you please contact us? Or please let me know how can I contact you more ditectly. Thank you.

Oh this is not good. Man mission critical systems should never be run in the cloud. Just because AWS is HIPPA certified doesn’t mean it won’t go down for 48+ hours in a row.

(+30 minutes since comment thread began) Well, it is supposed to be reliable…
Anyway, I am begging anyone from Amazon team to contact us directly. Thank you

Go to your backups? Or make a big deal out of it on the forums maybe someone will take a look. In any case anecdotal empirical evidence has shown don’t bother with premium support its a freaking joke.

Thanks for the comments, but we are really desparate. Amazon team – please contact us

(+10 hours since comment thread began) Not restored. Not heard from Amazon. People out there – please take a look at our volumes! This not just some social network website issue, but a serious threat to peoples lives!

Your only option at this point is Premium support. However, they’re just going to tell you to wait. Sorry.

(+ 13 hours) There is some progress. 2 servers are operational and one still not working. Unfortunately, the one on which we have the most patients

Aren’t you braking some compliance laws by not having a highly-available environment?

You put a life critical system on virtual hosted servers? What the hell is wrong with you

Not sure whether you’re plain incompetent or irresponsible. Anyway, you should be ashamed and prepare yourself with lots of money to pay for the lawyers. Would it be so difficult to have a contingency plan? another provider? or even another availability zone? Are you so fsklong dumb as to think that nothing could ever happen to a data center.

(+ 15 hours) This is a home based system, not an intra hospital system. So the promised 99.95% uptime is fine. But this situation showed that the promised 99.95% = fiction… BTW. All three servers are working – hopefuly the situation will remain stable

While I’m not going to suggest Amazon shouldn’t be ashamed of themselves.. I have to admit this is a pretty sickening tale. If I were running a system that could potentially lead to loss of human life. You’d better believe hot-spare data center would be in my mind.

Your CTO will be a serious liability, and your board is going to crush your C*O staff very soon, if they’re awake. If you haven’t notified doctors and patients already, your liabilities just got worse. If you can’t roll over your IP routing, then you should not be in business. This should be going to a different server and duplicated by your own policies to ensure compliance with ALL regulatory requirements. You’re failing and you probably don’t even know how bad your company is failing. If I were you, I’d beg John Halamka to guide you out of this mess.

“This not just some social network website issue, but a serious threat to peoples lives!” Which begs the question, why did you leave yourself — and your patients — open to this risk in the first place? I hope for your patients’ sake that you begin taking more seriously your IT planning. Since you apparently don’t have a fail-over — and are waiting for Amazon anyway — you might want to think about solving the weakness you built into your own system, i.e., start working on an alternative method of getting what you need. And if you can’t find a way to do that even now, I submit that you should never have launched your service at all.

Not even your servers are redundant? One of your servers is offline, and there’s not a hot swapable replacement? for a life-critical system? Man, pray God nothing happens, because on contrary, the responsibles for this design are surely going to serve sometime on a nearby prison.

If you were smart, you would have a distaster recovery plan for just this kind of thing. Judging from your lack of said preparations, you lot figured the cloud never goes down, and got greedy by not wanting to spend money on hot standby machines on a different infrastructure. Good going. Hope none of your cardiac patients croak because you’re going to get sued into next week…

(+15 hours) As I wrote, this is not a life saving system.Which does not mean that patient’s life cannot be saved using it.That is all I have to say. Good luck to others

Dude/Dudet. You put that patients lives are at stake in your title … Don’t try to back track. Just admit it was stupid and move on.

Ah, so the title of this thread was a ruse? Either it isn’t so critical after all, and shame on you for trying to make it seem like it was, or else it is critical, and now you’re lying about it in order to not be shamed by others. Either way, shame on you.

A perfect case study of what NOT TO DO. Why gamble when people’s LIVES are at stake!?

We all do mistakes, but the important thing is learn from them. I’ll also have to review and change my policies. As for Amazon, it is a total shame that didn’t give ANY kind of assistance not even to this request. Regards

Agreed. Sounds like he’s a startup. Failing over to other data centers is extremely expensive to set up and operate. Particularly if his data is write-heavy. No reason for everyone to go all self-righteous on him. In the end, the market will decide. if his patients die, he’ll be fired and/or his company will go out of business. Others will learn, the marketplace will move on.

This is a Hoax. There are NO Patients in Danger. This was pure Hype from a Sick Person. Don’t fall for this BS. Use your Common Sense. Nobody in charge would allow this FruitCake to load any sort of critical monitoring systems up. You have been had. I respect your very real emotions, and your helpful and constructive responses to this fool, but he made all of this up, to get a Rise out of you. Next time, be more logical and think, before you answer crap like this.

Pizaazz Note: The long-term impact of the AWS outage on cloud computing is uncertain. It may be negligible. IDC estimates that corporate cloud computing will grow by more than 25% per year to $55.5 billion by 2014.

Glenn Laffel, MD, PhD, is a successful entrepreneur in health information technology. He blogs at Pizaazz.