Health Care in the Cloud: A ‘Case Study of What Not To Do’

Amazon Web Services (AWS), “the cloud” for many, experienced a serious interruption in service beginning on April 21st. The problem lingered for at least 6 days. Many websites that relied on Amazon services went down or saw their performance degraded during the event.

The AWS failure disproportionately affected startups like Foursquare, Quora and Reddit, companies that are “focused on moving fast in pursuit of growth, and less apt to pay for extensive backup and recovery services.”

One of the affected companies was a health care startup. What follows is a transcription (including typos) of an AWS Discussion Forum thread that this company started 24 hours after the outage began. The company’s contributions are in italics.

Life of our patients is at stake—I am desperately asking you to contact

Sorry I could not get through in any other way. We are a monitoring company and are monitoring hundreds of cardiac patients at home. We are unable to see their ECG signals since 21st of April. Can you please contact us? Or please let me know how can I contact you more ditectly. Thank you.

Oh this is not good. Man mission critical systems should never be run in the cloud. Just because AWS is HIPPA certified doesn’t mean it won’t go down for 48+ hours in a row.

(+30 minutes since comment thread began) Well, it is supposed to be reliable…
Anyway, I am begging anyone from Amazon team to contact us directly. Thank you

Go to your backups? Or make a big deal out of it on the forums maybe someone will take a look. In any case anecdotal empirical evidence has shown don’t bother with premium support its a freaking joke.

Thanks for the comments, but we are really desparate. Amazon team – please contact us

(+10 hours since comment thread began) Not restored. Not heard from Amazon. People out there – please take a look at our volumes! This not just some social network website issue, but a serious threat to peoples lives!

Your only option at this point is Premium support. However, they’re just going to tell you to wait. Sorry.

(+ 13 hours) There is some progress. 2 servers are operational and one still not working. Unfortunately, the one on which we have the most patients

Aren’t you braking some compliance laws by not having a highly-available environment?

You put a life critical system on virtual hosted servers? What the hell is wrong with you

Not sure whether you’re plain incompetent or irresponsible. Anyway, you should be ashamed and prepare yourself with lots of money to pay for the lawyers. Would it be so difficult to have a contingency plan? another provider? or even another availability zone? Are you so fsklong dumb as to think that nothing could ever happen to a data center.

(+ 15 hours) This is a home based system, not an intra hospital system. So the promised 99.95% uptime is fine. But this situation showed that the promised 99.95% = fiction… BTW. All three servers are working – hopefuly the situation will remain stable

While I’m not going to suggest Amazon shouldn’t be ashamed of themselves.. I have to admit this is a pretty sickening tale. If I were running a system that could potentially lead to loss of human life. You’d better believe hot-spare data center would be in my mind.

Your CTO will be a serious liability, and your board is going to crush your C*O staff very soon, if they’re awake. If you haven’t notified doctors and patients already, your liabilities just got worse. If you can’t roll over your IP routing, then you should not be in business. This should be going to a different server and duplicated by your own policies to ensure compliance with ALL regulatory requirements. You’re failing and you probably don’t even know how bad your company is failing. If I were you, I’d beg John Halamka to guide you out of this mess.

“This not just some social network website issue, but a serious threat to peoples lives!” Which begs the question, why did you leave yourself — and your patients — open to this risk in the first place? I hope for your patients’ sake that you begin taking more seriously your IT planning. Since you apparently don’t have a fail-over — and are waiting for Amazon anyway — you might want to think about solving the weakness you built into your own system, i.e., start working on an alternative method of getting what you need. And if you can’t find a way to do that even now, I submit that you should never have launched your service at all.

Not even your servers are redundant? One of your servers is offline, and there’s not a hot swapable replacement? for a life-critical system? Man, pray God nothing happens, because on contrary, the responsibles for this design are surely going to serve sometime on a nearby prison.

If you were smart, you would have a distaster recovery plan for just this kind of thing. Judging from your lack of said preparations, you lot figured the cloud never goes down, and got greedy by not wanting to spend money on hot standby machines on a different infrastructure. Good going. Hope none of your cardiac patients croak because you’re going to get sued into next week…

(+15 hours) As I wrote, this is not a life saving system.Which does not mean that patient’s life cannot be saved using it.That is all I have to say. Good luck to others

Dude/Dudet. You put that patients lives are at stake in your title … Don’t try to back track. Just admit it was stupid and move on.

Ah, so the title of this thread was a ruse? Either it isn’t so critical after all, and shame on you for trying to make it seem like it was, or else it is critical, and now you’re lying about it in order to not be shamed by others. Either way, shame on you.

A perfect case study of what NOT TO DO. Why gamble when people’s LIVES are at stake!?

We all do mistakes, but the important thing is learn from them. I’ll also have to review and change my policies. As for Amazon, it is a total shame that didn’t give ANY kind of assistance not even to this request. Regards

Agreed. Sounds like he’s a startup. Failing over to other data centers is extremely expensive to set up and operate. Particularly if his data is write-heavy. No reason for everyone to go all self-righteous on him. In the end, the market will decide. if his patients die, he’ll be fired and/or his company will go out of business. Others will learn, the marketplace will move on.

This is a Hoax. There are NO Patients in Danger. This was pure Hype from a Sick Person. Don’t fall for this BS. Use your Common Sense. Nobody in charge would allow this FruitCake to load any sort of critical monitoring systems up. You have been had. I respect your very real emotions, and your helpful and constructive responses to this fool, but he made all of this up, to get a Rise out of you. Next time, be more logical and think, before you answer crap like this.

Pizaazz Note: The long-term impact of the AWS outage on cloud computing is uncertain. It may be negligible. IDC estimates that corporate cloud computing will grow by more than 25% per year to $55.5 billion by 2014.

Glenn Laffel, MD, PhD, is a successful entrepreneur in health information technology. He blogs at Pizaazz.

6 replies »

  1. Right, let’s face it. There are still plenty of places where the Internet is not an option.

  2. As with any “mission critical system,” one would be wise to have redundancy. Those that put full faith in any one thing will reap the consequences. It really doesn’t matter whether it is up in the cloud, on the local desktop, or on a back office server; they are all vulnerable.

  3. Trust no one. Folks that have only one system, whether it is on a cloud, or in a datacenter across town, or in the basement of their building, are playing with fire.
    Cloud computing is not intended to change the paradigm of having redundancy for critical systems. If all you are capable of running is a cloud, then by all means use multiple clouds. That’s the lesson that should be learned here.

  4. Interesting exchange. Whether or not this is true, it does highlight the need for redundant systems for critical data. Cloud systems may be more reliable, but they can fail, and having a completely separate backup (redundant from the router level to the hardware) is essential.
    I do think that everyone who runs cloud systems is looking at their backup failure plans critically now. I hope that those who run dedicated in-house systems are also reviewing their backup plans (and not just feeling smug).

  5. Whether or not this was a hoax, the notion that only cloud based systems can fail is complete tosh, Glenn. Perhaps you might consider the case of one of the leading hospitals in the nation, in your own backyard, that is run by the most famous CIO in health care. It lost its network, which for sure had lifesaving information on it, for 4 full days in 2003.

    All systems of all types can fail–cloud systems are probably more reliable than others, but at some point you have to trust some system to run any operation.