Ben Kepes is an analyst, an entrepreneur, a commentator, and a business adviser. His interests include a diverse range of industries from manufacturing to property technology. As a commentator he has a broad presence both in the traditional media and as an extensive blogger. He sits on the boards of a number of organizations, both commercial and not-for-profit. Ben is a DZone MVB, is not an employee of DZone, and has posted 197 posts at DZone.

The Amazon Outage is OK, OK? (Well kind of)

04.27.2011

I’d kind of avoided posting about the recent Amazon outage. It’s an event that has had monstrous amounts written about it, some of it beyond hyperbole. Case in point: NetworkWorld says that

the Amazon outage set cloud computing back years

I mean, this is just plain wrong. Yes, the AWS event means people will think long and hard about their architecture. Yes, some enterprises that were toying with the idea of public cloud might pull back for a while. Yes, private cloud providers will use the event ad infinitum to justify private versus public. But let’s be a little realistic: it doesn’t spell the end of the cloud.

Then there is the other extreme, exemplified by George Reese, who says that this is a shining moment that shows that, with proper design, the cloud can be amazingly resilient. Reese calls out the example of Netflix, which, despite being an AWS customer, had no real issues during the outage because it designed for failure: it built its system to be redundant and resilient outside of its provider’s setup.

Even the ex-president of private cloud, Christian Reilly, tells us that:

for the traditional enterprise folks, it really doesn’t take much more than an outage of this nature, combined with some horror stories of how certain customers were catastrophically affected, and paradoxically worrying cases of what it took for certain customers not to be affected, to push the exploration of private cloud further up the to-do lists of many enterprise CIOs.

I know Reilly isn’t himself casting the event as a gravestone moment for the public cloud but, and here I’m getting a little worked up, these same execs who are decrying the public cloud because it isn’t safe (when in actual fact it’s single zone/region/data center use of the public cloud which isn’t safe) are stepping back and oftentimes relying on private infrastructure sitting in… you guessed it, one data center. In a lovely circular way we’ve just recreated the very risk factors which caused so much impact in the AWS case.

Finally, we have Klint Finley over on ReadWriteWeb, who does a great job of clearing the FUD away from the issues and contends that the fault for the outage, and ensuing failure of downstream services, is entirely on AWS.

So, where does one look, and what is the prognosis for the cloud?

Well, like Reese, I see the “Amazonoclypse” as a bad event with a silver lining. In the starkest of ways, it has highlighted the need to think beyond one zone, one data center, one region and one provider to build a robust and resilient service.

So, what are the components and solutions needed to build a service that would avoid issues were an outage like the one we saw recently to occur?

Multi site

All cloud vendors are quick to point out just how reliable their data centers are, with their redundant communication channels, power supply structures and the like. Any application running on the cloud needs to consider the same issues. It is unrealistic to rely completely on one single data center: a chain is only as strong as its weakest link, and by relying on one DC only, the idea of multiple redundancies is rendered a fiction.
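To put rough numbers on the single data center argument, here is a minimal back-of-the-envelope sketch. It assumes each site fails independently of the others, which is a big assumption: sites that share a failure domain (the same region, power feed or control plane) will not see this benefit. The 99% figure and the function names are purely illustrative and are not taken from any provider's published numbers.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Chance that at least one of `sites` sites is up.

    Assumes every site has the same availability and that failures
    are independent of each other (an idealisation).
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The service is down only when every site is down at the same time.
    return 1.0 - (1.0 - site_availability) ** sites


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.99, n)  # 0.99 is an illustrative value
        print(f"{n} site(s): {a:.6f} available, "
              f"{expected_downtime_hours(a):.3f} hours down per year")
```

Under those assumptions, one site at 99% works out to roughly 87.6 hours of expected downtime a year, while two independent sites bring that to under an hour. The point is the shape of the curve rather than the exact figures.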

Multi provider

This one is a little more contentious, and difficult to effect right now. But with the advent of more open standards (OpenStack anyone?), cloud users are gaining the ability to obtain service across multiple providers. More and more third-party solutions are helping with this process.

Automaticity

The real opportunity here is for providers that offer infrastructure-vendor-agnostic orchestration and automation services. Case in point: Layer7, who came out quickly with a post that explains why their own rules-based cloud broker product would have avoided downstream issues from the AWS event. I’ll talk more specifically about the Layer7 offering tomorrow, but suffice it to say that third-party services management just became very relevant.
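As a rough illustration of what provider-agnostic, rules-based routing might look like at its very simplest, here is a hedged sketch. It is not Layer7's product or any vendor's API. The `Deployment` class, the `pick_deployment` function and the health-check callables are all hypothetical names invented for this example, and a real broker would add timeouts, retries, data replication and much more.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    """One copy of the service (hypothetical model for this sketch)."""
    provider: str
    region: str
    is_healthy: Callable[[], bool]  # caller-supplied health probe


def pick_deployment(deployments: Sequence[Deployment]) -> Optional[Deployment]:
    """Return the first healthy deployment, in priority order.

    The only "rule" here is the ordering of the list. A probe that
    raises is treated the same as one that reports unhealthy.
    """
    for deployment in deployments:
        try:
            if deployment.is_healthy():
                return deployment
        except Exception:
            # A failing probe must not stop us from trying the next target.
            continue
    return None


if __name__ == "__main__":
    # Stand-in probes; real ones would call each deployment's health endpoint.
    targets = [
        Deployment("provider-a", "region-1", lambda: False),
        Deployment("provider-a", "region-2", lambda: False),
        Deployment("provider-b", "region-1", lambda: True),
    ]
    chosen = pick_deployment(targets)
    print(chosen.provider if chosen else "no healthy deployment")
```

The design choice worth noting is that the routing decision lives outside any one provider, so a failure in a single zone, region or vendor only removes entries from the list rather than taking the decision-maker down with it.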

So, yes, the outage was truly bad. Yes, some people got a serious fright from what happened. No, we shouldn’t let Amazon off the hook, and we should expect a very thorough post-mortem. But in no way does this change the landscape for the age-old public-private debate.

Published at DZone with permission of Ben Kepes, author and DZone MVB. (source)