Okay, so Amazon Web Services, one of the largest, most secure and most trusted Web services in the world, showed a vulnerability April 21 when it went down for hours, taking some busy Websites with it. Thankfully, this doesn’t happen every day, or even every year.
So what else is new? No site on the Internet is sacrosanct or immune from power shutoffs or a major service-denial attack.
The outage that started at 1:41 a.m. PDT April 21 at an AWS (Amazon Web Services) data center in Northern Virginia caused service disruptions in its EC2 (Elastic Compute Cloud) hosting service, knocking thousands of Websites, including such popular ones as Foursquare, Reddit, Quora and Hootsuite, off the Internet for more than 30 hours. Those and many smaller Websites were still experiencing problems on some systems by mid-afternoon April 22.
Businesses that depend on the AWS hosting service lost money during those hours, income that cannot be regained.
Is it too much of a snap reaction to ask whether the outage will cause CIOs to hesitate about upgrading their IT systems with cloud-type deployments? Will IT execs question the cloud's prime-time readiness for key business operations?
In a word: Nah.
There’s no question that the Amazon outage raises important points for enterprises to consider about which services to subscribe to from a public cloud, which should remain on the organization’s physical premises, or which to deploy as private cloud services. But those are questions that IT decision-makers grapple with every day.
“The first thing to understand [about this event] is that this changes nothing,” Andi Mann, longtime storage industry analyst who’s currently serving as chief cloud strategy guru at CA Technologies, told eWEEK.
"Cloud will have downtime; it's a fundamental issue. But you need to be ready for downtime, whether it's your own infrastructure or cloud infrastructure. You need to understand what the risk is. It's all just about risk management."
Additional Risks Always Involved
When you look at subscribing to services in a public cloud like EC2, it’s important to remember that there are indeed additional risks involved, Mann said.
“When you move into the cloud, you can’t just take an old application and throw it in the cloud and think you’ve done something special,” Mann said, “because you’re introducing additional risks.
"You're sharing infrastructure, you're relying more on networking, and you're relying on your cloud provider to do things you used to do, like disaster recovery, continuity planning, performance management, change management and so on."
If you're going to leave all that to a service provider, then you're going to get into trouble, Mann said. Each enterprise needs to manage its cloud systems as if they were in its own data center, he said.
“You can’t just throw it to Amazon and say, ‘I’m done.’ You need to monitor it yourself, you need to have a backup plan, you need to have a disaster-recovery plan, you need to manage licensing,” Mann said. “Cloud doesn’t mean no management. It’s still virtually within your four walls.”
Nothing Out of the Ordinary
The Amazon outage is certainly nothing new or out of the ordinary.
"We've seen this many times before. Gmail's been down, Amazon's been down before, many of the CDNs [content delivery networks] have been down," Mann said. "Heck, CDNs have been shut down. Look at the Wikileaks thing, for example. Pastor Terry Jones [a controversial Florida anti-Muslim preacher] was shut down. Look at Amazon's terms of service; they can shut you down themselves.
“You might get shut down [at any time] using the cloud. Just manage it.”
The probable results of this event are that Amazon will work harder to prove itself and add more safeguards. Customers will look closer at paying extra for online backup in the form of more "availability zones," which means more mirrored content within the cloud service.
But there’s no stopping outages like this one.
Lydia Leong of Gartner Research wrote in an advisory that Amazon EC2 didn’t actually violate its service-level agreement when the outage occurred.
“Amazon’s SLA for EC2 is 99.95 percent for multi-AZ deployments,” Leong wrote. “That means that you should expect that you can have about 4.5 hours of total region downtime each year without Amazon violating its SLA.
“Note, by the way, that this outage does not actually violate their SLA. Their SLA defines unavailability as a lack of external connectivity to EC2 instances, coupled with the inability to provision working instances. In this case, EC2 was just fine by that definition. It was Elastic Block Store [EBS] and Relational Database Service [RDS] which weren’t, and neither of those services have SLAs.”
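Leong's "about 4.5 hours" figure follows directly from the SLA percentage. As a back-of-the-envelope sketch (the function name and structure here are illustrative, not part of any Amazon tooling), the downtime budget implied by an uptime guarantee can be computed like this:

```python
# Back-of-the-envelope: convert an SLA uptime percentage into the
# maximum downtime a provider can incur without violating it.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year

def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Return the downtime budget, in hours, implied by an uptime SLA."""
    return period_hours * (1 - sla_percent / 100.0)

# Amazon's 99.95 percent EC2 SLA works out to roughly 4.4 hours per year,
# consistent with the "about 4.5 hours" Leong cites.
print(round(allowed_downtime_hours(99.95), 2))
```

Note that the calculation says nothing about *when* that downtime occurs; a single 30-hour event, as in this outage, blows through several years' worth of budget at once, which is why the SLA's definition of "unavailability" matters so much.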
So, one more important admonition: Read the SLA very, very carefully when you commit to a cloud service. What you don’t understand, or don’t realize, may come back to bite you.
Just ask Quora, Reddit and a few other online businesses.