How one little Amazon error can destroy the internet
The fact that Amazon controls a vast swath of cloud computing services became dreadfully clear on Wednesday morning when a string of errors brought countless websites to their knees. This consolidation of power is, perhaps suddenly, a very big problem.
Unlike its internet marketplace, Amazon Web Services (AWS) works more like a house of cards than a traditional retail business. After all, instead of selling books and reasonably priced electronics, AWS caters to enterprise clients to provide cloud-computing services. Amazon Simple Storage Service (S3), the product that suffered errors and knocked out a solid portion of the web today, provides storage for cloud-based apps like Slack and Trello. Amazon says that its S3 service is “designed to deliver 99.999999999 per cent durability”. But when one piece of the infrastructure fails, AWS fails big.
This is because Amazon controls a ridiculous share of the market when it comes to cloud computing and, specifically, cloud storage. A Gartner study from August 2016 claims that AWS controls 31 per cent of the market in global cloud infrastructure, and the business is growing. The same study said that AWS accounted for 51 per cent of Amazon’s profits. (Another study from the same time period puts Amazon’s market share at 45 per cent.) Microsoft, IBM and Google are all expanding their cloud offerings as well, but Amazon’s been the leader in the space since 2006.
So for over a decade, Amazon has been king of the cloud. During that span of time, the company’s business model, which Jeff Bezos once compared to the early days of electricity, enabled startups to scale and yet still afford the cost of hosting. Ingrid Burrington explained in The Atlantic last year:
In practice, this meant that pricing for services was entirely contingent on actual use, an approach that allowed developers to rapidly scale small startups into massive companies by paying for infrastructure support on an as-needed basis and scaffolding as needs grew. Thanks to AWS, the initial overhead for starting a service like Airbnb or Slack (both AWS customers) is so low that those companies can afford to expand quickly.
But what happens when any service gets so big that its tentacles touch the entire industry? Its failures become amplified to a destructive degree. In the case of AWS, that .000000001 per cent of the time when things don’t work just right means that over a third of the internet ceases to function well. Amazon won’t say how many cloud computing customers it has or the exact percentage of internet traffic that’s affected when an error happens. But today’s outage showed that it could bring entire networks of websites grinding to a halt. (Gizmodo Media is an AWS customer, so I can confirm that this was a messed up day.)
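For a sense of scale, the eleven-nines figure quoted earlier can be worked through as simple arithmetic. The sketch below is illustrative only and rests on an assumption: that the figure is an annual, per-object durability rate (the chance that a stored object is not lost), which is how AWS documentation describes it, rather than a promise about uptime. The function names and the 10 million object count are made up for the example.

```python
from fractions import Fraction

# Assumption: 99.999999999% is an annual, per-object durability rate,
# so the chance of losing any one object in a year is 1 in 10**11.
ANNUAL_LOSS_RATE = Fraction(1, 10**11)
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of stored objects lost in one year."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


def downtime_seconds_per_year() -> Fraction:
    """Downtime allowed per year IF the same figure were an uptime rate."""
    return SECONDS_PER_YEAR * ANNUAL_LOSS_RATE


if __name__ == "__main__":
    objects = 10_000_000  # hypothetical customer with 10 million objects
    print("Years per lost object:", years_per_lost_object(objects))
    print("Seconds of downtime per year if read as uptime:",
          float(downtime_seconds_per_year()))
```

Under that assumption, a customer storing 10 million objects would expect to lose one of them roughly every 10,000 years. Read instead as an uptime figure, the same number would allow only about a third of a millisecond of downtime a year, so it says little either way about how often S3 becomes hard to reach.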
Meanwhile, the fact that many of Amazon’s AWS servers are located in northern Virginia, where an unholy number of tubes come together to form one of the most congested bottlenecks of internet traffic, certainly doesn’t help. Amazon says that this region, known as US-EAST-1, was the source of Tuesday’s outage.
So while this week’s paralysing series of errors gave Amazon engineers a terrible headache, cloud computing competitors like Microsoft, IBM and Google must be thrilled. As mentioned earlier, they’re all gaining on Amazon’s absurd market share, and now their salespeople will have a single incident to show that AWS is not 100 per cent durable. The fact that added competition should improve services and lower prices for everyone is undeniably a good thing, too.
Amazon still hasn’t explained exactly what went down this morning. In response to a Gizmodo request for comment the company said:
We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue.
That’s basically a different version of the error notice posted on the AWS website. Good luck using the internet. It’s a mess out there.
Amazon Web Services, the remote data centres that power some of the world’s most popular websites, is experiencing a big disruption that’s making numerous apps and websites — including Business Insider US — difficult to access for many users.
On its status page on Tuesday, Amazon places the blame on its S3 storage service, which it says is seeing “high error rates” for websites and apps hosted from its flagship US East (Northern Virginia) Region data center.
Among the sites and services that appear to be affected are Slack, Quora, Lonely Planet, Snapchat’s Bitmoji, and even the US Securities and Exchange Commission website.
Let’s hope Snapchat parent company Snap doesn’t file an update to its IPO prospectus.
Amazon S3 is a very common service that sites use to store files, and the US East data center is one of its biggest facilities, meaning that this is wreaking havoc all over the web. Sites like Imgur use S3 to store their photo files, for instance, making those sites slow to load, if they load at all.
“We continue to experience high error rates with S3 in US-EAST-1, which is impacting various AWS services. We are working hard at repairing S3, believe we understand root cause, and are working on implementing what we believe will remediate the issue,” an Amazon spokesperson said on Tuesday afternoon about 1 p.m. PST.
Notably, this isn’t technically an “outage,” since Amazon’s S3 is not entirely out of commission and some services are only partially affected.
put it all in the cloud they said..what could possibly go wrong they said?
“But what happens when any service gets so big that its tentacles touch the entire industry? Its failures become amplified to a destructive degree. In the case of AWS, that .000000001 per cent of the time when things don’t work just right means that over a third of the internet ceases to function well.”
“Amazon S3 is a very common service that sites use to store files, and the US East data center is one of its biggest facilities, meaning that this is wreaking havoc all over the web. Sites like Imgur use S3 to store their photo files, for instance, making those sites slow to load, if they load at all.”
that was 2 days ago..we are already being shown the tremors..a full “earthquake” could shut it all down..