Doorbells, vacuums, websites, applications, and many web services went down after Amazon Web Services (AWS) encountered issues in the US-EAST-1 Region on November 25, spelling disaster for millions of clients who use AWS services in different forms across multiple industries.
Amazon quickly acknowledged the issue in an advisory tweeted by AWS, which cited “increased error rates for Kinesis Data Streams within our US-EAST-1 Region” that were affecting other Amazon services in that region.
We're currently experiencing increased error rates for Kinesis Data Streams within our US-EAST-1 Region, affecting other services within the region. Please watch our Service Health Dashboard for details: https://t.co/QLlsejEe3z. We are actively working to resolve the issue.
— AWS Support (@AWSSupport) November 25, 2020
The outage affected a number of major services and websites, including Adobe, The Washington Post, Downdetector.com, Roku, RSS Podcasting and many more.
The affected service was Amazon Kinesis, which lets AWS clients “collect, process, and analyze real time, streaming data” so that they can obtain timely insights from the data collected.
In its technical report summary, posted Nov. 25, Amazon detailed the key causes and the sequence of events behind the service outage. According to the report, the event that triggered the cascade of issues was a “relatively small addition of capacity” to the service, which began at around 2:44 AM PST and took about an hour.
The added capacity caused the servers to “exceed the maximum number of threads allowed by an operating system configuration.” As a result, front-end servers were “unable to route requests to back-end clusters.” The errors began as early as 5:15 AM PST, and alarms were cleared at 10:31 PM PST as per CloudWatch. In total, the outage affected AWS clients for roughly 17 hours.
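The roughly 17-hour figure follows directly from the two timestamps reported above. As a quick check, here is a minimal Python sketch that assumes both times fall on November 25, 2020, the date of the AWS Support tweet:

```python
from datetime import datetime

# Times as reported in the article; the date is assumed to be
# November 25, 2020 for both (PST, so no timezone conversion is needed).
first_errors = datetime(2020, 11, 25, 5, 15)     # 5:15 AM PST
alarms_cleared = datetime(2020, 11, 25, 22, 31)  # 10:31 PM PST

outage = alarms_cleared - first_errors
hours, remainder = divmod(int(outage.total_seconds()), 3600)
minutes = remainder // 60

print(f"Outage window: {hours} h {minutes} min")
```

The window comes out slightly over 17 hours, which matches the rounded figure.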
As a preventive measure, Amazon will be moving to servers with more CPU and memory, which reduces the number of servers needed and should “lessen the threads required by each server to communicate across the fleet.” Most of the major adjustments planned for Kinesis also involve isolating and partitioning the front-end fleet, with the long-term goal of cellularization so that the failure of one AWS service stays contained.
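To see why fewer, larger servers would help, consider a simplified model in which every front-end server keeps one thread open for each of the other servers in the fleet. That model is an assumption made here for illustration, not something stated in this article, and the thread limit and fleet sizes below are made-up numbers. A rough Python sketch:

```python
def threads_per_server(fleet_size: int) -> int:
    """Threads one server needs under the assumed one-thread-per-peer model."""
    return max(fleet_size - 1, 0)


def exceeds_limit(fleet_size: int, thread_limit: int) -> bool:
    """True if a server in a fleet of this size would pass the OS thread limit."""
    return threads_per_server(fleet_size) > thread_limit


THREAD_LIMIT = 1000  # hypothetical operating system limit, not AWS's real value

# Hypothetical fleet sizes: before the addition, after it, and after
# consolidating onto fewer, larger servers.
for label, size in [("before", 990), ("after addition", 1010), ("consolidated", 500)]:
    status = "over limit" if exceeds_limit(size, THREAD_LIMIT) else "ok"
    print(f"{label}: {threads_per_server(size)} threads per server ({status})")
```

Under this model, a small addition can push every server over the limit at once, because each server's thread count grows with the size of the whole fleet. Shrinking the fleet restores headroom on every server.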
Amazon Web Services is a cloud computing platform that serves as a backbone for many applications and websites, allowing them to host their applications and websites and to store data with its database solutions.
This isn’t the first AWS outage in recent years (AWS also had an outage two years ago), and because so much of the internet runs on AWS, any future outage is likely to take a chunk of the internet down with it.
According to The Washington Post (which is owned by Amazon founder Jeff Bezos), Amazon held as much as 45 percent of the global cloud market in 2019. An outage at this level therefore brings a large number of internet services to a halt, since many of them now rely on AWS to operate.
Major companies that are clients of AWS include Facebook, Netflix, Twitch.tv, LinkedIn, BBC and other companies and services that are accessed by millions every day.