(Updated) AWS Outage Resolved, All Operations Return to Normal

As we reported earlier, Amazon Web Services experienced an outage in its Amazon Kinesis Data Streams service. AWS acknowledged the issue on its health dashboard earlier this week. The Seattle-based cloud giant now says it has resolved the outage. Amazon Kinesis collects and analyzes streaming data in real time to provide timely insights. The outage affected multiple customers, including Roku, Adobe, and Flickr. Other AWS services, including Amazon Elastic Container Service (a fully managed container orchestration service), EventBridge (an event bus that makes it easier to connect applications), and Amazon Elastic Kubernetes Service, were also affected.

According to AWS’ service page, the Kinesis data service in the US-East-1 region was “being impaired”. The outage also made it difficult to update the status page, which many AWS customers were monitoring closely. AWS offers its users more than 175 services, including storage services and machine learning software. The outage gave AWS’ competitors Microsoft Azure and Google Cloud an opening to win over customers. However, there have been no reports of companies moving from AWS to other providers. AWS users often use multiple services together, and the outage was disruptive because those services depend on one another. When one service suffers downtime, the services that depend on it can be affected in a domino effect.

According to the latest reports, the outage affected multiple vendors, including DataCamp, Roku, Vonage, The Philadelphia Inquirer, Coinbase, 1Password, Adobe Spark, Flickr, iRobot, Acorn, RadioLab, Pocket, Glassdoor, Getaround, and many more.

On Thursday morning, AWS stated, “we have restored all traffic to Kinesis Data Streams via all endpoints, and it is now operating normally. We have also resolved the error rates invoking CloudWatch APIs. We continue to work towards full recovery for IoT SiteWise, and details of the service status are below. All other services are operating normally. We have identified the root cause of the Kinesis Data Streams event and have completed immediate actions to prevent a recurrence.”

Most of the affected customers also confirmed through social media that their services had been restored. Adobe stated on Twitter, “all services impacted by the Amazon AWS outage yesterday are now fully resolved- thanks for waiting while we worked to get things back up and running.”

The AWS Service Health Dashboard also updated the status of all affected services to normal operation.

Updated on 1 December: Earlier, we reported that Amazon Kinesis Data Streams was disrupted and that the resulting AWS outage caused setbacks for multiple vendors. The outage also made it difficult for Amazon Web Services to update its closely monitored Health Dashboard. Although the service was restored, many questions were raised about the capacity of Kinesis Data Streams to handle data without hiccups. The disruption was seen only in Amazon’s US-East-1 region in Northern Virginia.

In a lengthy summary, Amazon noted that the disruption was triggered by a “small addition of capacity”, but that this addition was not itself the root cause of the outage. Rather, the new capacity pushed the number of threads on each front-end server above the maximum allowed by an operating system configuration.

In the summary, Amazon stated, “For communication, each front-end server creates operating system threads for each of the other servers in the front-end fleet. Upon any addition of capacity, the servers that are already operating members of the fleet will learn of new servers joining and establish the appropriate threads. It takes up to an hour for any existing front-end fleet member to learn of new participants.”
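The thread-per-server design described above means that every server's thread count grows with the size of the fleet, so a small capacity addition can push all servers past a fixed limit at once. The following is a minimal sketch of that arithmetic, not Amazon's implementation. It assumes one thread per peer server, and the function names, the `other_threads` overhead, and the numbers in the usage note are hypothetical values chosen for illustration only.

```python
def threads_per_server(fleet_size: int, threads_per_peer: int = 1) -> int:
    """Threads one front-end server needs to talk to every other server.

    Assumption: one thread per peer by default; the real figure is not
    given in the article.
    """
    if fleet_size < 1:
        raise ValueError("fleet_size must be at least 1")
    return (fleet_size - 1) * threads_per_peer


def exceeds_thread_limit(fleet_size: int, other_threads: int, max_threads: int) -> bool:
    """True if peer threads plus unrelated threads exceed the OS limit."""
    return threads_per_server(fleet_size) + other_threads > max_threads


def max_safe_fleet_size(other_threads: int, max_threads: int) -> int:
    """Largest fleet size that stays within the limit (one thread per peer)."""
    return max_threads - other_threads + 1
```

With a hypothetical limit of 1,024 threads and 50 threads used for other work, `exceeds_thread_limit(900, 50, 1024)` is false while `exceeds_thread_limit(1000, 50, 1024)` is true, and `max_safe_fleet_size(50, 1024)` gives 975 servers. Because every existing server opens threads to the newcomers, the limit is reached fleet-wide rather than on the new servers alone.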