
Table of Contents
- AWS Outage Timeline: What Happened and When
  - Initial Detection and Escalation
  - Recovery Phase
- List of Major Services and Platforms Affected
  - E-Commerce and Retail
  - Financial Services
  - Social Media and Communication
  - Entertainment and Gaming
  - Productivity and Business Tools
  - Government and Public Services
- AWS Official Statement and Root Cause Analysis
  - AWS Acknowledges the Issue
  - Identified Root Cause
  - Ongoing Investigations
- Duration and Impact of the AWS Outage
  - How Long Did the AWS Outage Last?
  - Global Economic and Operational Impact
- Why Did AWS Go Down? Understanding the Technical Causes
  - DNS Resolution Failures
  - Network Load Balancer Subsystem
  - Dependency on US-EAST-1
- Lessons Learned and Future Implications for Cloud Reliability
  - Need for Multi-Region Redundancy
  - Transparency and Communication
  - Resilience and Backup Planning
- How Businesses Can Mitigate Future Cloud Outage Risks
  - Diversify Cloud Providers
  - Implement Failover Mechanisms
  - Regular Stress Testing
  - Monitor Third-Party Dependencies
- Conclusion
On October 20, 2025, Amazon Web Services (AWS), the world’s leading cloud computing platform, experienced a major outage that sent shockwaves across the global digital landscape. The disruption, centered in the US-EAST-1 region, caused widespread downtime for thousands of websites, applications, and critical online services.
AWS Outage Timeline: What Happened and When
Initial Detection and Escalation
- 11:49 PM PDT, October 19, 2025: AWS first reported increased error rates and latencies for multiple services in the US-EAST-1 region, which is based in Northern Virginia and serves as a critical hub for global cloud operations.
- 12:26 AM PDT, October 20, 2025: AWS identified the issue as DNS resolution problems affecting the regional endpoints of DynamoDB, a core database service used by countless applications.
- 2:24 AM PDT, October 20, 2025: AWS mitigated the initial issue, but residual disruptions persisted for several hours, particularly for services relying on DynamoDB and AWS Lambda.
Recovery Phase
- Morning of October 20, 2025: AWS confirmed the root cause was an internal subsystem responsible for monitoring the health of network load balancers. The company began applying fixes and restoring services.
- Afternoon of October 20, 2025: Most AWS services returned to normal operations, although some connectivity issues lingered, especially for AWS Lambda and EC2 instance launches.
- Evening of October 20, 2025: AWS declared that all services had returned to normal operations and promised a detailed post-event summary in the coming weeks.
List of Major Services and Platforms Affected
The AWS outage had a cascading effect, disrupting a wide range of popular platforms and services:
E-Commerce and Retail
- Amazon.com: Users reported checkout failures and slow page loads.
- McDonald’s App: Partial outages affected mobile ordering and payments.
Financial Services
- Coinbase: Cryptocurrency trading was temporarily halted, with users unable to access their accounts.
- Robinhood: The trading app experienced downtime, interrupting financial transactions.
- Venmo: Payment processing was disrupted, leaving users unable to send or receive funds.
Social Media and Communication
- Snapchat: Users faced login issues and app crashes.
- Reddit: Elevated error rates and service degradation were reported.
- Hinge: Dating app users encountered difficulties loading profiles and initiating matches.
Entertainment and Gaming
- Prime Video: Viewers experienced streaming interruptions and buffering issues.
- Fortnite: Gamers were unable to log in or access multiplayer features.
- Disney+: Subscribers experienced playback errors and service unavailability.
Productivity and Business Tools
- Canva: The design platform suffered downtime, affecting users worldwide.
- Ring: Security cameras and doorbells went offline, raising concerns about home security.
Government and Public Services
- Gov.uk: UK government websites experienced accessibility issues.
- Lloyds Banking Group: Online banking services were temporarily unavailable.
AWS Official Statement and Root Cause Analysis
AWS Acknowledges the Issue
AWS publicly acknowledged the outage via its status page, stating:
“We are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region.”
Identified Root Cause
The primary cause was traced to DNS resolution issues with DynamoDB endpoints, compounded by problems in the internal subsystem monitoring network load balancers. AWS later clarified:
“The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers.”
Ongoing Investigations
AWS has committed to releasing a comprehensive post-event summary, detailing the technical failures and preventive measures to avoid future incidents.
Duration and Impact of the AWS Outage
How Long Did the AWS Outage Last?
The outage officially began at 11:49 PM PDT on October 19 and was largely resolved by the evening of October 20, 2025. However, full recovery for all services took up to 24 hours, with some residual issues persisting into October 21.
Global Economic and Operational Impact
- Business Disruptions: Companies reliant on AWS faced lost revenue, decreased productivity, and reputational damage.
- Consumer Inconvenience: Millions of users were unable to access essential services, from banking to entertainment.
- Market Reaction: While Amazon’s stock remained stable, the outage underscored the risks of cloud dependency and prompted calls for more resilient infrastructure.
Why Did AWS Go Down? Understanding the Technical Causes
DNS Resolution Failures
The outage originated from DNS issues in the US-EAST-1 region, which hosts critical AWS services like DynamoDB and IAM (Identity and Access Management). These services are foundational to many applications, making the region a single point of failure for global operations.
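AWS has not published client-side remediation guidance for this failure mode, but the symptom itself is easy to detect. A minimal Python sketch, assuming the standard regional endpoint name dynamodb.us-east-1.amazonaws.com:

```python
import socket

def endpoint_resolves(hostname: str, port: int = 443) -> bool:
    """Return True if DNS resolution for the endpoint succeeds."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Resolution failure: the symptom reported for DynamoDB
        # endpoints during this outage.
        return False

if not endpoint_resolves("dynamodb.us-east-1.amazonaws.com"):
    print("WARNING: regional DynamoDB endpoint is not resolving")
```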
Network Load Balancer Subsystem
AWS attributed the disruption to an internal subsystem that monitors the health of network load balancers. A failure in this system led to cascading errors across dependent services.
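AWS has not disclosed the internal design of that subsystem, so the following is only an illustration of the general pattern: a monitor probes each backend, and when the monitor itself malfunctions, healthy targets can be declared unhealthy all at once. The target URLs here are hypothetical:

```python
import urllib.request

# Hypothetical backend health-check endpoints behind a load balancer.
TARGETS = [
    "http://10.0.1.10/health",
    "http://10.0.1.11/health",
]

def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the target answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# If the monitor's own networking breaks (e.g., its DNS lookups fail),
# every probe errors out and all targets are marked unhealthy at once,
# leaving traffic with nowhere to route. That is the cascading pattern.
healthy = [t for t in TARGETS if probe(t)]
```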
Dependency on US-EAST-1
Many businesses default to using the US-EAST-1 region due to its historical significance and robust infrastructure. However, this concentration increases vulnerability to region-wide outages.
Lessons Learned and Future Implications for Cloud Reliability
Need for Multi-Region Redundancy
The outage highlighted the importance of distributing cloud resources across multiple regions to minimize downtime risks.
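As a concrete sketch of this idea, the Python snippet below reads from a fallback region when the primary is unreachable. It assumes a hypothetical table named orders that is replicated across the listed regions (for example, via DynamoDB Global Tables):

```python
import boto3
from botocore.config import Config
from botocore.exceptions import BotoCoreError, ClientError

REGIONS = ["us-east-1", "us-west-2"]  # primary first, then fallback

def get_item_with_fallback(table_name: str, key: dict) -> dict | None:
    """Try each region in order until a read succeeds."""
    for region in REGIONS:
        client = boto3.client(
            "dynamodb",
            region_name=region,
            config=Config(retries={"max_attempts": 2, "mode": "standard"}),
        )
        try:
            resp = client.get_item(TableName=table_name, Key=key)
            return resp.get("Item")
        except (BotoCoreError, ClientError):
            continue  # region unavailable; try the next replica
    return None

# Hypothetical table and key; replication must keep `orders` in sync
# across every region listed above for this to return useful data.
item = get_item_with_fallback("orders", {"order_id": {"S": "A1001"}})
```

Writes need the same treatment, and replication lag means a fallback read can briefly return stale data; the point is that no single region remains load-bearing.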
Transparency and Communication
The delay in AWS’s detailed communication underscored the need for real-time updates during major incidents to maintain user trust.
Resilience and Backup Planning
Businesses are now reevaluating their cloud strategies, exploring hybrid and multi-cloud solutions to enhance resilience.
How Businesses Can Mitigate Future Cloud Outage Risks
Diversify Cloud Providers
Adopting a multi-cloud approach can reduce dependency on a single provider and improve fault tolerance.
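In practice, this usually means coding against a provider-neutral abstraction rather than a specific SDK. A sketch of that design in Python, with an S3 backend and a stub for a hypothetical second provider:

```python
from abc import ABC, abstractmethod

import boto3

class BlobStore(ABC):
    """Provider-neutral interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class S3Store(BlobStore):
    def __init__(self, bucket: str, region: str = "us-west-2"):
        self._bucket = bucket
        self._s3 = boto3.client("s3", region_name=region)

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

class OtherCloudStore(BlobStore):
    """Stub for a second provider (e.g., GCS or Azure Blob Storage);
    a real deployment would implement this with that provider's SDK."""

    def put(self, key: str, data: bytes) -> None:
        raise NotImplementedError

def store_with_failover(stores: list[BlobStore], key: str, data: bytes) -> None:
    """Write to the first configured provider that accepts the object."""
    for store in stores:
        try:
            store.put(key, data)
            return
        except Exception:
            continue  # provider unavailable; fall through to the next
    raise RuntimeError("all configured providers failed")
```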
Implement Failover Mechanisms
Automated failover systems can redirect traffic to backup servers during outages, ensuring continuous service availability.
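A minimal client-side sketch, assuming hypothetical primary and backup endpoints; production systems more often implement this at the DNS or load-balancer layer (for example, with Route 53 health checks):

```python
import urllib.request

PRIMARY = "https://api.primary.example.com/orders"  # hypothetical
BACKUP = "https://api.backup.example.com/orders"    # hypothetical

def fetch_with_failover(timeout: float = 3.0) -> bytes:
    """Try the primary endpoint; on any network error, fail over."""
    for base in (PRIMARY, BACKUP):
        try:
            with urllib.request.urlopen(base, timeout=timeout) as resp:
                return resp.read()
        except OSError:
            continue  # endpoint unreachable; redirect to the backup
    raise RuntimeError("both primary and backup endpoints failed")
```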
Regular Stress Testing
Conducting regular simulations of cloud failures can help identify vulnerabilities and improve incident response plans.
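One simple drill is to point failover logic at an address that is guaranteed to fail and assert that the fallback engages. A sketch using Python's unittest and the reserved .invalid top-level domain, which never resolves (hostnames are hypothetical):

```python
import socket
import unittest

def resolves(host: str) -> bool:
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False

def pick_endpoint(primary: str, backup: str) -> str:
    """Choose the primary endpoint if it resolves, else the backup."""
    return primary if resolves(primary) else backup

class FailoverDrill(unittest.TestCase):
    def test_falls_back_when_primary_unresolvable(self):
        # ".invalid" is reserved (RFC 2606) and never resolves,
        # simulating a regional DNS failure like this outage.
        chosen = pick_endpoint("db.primary.invalid", "db.backup.example.com")
        self.assertEqual(chosen, "db.backup.example.com")

if __name__ == "__main__":
    unittest.main()
```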
Monitor Third-Party Dependencies
Businesses should audit their reliance on third-party cloud services and develop contingency plans for critical operations.
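A lightweight sketch of such monitoring: poll each critical dependency's health endpoint on a schedule and raise an alert on failure. The dependency names and URLs below are illustrative:

```python
import time
import urllib.request

# Illustrative health/status URLs for critical third-party dependencies.
DEPENDENCIES = {
    "payments-api": "https://payments.example.com/healthz",
    "auth-provider": "https://auth.example.com/healthz",
}

def is_healthy(url: str, timeout: float = 3.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

while True:
    for name, url in DEPENDENCIES.items():
        if not is_healthy(url):
            # Hook for paging/alerting; a real system would also track
            # error budgets and trigger contingency plans here.
            print(f"ALERT: dependency '{name}' is degraded or down")
    time.sleep(60)  # poll once a minute
```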
Conclusion
The October 20, 2025, AWS outage served as a stark reminder of the internet’s dependence on cloud infrastructure. While AWS has restored services and pledged improvements, the incident underscores the need for robust backup systems, transparent communication, and proactive risk management. As cloud computing continues to evolve, ensuring resilience and reliability will be paramount for both providers and users.