
Table of Contents
- AWS Outage Timeline: What Happened and When
  - Initial Detection and Escalation
  - Recovery Phase
- List of Major Services and Platforms Affected
  - E-Commerce and Retail
  - Financial Services
  - Social Media and Communication
  - Entertainment and Gaming
  - Productivity and Business Tools
  - Government and Public Services
- AWS Official Statement and Root Cause Analysis
  - AWS Acknowledges the Issue
  - Identified Root Cause
  - Ongoing Investigations
- Duration and Impact of the AWS Outage
  - How Long Did the AWS Outage Last?
  - Global Economic and Operational Impact
- Why Did AWS Go Down? Understanding the Technical Causes
  - DNS Resolution Failures
  - Network Load Balancer Subsystem
  - Dependency on US-EAST-1
- Lessons Learned and Future Implications for Cloud Reliability
  - Need for Multi-Region Redundancy
  - Transparency and Communication
  - Resilience and Backup Planning
- How Businesses Can Mitigate Future Cloud Outage Risks
  - Diversify Cloud Providers
  - Implement Failover Mechanisms
  - Regular Stress Testing
  - Monitor Third-Party Dependencies
- Conclusion
On October 20, 2025, Amazon Web Services (AWS), the world’s leading cloud computing platform, experienced a major outage that sent shockwaves across the global digital landscape. The disruption, centered in the US-EAST-1 region, caused widespread downtime for thousands of websites, applications, and critical online services.
AWS Outage Timeline: What Happened and When
Initial Detection and Escalation
- 11:49 PM PDT, October 19, 2025: AWS first reported increased error rates and latencies for multiple services in the US-EAST-1 region, which is based in Northern Virginia and serves as a critical hub for global cloud operations.
- 12:26 AM PDT, October 20, 2025: AWS identified the issue as DNS resolution problems affecting the regional endpoints of DynamoDB, a core database service used by countless applications.
- 2:24 AM PDT, October 20, 2025: AWS mitigated the initial issue, but residual disruptions persisted for several hours, particularly for services relying on DynamoDB and AWS Lambda.
Recovery Phase
- Morning of October 20, 2025: AWS confirmed the root cause was an internal subsystem responsible for monitoring the health of network load balancers. The company began applying fixes and restoring services.
- Afternoon of October 20, 2025: Most AWS services returned to normal operations, although some connectivity issues lingered, especially for AWS Lambda and EC2 instance launches.
- Evening of October 20, 2025: AWS declared that all services had returned to normal operations and promised a detailed post-event summary in the coming weeks.
List of Major Services and Platforms Affected
The AWS outage had a cascading effect, disrupting a wide range of popular platforms and services:
E-Commerce and Retail
- Amazon.com: Users reported checkout failures and slow page loads.
- McDonald’s App: Partial outages affected mobile ordering and payments.
Financial Services
- Coinbase: Cryptocurrency trading was temporarily halted, with users unable to access their accounts.
- Robinhood: The trading app experienced downtime, interrupting financial transactions.
- Venmo: Payment processing was disrupted, leaving users unable to send or receive funds.
Social Media and Communication
- Snapchat: Users faced login issues and app crashes.
- Reddit: Elevated error rates and service degradation were reported.
- Hinge: Dating app users encountered difficulties loading profiles and initiating matches.
Entertainment and Gaming
- Prime Video: Viewers experienced streaming interruptions and buffering issues.
- Fortnite: Gamers were unable to log in or access multiplayer features.
- Disney+: Subscribers experienced playback errors and service unavailability.
Productivity and Business Tools
- Canva: The design platform suffered downtime, affecting users worldwide.
- Ring: Security cameras and doorbells went offline, raising concerns about home security.
Government and Public Services
- Gov.uk: UK government websites experienced accessibility issues.
- Lloyds Banking Group: Online banking services were temporarily unavailable.
AWS Official Statement and Root Cause Analysis
AWS Acknowledges the Issue
AWS publicly acknowledged the outage via its status page, stating:
“We are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region.”
Identified Root Cause
The primary cause was traced to DNS resolution issues with DynamoDB endpoints, compounded by problems in the internal subsystem monitoring network load balancers. AWS later clarified:
“The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers.”
Ongoing Investigations
AWS has committed to releasing a comprehensive post-event summary, detailing the technical failures and preventive measures to avoid future incidents.
Duration and Impact of the AWS Outage
How Long Did the AWS Outage Last?
The outage officially began at 11:49 PM PDT on October 19 and was largely resolved by the evening of October 20, 2025. However, full recovery for all services took up to 24 hours, with some residual issues persisting into October 21.
Global Economic and Operational Impact
- Business Disruptions: Companies reliant on AWS faced lost revenue, decreased productivity, and reputational damage.
- Consumer Inconvenience: Millions of users were unable to access essential services, from banking to entertainment.
- Market Reaction: While Amazon’s stock remained stable, the outage underscored the risks of cloud dependency and prompted calls for more resilient infrastructure.
Why Did AWS Go Down? Understanding the Technical Causes
DNS Resolution Failures
The outage originated from DNS issues in the US-EAST-1 region, which hosts critical AWS services like DynamoDB and IAM (Identity and Access Management). These services are foundational to many applications, making the region a single point of failure for global operations.
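AWS has not published client-side remediation guidance for this failure mode, but the symptom itself is easy to detect. A minimal Python sketch, assuming the standard regional endpoint name dynamodb.us-east-1.amazonaws.com:

```python
import socket

def endpoint_resolves(hostname: str, port: int = 443) -> bool:
    """Return True if DNS resolution for the endpoint succeeds."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Resolution failure: the symptom reported for DynamoDB
        # endpoints during this outage.
        return False

if not endpoint_resolves("dynamodb.us-east-1.amazonaws.com"):
    print("WARNING: regional DynamoDB endpoint is not resolving")
```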
Network Load Balancer Subsystem
AWS attributed the disruption to an internal subsystem that monitors the health of network load balancers. A failure in this system led to cascading errors across dependent services.
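AWS has not disclosed the internal design of that subsystem, so the following is only an illustration of the general pattern: a monitor probes each backend, and when the monitor itself malfunctions, healthy targets can be declared unhealthy all at once. The target URLs here are hypothetical:

```python
import urllib.request

# Hypothetical backend health-check endpoints behind a load balancer.
TARGETS = [
    "http://10.0.1.10/health",
    "http://10.0.1.11/health",
]

def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the target answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

# If the monitor's own networking breaks (e.g., its DNS lookups fail),
# every probe errors out and all targets are marked unhealthy at once,
# leaving traffic with nowhere to route. That is the cascading pattern.
healthy = [t for t in TARGETS if probe(t)]
```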
Dependency on US-EAST-1
Many businesses default to using the US-EAST-1 region due to its historical significance and robust infrastructure. However, this concentration increases vulnerability to region-wide outages.
Lessons Learned and Future Implications for Cloud Reliability
Need for Multi-Region Redundancy
The outage highlighted the importance of distributing cloud resources across multiple regions to minimize downtime risks.
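As a concrete sketch of this idea, the Python snippet below reads from a fallback region when the primary is unreachable. It assumes a hypothetical table named orders that is replicated across the listed regions (for example, via DynamoDB Global Tables):

```python
import boto3
from botocore.config import Config
from botocore.exceptions import BotoCoreError, ClientError

REGIONS = ["us-east-1", "us-west-2"]  # primary first, then fallback

def get_item_with_fallback(table_name: str, key: dict) -> dict | None:
    """Try each region in order until a read succeeds."""
    for region in REGIONS:
        client = boto3.client(
            "dynamodb",
            region_name=region,
            config=Config(retries={"max_attempts": 2, "mode": "standard"}),
        )
        try:
            resp = client.get_item(TableName=table_name, Key=key)
            return resp.get("Item")
        except (BotoCoreError, ClientError):
            continue  # region unavailable; try the next replica
    return None

# Hypothetical table and key; replication must keep `orders` in sync
# across every region listed above for this to return useful data.
item = get_item_with_fallback("orders", {"order_id": {"S": "A1001"}})
```

Writes need the same treatment, and replication lag means a fallback read can briefly return stale data; the point is that no single region remains load-bearing.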
Transparency and Communication
The delay in AWS’s detailed communication underscored the need for real-time updates during major incidents to maintain user trust.
Resilience and Backup Planning
Businesses are now reevaluating their cloud strategies, exploring hybrid and multi-cloud solutions to enhance resilience.
How Businesses Can Mitigate Future Cloud Outage Risks
Diversify Cloud Providers
Adopting a multi-cloud approach can reduce dependency on a single provider and improve fault tolerance.
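In practice, this usually means coding against a provider-neutral abstraction rather than a specific SDK. A sketch of that design in Python, with an S3 backend and a stub for a hypothetical second provider:

```python
from abc import ABC, abstractmethod

import boto3

class BlobStore(ABC):
    """Provider-neutral interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class S3Store(BlobStore):
    def __init__(self, bucket: str, region: str = "us-west-2"):
        self._bucket = bucket
        self._s3 = boto3.client("s3", region_name=region)

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

class OtherCloudStore(BlobStore):
    """Stub for a second provider (e.g., GCS or Azure Blob Storage);
    a real deployment would implement this with that provider's SDK."""

    def put(self, key: str, data: bytes) -> None:
        raise NotImplementedError

def store_with_failover(stores: list[BlobStore], key: str, data: bytes) -> None:
    """Write to the first configured provider that accepts the object."""
    for store in stores:
        try:
            store.put(key, data)
            return
        except Exception:
            continue  # provider unavailable; fall through to the next
    raise RuntimeError("all configured providers failed")
```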
Implement Failover Mechanisms
Automated failover systems can redirect traffic to backup servers during outages, ensuring continuous service availability.
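A minimal client-side sketch, assuming hypothetical primary and backup endpoints; production systems more often implement this at the DNS or load-balancer layer (for example, with Route 53 health checks):

```python
import urllib.request

PRIMARY = "https://api.primary.example.com/orders"  # hypothetical
BACKUP = "https://api.backup.example.com/orders"    # hypothetical

def fetch_with_failover(timeout: float = 3.0) -> bytes:
    """Try the primary endpoint; on any network error, fail over."""
    for base in (PRIMARY, BACKUP):
        try:
            with urllib.request.urlopen(base, timeout=timeout) as resp:
                return resp.read()
        except OSError:
            continue  # endpoint unreachable; redirect to the backup
    raise RuntimeError("both primary and backup endpoints failed")
```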
Regular Stress Testing
Conducting regular simulations of cloud failures can help identify vulnerabilities and improve incident response plans.
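One simple drill is to point failover logic at an address that is guaranteed to fail and assert that the fallback engages. A sketch using Python's unittest and the reserved .invalid top-level domain, which never resolves (hostnames are hypothetical):

```python
import socket
import unittest

def resolves(host: str) -> bool:
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False

def pick_endpoint(primary: str, backup: str) -> str:
    """Choose the primary endpoint if it resolves, else the backup."""
    return primary if resolves(primary) else backup

class FailoverDrill(unittest.TestCase):
    def test_falls_back_when_primary_unresolvable(self):
        # ".invalid" is reserved (RFC 2606) and never resolves,
        # simulating a regional DNS failure like this outage.
        chosen = pick_endpoint("db.primary.invalid", "db.backup.example.com")
        self.assertEqual(chosen, "db.backup.example.com")

if __name__ == "__main__":
    unittest.main()
```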
Monitor Third-Party Dependencies
Businesses should audit their reliance on third-party cloud services and develop contingency plans for critical operations.
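A lightweight sketch of such monitoring: poll each critical dependency's health endpoint on a schedule and raise an alert on failure. The dependency names and URLs below are illustrative:

```python
import time
import urllib.request

# Illustrative health/status URLs for critical third-party dependencies.
DEPENDENCIES = {
    "payments-api": "https://payments.example.com/healthz",
    "auth-provider": "https://auth.example.com/healthz",
}

def is_healthy(url: str, timeout: float = 3.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

while True:
    for name, url in DEPENDENCIES.items():
        if not is_healthy(url):
            # Hook for paging/alerting; a real system would also track
            # error budgets and trigger contingency plans here.
            print(f"ALERT: dependency '{name}' is degraded or down")
    time.sleep(60)  # poll once a minute
```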
Conclusion
The October 20, 2025, AWS outage served as a stark reminder of the internet’s dependence on cloud infrastructure. While AWS has restored services and pledged improvements, the incident underscores the need for robust backup systems, transparent communication, and proactive risk management. As cloud computing continues to evolve, ensuring resilience and reliability will be paramount for both providers and users.