Amazon Outage Shows How Vulnerable Cloud Services Are
Imagine an AI virus wreaking havoc on the Internet...
In case you hadn’t noticed, half the Internet is hosed today because of an outage in Amazon’s AWS cloud services. The outage started last night and affected everything from McDonald’s websites to online universities. It’s not just impacting people in the US, but people around the globe.
At around 2 am Pacific last night, Amazon announced that they discovered a “potential root cause” - a DNS failure. I’ll explain what this means in a moment:
Oct 20 2:01 AM PDT We have identified a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region. Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. We are working on multiple parallel paths to accelerate recovery. This issue also affects other AWS Services in the US-EAST-1 Region. Global services or features that rely on US-EAST-1 endpoints such as IAM updates and DynamoDB Global tables may also be experiencing issues. During this time, customers may be unable to create or update Support Cases. We recommend customers continue to retry any failed requests. We will continue to provide updates as we have more information to share, or by 2:45 AM.
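AWS’s advice to “retry any failed requests” is usually done with exponential backoff plus random jitter, so that millions of clients don’t all retry in lockstep and pile onto a service that is trying to recover. Here’s a minimal Python sketch of that pattern. The names `retry_with_backoff` and `flaky_request` are hypothetical and not part of any AWS SDK (the official SDKs have their own retry settings).

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is used up.

    Between attempts, wait a random time between 0 and
    min(max_delay, base_delay * 2**attempt) seconds ("full jitter"), so that
    many clients retrying at once don't all hammer the service in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


# Hypothetical request that fails twice (endpoint unreachable), then succeeds.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint unavailable")
    return "ok"


# sleep is stubbed out here so the demo runs instantly.
print(retry_with_backoff(flaky_request, sleep=lambda seconds: None))  # ok
```

The random wait matters as much as the growing wait: without it, every client that failed at the same moment would come back at the same moment, too.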
Then, later that morning, they came up with a “new” root cause - a malfunctioning internal subsystem that monitors their network load balancers:
Oct 20 8:43 AM PDT We have narrowed down the source of the network connectivity issues that impacted AWS Services. The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers. We are throttling requests for new EC2 instance launches to aid recovery and actively working on mitigations.
So I used to do some web server sysadmin back in the day. I will note that I am NOT an expert on this. I am NOT an expert in cybersecurity. But in my admittedly somewhat ignorant view, I do not believe a simple DNS failure would cascade to this level of forked-uppedness.
DNS stands for “Domain Name System,” and it is basically an addressing system that tells your web browser to go to such and such address. A domain name is the dot-com human name - like “amazon.com.” It points to a numeric IP (Internet Protocol) address, such as this non-working example: 54.239.28.85.
If DNS goes down, the server at that IP address should still be functioning and no data is lost. Your computer just can’t reach it because the domain name can’t be translated into the IP address. DNS information is spread from the main (authoritative) servers to many other servers that cache it throughout the Internet, so there’s some redundancy: one server failing shouldn’t collapse the entire system.
DNS failures can cause massive traffic bottlenecks, but they should be resolved fairly quickly, with initial relief within an hour. DNS fixes need to propagate, though, which is why the fix isn’t instant. Some smaller, regional DNS servers may take as long as 12-24 hours to catch up. But connection problems should start easing and keep getting better as time goes on.
THIS is why I don’t think that a simple DNS problem caused today’s Amazon catastrophe, but what do I know?
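Why does a DNS fix take time to “propagate”? Mostly because resolvers cache each answer for a time-to-live (TTL) chosen by the domain owner, and keep handing out the old answer until it expires. Below is a toy Python model of that caching behavior, not how any real DNS server is implemented: `CachingResolver` is a hypothetical class, a plain dict stands in for the authoritative server, and the IPs are reserved documentation addresses.

```python
class CachingResolver:
    """Toy downstream DNS resolver that caches answers for their TTL.

    `upstream` is a dict standing in for the authoritative server
    (a simplification; real DNS walks root, TLD, and authoritative servers).
    """

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}  # name -> (ip, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]  # still within TTL: serve the cached (possibly stale) answer
        if name not in self.upstream:
            raise LookupError(f"cannot resolve {name}")
        ip, ttl = self.upstream[name]
        self.cache[name] = (ip, now + ttl)
        return ip


upstream = {"api.example.com": ("192.0.2.1", 3600)}  # bad record, 1-hour TTL
resolver = CachingResolver(upstream)
print(resolver.resolve("api.example.com", now=0))     # 192.0.2.1

upstream["api.example.com"] = ("198.51.100.7", 3600)  # the fix goes out at t=10s
print(resolver.resolve("api.example.com", now=10))    # 192.0.2.1 (stale until TTL expires)
print(resolver.resolve("api.example.com", now=3600))  # 198.51.100.7
```

Multiply that by thousands of resolvers, each cached at a different moment, and you get the gradual “easing up” described above rather than an instant fix.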
Now, network load balancers are a LOT more complex than DNS servers, as they are responsible for managing online traffic. To make it simple - if a portion of the 10 freeway collapsed, you might reroute traffic along Pico or Olympic Blvd. This is precisely what happened after the 1994 Northridge earthquake in Los Angeles. The network load balancers are the traffic cops directing and rerouting traffic.
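To make the “traffic cop” idea concrete, here’s a toy round-robin load balancer in Python that sends each request to the next server in line and skips any server the health checker has marked as down. This is a generic illustration, not how AWS’s load balancers work internally; the class and server names are hypothetical. It also shows why a broken health-monitoring subsystem is so dangerous: if it wrongly marks healthy servers as down, the balancer stops sending them traffic.

```python
import itertools


class RoundRobinBalancer:
    """Toy load balancer: hands out backends in rotation, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = itertools.cycle(self.backends)

    def mark(self, backend, is_healthy):
        # In a real system, a separate health-check subsystem would call this.
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self):
        # Try each backend at most once per request before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
lb.mark("web-2", False)  # health check says web-2 is down
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['web-1', 'web-3', 'web-1', 'web-3']

# A buggy health monitor that marks everything down takes the whole site
# offline, even if every server behind it is actually fine.
for server in ["web-1", "web-2", "web-3"]:
    lb.mark(server, False)
try:
    lb.pick()
except RuntimeError as err:
    print(err)  # no healthy backends
```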
Network load balancers need to be properly tuned and managed, but even the best-managed system may not be able to handle a massive influx of malicious traffic, called a DoS (“denial of service”) or DDoS (“distributed denial of service”) attack. That’s a fancy way of saying a bunch of hackers programmed some bots to hit your website(s) all at once in order to crash them.
DDoS was a problem way before AI - imagine what damage AI bots could do now?!
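The “distributed” part is what makes DDoS so hard to stop. A simple defense against a single attacker is to cap how many requests one IP address can make; a botnet sidesteps that by spreading the same volume across many addresses. A toy Python illustration (the 10-request limit, the `admitted_requests` helper, and the “bot-N” stand-ins for IP addresses are all made up for the example):

```python
from collections import Counter


def admitted_requests(source_ips, per_ip_limit):
    """Count how many requests get through a naive per-source-IP limit.

    Each source may send at most `per_ip_limit` requests in the window;
    extra requests from that source are dropped.
    """
    seen = Counter()
    admitted = 0
    for ip in source_ips:
        seen[ip] += 1
        if seen[ip] <= per_ip_limit:
            admitted += 1
    return admitted


LIMIT = 10
# DoS: one machine sends 10,000 requests. The limit blocks almost all of them.
dos = ["203.0.113.5"] * 10_000
# DDoS: 1,000 bots send 10 requests each. Every one stays under the limit.
ddos = [f"bot-{i}" for i in range(1_000) for _ in range(10)]

print(admitted_requests(dos, LIMIT))   # 10
print(admitted_requests(ddos, LIMIT))  # 10000
```

Same 10,000 requests, but the per-IP rule stops only the single-source version.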
I can tell you from personal experience that pretty much all web servers are getting hammered constantly by hackers, including small ones. The IP addresses of the hackers are typically from Ukraine, Russia, or China, although that could be spoofed so they could be from anywhere. This was the case years ago even before the Ukraine war started.
My Occam’s Razor therefore tells me that the Amazon failure is likely due to some sort of cyberattack, although it could also just be incompetence by some worker over at Amazon.
The problem is, Amazon’s failure today showcases why cloud services are probably a bad idea for online security in the future, especially with the coming AI viruses and trojans.
Cloud Services Vulnerability
What is a cloud service and why would one going down crash the Internet? There are three major players today in cloud services and hosting:
Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform (GCP)
Let me back up and explain where these services came from. When I first started doing web development, I had my own web server. It was literally a Pentium II computer that was housed at a local Internet Service Provider, and my friend gave me the bandwidth for free. I managed the server entirely on my own and it ran on Linux.
Eventually, I didn’t need a full server and moved to shared web-hosting. This is where you and a bunch of other people rent space from a web hosting company. They run the server and then give you a directory to run your website from.
But some businesses needed more security than shared web hosting offered without needing a full web server, which was a separate computer. So somebody invented the VPS (virtual private server), which is basically a partitioned computer: you get what works like your own server, but it’s part of a bigger machine. It’s like a townhouse where you share the same building, but there’s a massive wall separating one property from the other.
You can still get a VPS, but what if you are an influencer who suddenly starts to get a ton of traffic? Your web server may not have enough power.
Back in the day, what we would do is first set up the influencer on their own server, and put the database (that delivers the content) on a separate server to balance load.
But this would not be enough for someone who suddenly got really, really popular.
Enter “cloud hosting.” This came out in the mid-2000s (Amazon launched its EC2 rental-server service in 2006). So here’s how this works. When you need more computing power to handle all your traffic, instead of adding a new physical server, you rent extra “power” from a third party, a cloud hosting company.
The other benefit is that your servers wouldn’t sit in just one location; they could be spread across several, which matters if you live somewhere earthquake-prone like California.
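The core idea behind “renting extra power” is autoscaling: watch your traffic and add or remove servers to match it. Cloud providers implement this with their own policies (AWS calls its version Auto Scaling); the Python below is only a back-of-the-envelope sketch of the arithmetic, with made-up capacity numbers and a hypothetical `desired_instances` function.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance, minimum=1, maximum=50):
    """How many servers to run so each stays within its capacity.

    Rounds up (you can't rent 2.3 servers) and clamps to a min/max so a
    traffic spike (or a bug) can't scale you to zero or to an unlimited bill.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(minimum, min(maximum, needed))


# Hypothetical numbers: each server handles about 500 requests/second.
print(desired_instances(300, 500))     # 1
print(desired_instances(4_200, 500))   # 9
print(desired_instances(90_000, 500))  # 50 (capped)
```

With your own hardware, going from 1 to 9 servers means buying and setting up 8 machines; in the cloud it’s a number you change.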
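Spreading out only helps if your app can actually fail over to another location. A minimal Python sketch of that decision, assuming a hypothetical `first_healthy_region` helper and a stand-in health check (the region names are real AWS regions, everything else is illustrative):

```python
def first_healthy_region(regions, is_up):
    """Return the first region, in preference order, that passes a health check."""
    for region in regions:
        if is_up(region):
            return region
    raise RuntimeError("all regions down")


# Preference order: primary first, then fallbacks.
regions = ["us-east-1", "us-west-2", "eu-west-1"]
outage = {"us-east-1"}  # simulate a regional outage like today's

print(first_healthy_region(regions, lambda r: r not in outage))  # us-west-2
```

The catch: failover only works if the backup region doesn’t secretly depend on the broken one. Amazon’s own status update above notes that global services such as IAM updates rely on US-EAST-1 endpoints.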
Most importantly, cloud hosting capabilities can be spun up quickly, meaning you don’t have to go through all the hassle of setting up and maintaining a new separate server.
But cloud hosting, weirdly, is akin to shared hosting in that your data is being housed on massive server networks with thousands or millions of other companies and businesses.
This setup obviously calls into question the security of such arrangements. You also don’t have any control over the cloud hosting like you would if your company kept all your server hosting in-house.
But cloud hosting is cheaper and easier, and so it has exploded.
The cost is, when your cloud host goes down, you are forked.
Now cloud service outages don’t just impact businesses. If you are using Microsoft Office, you likely have OneDrive and if you run your work files off of it, like I do, an outage means you won’t be able to access your documents.
“The cloud” has taken over everything, to the point where we can’t just buy software and run it on our own computers these days without an Internet connection. My younger self thought cloud software was a stupid idea when it first came out, and my younger self was in many ways right.
So today, I have a major concern that relying on three large companies (Amazon, Microsoft, Google) to host most of the Internet’s major server infrastructure is just asking for trouble.
Ironically, using cloud hosting was first seen as “decentralizing” the hosting - but it in fact centralizes it in another way. In fact, the whole point of the original Internet was to have a decentralized structure that would be less vulnerable to direct attack.
As we can see with Amazon’s outages, everyone is now too reliant on the techs at Amazon, Microsoft, and Google to have their act together for a functioning Internet.
But you can hire the best system administrators in the world and still not be able to handle a massive cyberattack using AI technologies.
It would be best for future cyber defense if organizations stopped centralizing all cloud hosting in these three Big Tech companies and started putting power back into more localized hosting systems.
It would be far too easy for a malicious actor to take down much of the Internet simply by targeting Amazon, Microsoft, or Google…or all three at once.