Notable IaaS Security Vulnerabilities and Vendor Responses
Here’s a look at specific, well-documented vulnerabilities across the major IaaS providers — AWS, Azure, and Google Cloud — along with vendor responses and prevention analysis.
Amazon Web Services (AWS)
Log4Shell Impact on AWS Services (December 2021): The Log4Shell vulnerability in Log4j (CVE-2021-44228) affected several AWS services that relied on the Java logging library, including CloudFront and OpenSearch, as well as customer workloads running on EC2. AWS responded relatively quickly, publishing patches and mitigations within days and adding WAF rules to its managed rulesets. However, the breadth of exposure highlighted how deeply a single open-source dependency can penetrate even a mature cloud platform. AWS’s transparency was moderate: it acknowledged affected services but was criticized for not being more proactive in notifying customers about which specific managed services were at risk.
Confused Deputy / Cross-Tenant IAM Issues (2022, including “Superglue”): Researchers at Orca Security (who reported the “Superglue” flaw in AWS Glue), Ermetic (now Tenable), and others identified scenarios in which misconfigured IAM in AWS services such as Glue, EMR, and SageMaker could allow privilege escalation across account boundaries. Many of these are instances of the “confused deputy” problem, in which a service that holds broad permissions is tricked into using them on behalf of a different account. In several cases, service roles were overly permissive by default. AWS addressed these by tightening default role policies and adding guardrails, but the fixes were incremental rather than architectural. The underlying issue, that services were granted more permissions than necessary by default, is a violation of the least-privilege principle from the outset.
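One common guardrail against the confused deputy problem is a condition on the role's trust policy, using keys such as sts:ExternalId, aws:SourceAccount, or aws:SourceArn. The following is a minimal sketch of a check that flags trust-policy statements lacking any of those conditions. The function names, the example policies, and the account ID are illustrative assumptions, and the check is a heuristic rather than a complete policy analyzer.

```python
# Condition keys commonly used for confused-deputy protection (compared in lowercase).
GUARD_KEYS = {"sts:externalid", "aws:sourceaccount", "aws:sourcearn"}


def as_list(value):
    """IAM JSON allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def unguarded_statements(trust_policy):
    """Return Allow statements that permit role assumption with no
    confused-deputy condition key attached."""
    flagged = []
    for stmt in as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        if not any(a in ("sts:AssumeRole", "sts:*", "*") for a in actions):
            continue
        # Condition is shaped like {"StringEquals": {"aws:SourceAccount": "..."}}.
        condition_keys = {
            key.lower()
            for operator_block in stmt.get("Condition", {}).values()
            for key in operator_block
        }
        if not condition_keys & GUARD_KEYS:
            flagged.append(stmt)
    return flagged


# Hypothetical trust policies for illustration only.
OPEN_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

GUARDED_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"aws:SourceAccount": "111122223333"}},
    }],
}

if __name__ == "__main__":
    print(len(unguarded_statements(OPEN_TRUST_POLICY)))
    print(len(unguarded_statements(GUARDED_TRUST_POLICY)))
```

A check like this could run in CI against exported role definitions; it does not judge whether a condition's value is actually correct, only whether one is present.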
EC2 Instance Metadata Service (IMDS) Exploitation: The 2019 Capital One breach, while not an AWS vulnerability per se, exposed a fundamental design weakness in version 1 of the EC2 Instance Metadata Service (IMDSv1), which answers any unauthenticated GET request made from the instance. The attacker used a server-side request forgery (SSRF) flaw in a misconfigured WAF to make it query the metadata endpoint and return IAM role credentials. AWS introduced IMDSv2, which requires a session token and is much more resistant to SSRF, in November 2019, several months after the breach, but did not make it the default or enforce it. AWS’s response was slow by modern standards: it encouraged adoption of IMDSv2 but did not mandate it for years. This became a case study in how “opt-in” security features fail at scale.
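To make the difference concrete, here is a minimal sketch of the two-step IMDSv2 exchange using only the Python standard library. The endpoint paths and header names follow AWS's documented IMDSv2 interface; the helper function names are assumptions made for this example, and fetch_metadata only works when run on an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds=21600):
    """Step 1: a PUT request asking IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """Step 2: a GET request that presents the session token in a header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path, timeout=2):
    """Run both steps. Only reachable from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

The design point is that a typical SSRF primitive can only make the vulnerable application issue a plain GET to an attacker-chosen URL. It usually cannot issue a PUT or attach custom headers, so it cannot complete either step of this exchange.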
Microsoft Azure
OMIGOD (CVE-2021-38647, September 2021): This was one of the most egregious cloud security failures in recent memory. Microsoft silently installed the Open Management Infrastructure (OMI) agent, an open-source Linux management daemon, on Azure Linux VMs when customers enabled certain extensions (Azure Monitor, OMS, Log Analytics). OMI had a critical unauthenticated remote code execution (RCE) vulnerability with a CVSS score of 9.8. Customers were not told the agent was running, had no visibility into it, and so could not patch it proactively. Microsoft’s response was widely criticized: it patched OMI but did not automatically push the update to all affected VMs, so customers had to update manually or trigger a VM extension update. The incident raised serious questions about Microsoft’s practice of silently installing software with root privileges on customer machines without disclosure.
ChaosDB / Cosmos DB (August 2021): Researchers at Wiz discovered that a Jupyter Notebook feature added to Azure Cosmos DB contained a privilege escalation chain that let them obtain the primary keys of other customers’ Cosmos DB accounts, and with them full read/write access. This was essentially a cross-tenant data exposure at massive scale. Microsoft disabled the vulnerable notebook feature within 48 hours, but it could not rotate the primary keys on behalf of customers, so customers had to rotate their own keys manually. Microsoft notified only a fraction of the affected customers (those it could definitively confirm were at risk), while Wiz believed the actual exposure was far broader. The response, though prompt on the technical side, was criticized for understating the scope of the exposure to customers.
Azure Active Directory (Entra) Token Forgery, “Storm-0558” (2023): The China-based threat actor Storm-0558 used a stolen Microsoft account (MSA) signing key, together with a flaw in how Microsoft validated authentication tokens, to forge tokens for Exchange Online and Outlook. This gave the attackers access to email accounts at approximately 25 organizations, including U.S. government agencies. Microsoft’s initial response was opaque, and the audit logs needed to detect the intrusion were available only to customers on premium licensing tiers. This drew sharp criticism from CISA and senators, who argued that Microsoft was effectively charging customers to detect breaches caused by Microsoft’s own failures. The post-incident review by the Cyber Safety Review Board (CSRB) was scathing, concluding that the breach was “preventable” and that Microsoft’s security culture was inadequate.
Google Cloud Platform (GCP)
GCP Metadata Server Exposure (SSRF, recurring): Similar to AWS’s IMDS issue, GCP’s metadata server at 169.254.169.254 could be reached through SSRF attacks on misconfigured workloads. Google introduced metadata concealment in GKE and updated the metadata server to require a custom request header (Metadata-Flavor: Google), but enforcement was not universal or automatic for all legacy workloads. Google’s response was more proactive than AWS’s in terms of documentation and enforcing the header requirement in newer APIs, though legacy compatibility left residual risk.
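Header requirements on the server side can be complemented by an application-side guard that refuses to fetch user-supplied URLs pointing at the metadata endpoint. The following is a minimal sketch of such a guard; the function name, the blocklist, and the injectable resolver are assumptions made for this example, not part of any GCP library.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hostname that resolves to the metadata server on GCP instances.
METADATA_HOSTNAMES = {"metadata.google.internal"}


def is_blocked_target(url, resolve=socket.gethostbyname):
    """Return True if an outbound request to `url` should be refused.

    Blocks the metadata hostname and any host that is, or resolves to,
    a link-local, loopback, or private address. Fails closed when the
    host cannot be parsed or resolved.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host or host in METADATA_HOSTNAMES:
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so resolve the name before judging it.
        try:
            address = ipaddress.ip_address(resolve(host))
        except (OSError, ValueError):
            return True
    return address.is_link_local or address.is_loopback or address.is_private
```

This check has a known limitation: a name can resolve to a different address between the check and the actual connection (DNS rebinding), so a production guard would also need to pin the resolved address for the request itself.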
Google Kubernetes Engine (GKE) Privilege Escalation (CVE-2018-1002105 and related): GKE, as a managed Kubernetes service, was affected by the critical Kubernetes API server privilege escalation vulnerability (CVE-2018-1002105, CVSS 9.8), which allowed crafted API requests, in some configurations unauthenticated ones, to be escalated to cluster-admin privileges. Google patched GKE clusters rapidly, within hours to a day of disclosure, and this response was broadly praised as a best-practice example of coordinated vendor patching in a managed service context. Google’s transparency in notifying affected customers and publishing a detailed advisory was considered exemplary.
Vertex AI / AI Platform Notebook Privilege Escalation (2023): Researchers found that default service accounts attached to Vertex AI and AI Platform Notebooks had overly broad project-level Editor roles, meaning a compromised notebook environment could be used to escalate to broad GCP project access. This echoed the AWS SageMaker issue in structure. Google’s response was to update default service account configurations, but the fix required customer action to apply retroactively, and older environments remained exposed until customers acted.
Cross-Cutting Prevention Analysis
Least Privilege by Default is the most consistently violated principle across all three vendors. In several of the cases above (OMI running as root, over-permissioned SageMaker and Vertex AI service accounts, permissive Glue and EMR service roles), the attack surface was enlarged by granting more access than necessary at design time. Enforcing minimal permissions as the default, not an opt-in best practice, would have materially reduced impact.
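One way to keep defaults honest is to lint policies for wildcard grants before they ship. The sketch below flags Allow statements in an AWS-style IAM policy document that combine a wildcard action with a wildcard resource. The function names and the example policy are assumptions for illustration, and the rule is deliberately narrow.

```python
def as_list(value):
    """IAM JSON allows a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy):
    """Return (statement index, wildcard actions) for every Allow statement
    that grants a wildcard action on all resources."""
    findings = []
    for index, stmt in enumerate(as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        wildcard_actions = [a for a in actions if a == "*" or a.endswith(":*")]
        if wildcard_actions and "*" in resources:
            findings.append((index, wildcard_actions))
    return findings


# Hypothetical default role policy for illustration only.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

if __name__ == "__main__":
    print(overly_broad_statements(EXAMPLE_POLICY))
```

Only the first statement in the example is flagged; the second is scoped to one action on one bucket. A real review would also look at broad managed roles, such as the project-level Editor role mentioned above, which this sketch does not model.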
Supply Chain and Dependency Hygiene is implicated in Log4Shell and OMIGOD. Vendors must maintain a rigorous software bill of materials (SBOM) for every component installed on or alongside customer infrastructure and have automated pipelines to detect and patch vulnerable dependencies. Silent installation of agents (as in OMIGOD) should be categorically prohibited without documented customer consent and visibility.
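As a rough illustration of the automated detection step, the sketch below scans a component list for versions older than a known fix. The SBOM shape (a list of name/version records), the advisory table, and the function names are all assumptions for this example. The single fixed-version threshold is a simplification: it ignores backported security releases and later follow-up CVEs, which a real scanner would have to handle.

```python
import re

# component name -> (CVE id, first version treated as fixed in this sketch)
ADVISORIES = {
    "org.apache.logging.log4j:log4j-core": ("CVE-2021-44228", "2.15.0"),
}


def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1). Suffixes such as '-beta9' are ignored
    and short versions are padded with zeros to three parts."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts + [0] * (3 - len(parts)))


def vulnerable_components(sbom, advisories=ADVISORIES):
    """Return (name, version, cve) for each component older than its fix."""
    hits = []
    for component in sbom:
        advisory = advisories.get(component["name"])
        if advisory is None:
            continue
        cve, fixed_in = advisory
        if parse_version(component["version"]) < parse_version(fixed_in):
            hits.append((component["name"], component["version"], cve))
    return hits


# Hypothetical SBOM extract for illustration only.
EXAMPLE_SBOM = [
    {"name": "org.apache.logging.log4j:log4j-core", "version": "2.14.1"},
    {"name": "org.apache.logging.log4j:log4j-core", "version": "2.17.1"},
    {"name": "com.example:widget", "version": "1.0"},
]

if __name__ == "__main__":
    for hit in vulnerable_components(EXAMPLE_SBOM):
        print(hit)
```

The value of the SBOM is that this kind of query becomes possible at all: without an inventory of what is installed alongside customer workloads, neither the vendor nor the customer can answer "are we affected?" quickly.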
Automatic, Mandatory Security Updates would have blunted OMIGOD, and the same principle applies to Cosmos DB, where key rotation was left to customers. When a managed service component is patched, the vendor should push that patch automatically rather than placing the burden on the customer, especially when the customer had no knowledge of the component’s existence.
Cryptographic Key and Token Lifecycle Management, as exposed by Storm-0558, demands rigorous separation of signing key environments, hardware security module (HSM) enforcement, and automated key rotation. The stolen MSA signing key should never have been in an environment where it was accessible to systems handling customer-facing tokens.
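Two of these controls, scoping keys to a purpose and retiring them on a schedule, can be enforced at validation time. The sketch below shows a key lookup that refuses a signing key when its scope does not match the token being validated or when it is older than a rotation window. The class, the scope labels, and the 90-day window are assumptions made for this example and do not describe Microsoft's actual systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)  # assumed rotation window for this sketch


@dataclass(frozen=True)
class SigningKey:
    key_id: str
    scope: str            # hypothetical labels, e.g. "consumer" or "enterprise"
    created_at: datetime


class KeyRejected(Exception):
    """Raised when a key must not be used to validate a token."""


def select_key(keys, key_id, token_scope, now=None):
    """Return the key for `key_id` only if it is known, matches the
    token's scope, and is still inside its rotation window."""
    now = now or datetime.now(timezone.utc)
    key = keys.get(key_id)
    if key is None:
        raise KeyRejected("unknown key id")
    if key.scope != token_scope:
        raise KeyRejected("key scope does not match token scope")
    if now - key.created_at > MAX_KEY_AGE:
        raise KeyRejected("key is past its rotation window")
    return key
```

Checking scope before signature verification means a key from one environment cannot validate tokens for another even if it is stolen, which limits the blast radius of a single key compromise.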
Transparent Disclosure and Audit Log Access was a systemic failure in the Storm-0558 incident. Security telemetry necessary for breach detection should be a baseline service offering, not a premium add-on. Restricting audit logs to paid tiers creates a perverse incentive structure where vendors profit from their own security failures.
Cross-Tenant Isolation Testing should be a mandatory part of the security review cycle for any multi-tenant service. The Cosmos DB and AWS confused deputy issues both involved attack paths that crossed tenant boundaries — a category of vulnerability that requires dedicated red-team exercises specifically targeting isolation boundaries.
The overall pattern is clear: all three major IaaS vendors have shipped products where security was treated as a feature to be added rather than a constraint built into the architecture from day one. The vendors with the best reputations in any given incident tend to be those who patched fastest, communicated most broadly, and made fixes automatic rather than optional.
