When the walls come down: multitenancy separation failures in IaaS

Shared infrastructure is the economic engine of cloud computing. It is also one of its most persistent security liabilities. Here is what the incident record shows — and what customers should do about it.


The shared-infrastructure bargain

Infrastructure as a Service works because many customers share the same physical hardware. A cloud provider buys one server; dozens of tenants rent slices of it. The economics are compelling on both sides. The security premise is simple: logical isolation — enforced by hypervisors, virtual machines, and software-defined networking — substitutes for physical separation.

For most workloads, most of the time, this works. Major providers invest enormous resources in their isolation stacks. But the history of IaaS security is also the history of that premise being tested, and on a number of well-documented occasions, breached.

A successful cross-tenant compromise does not require physical access. A single hypervisor can host dozens or hundreds of VMs. One breakout can jeopardize all of them simultaneously.

A taxonomy of separation failures

Multitenancy failures in IaaS fall into three broad categories, each with a distinct threat model and incident history.

1. VM escape — the hypervisor breakout

A VM escape is the canonical IaaS isolation failure. An attacker running code inside a guest virtual machine exploits a vulnerability in the hypervisor, virtual device emulation, or the host OS to gain execution on the physical host — and from there, potentially every co-resident VM. These attacks are rare and technically demanding, but the public CVE record documents real instances across all major hypervisor platforms.

CVE-2008-0923 (2008, VMware Workstation; High)
Directory traversal in the shared folders feature allowed a guest to read and write files on the host OS. The Cloudburst exploit demonstrated publicly at Black Hat USA 2009 was a separate VMware guest-to-host escape (CVE-2009-1244, in the virtual display device).

CVE-2015-7835 (2015, Xen Hypervisor; Critical)
A flaw in Xen's PV pagetable fast path allowed a paravirtualized guest to break isolation and interact with the hypervisor, threatening all co-resident guests on the same host.

CVE-2019-18420 through CVE-2019-18425 (2019, Xen / Citrix Hypervisor; Critical)
A cluster of six CVEs allowed guest VMs to compromise the host system through denial of service and privilege escalation.

CVE-2021-29657 (2021, KVM / Linux kernel; Critical)
Google Project Zero disclosed the first public KVM guest-to-host breakout not relying on QEMU userspace bugs. Affected kernels v5.10-rc1 through v5.12-rc6 on AMD platforms.

The KVM finding is particularly significant. KVM is the de facto standard hypervisor for Linux-based cloud environments; outside of Azure, almost all large-scale providers run on top of it. A kernel-level escape that does not rely on userspace components like QEMU represents a deeper class of isolation failure than prior public research had demonstrated.

2. Side-channel attacks — leakage without escape

Side-channel attacks do not require the attacker to break out of their VM. Instead, they exploit shared hardware resources — CPU caches, execution pipelines, DRAM row buffers — to infer data belonging to a co-resident tenant. These attacks erode isolation guarantees without triggering conventional security controls.

Meltdown & Spectre (2018)

These vulnerabilities affected virtually every modern processor. In IaaS environments, a guest VM could exploit speculative execution, using the shared CPU cache as a covert channel, to read memory belonging to the host or to co-resident tenants. They required emergency microcode and OS patches from every major cloud provider simultaneously, a coordinated industry response with no precedent at the time.

ZombieLoad / MDS family (2019)

ZombieLoad, ZombieLoad v2, and related Microarchitectural Data Sampling (MDS) variants sampled in-flight data from internal CPU buffers (fill buffers, store buffers, and load ports), allowing a VM to read data belonging to other execution contexts on the same core. Citrix and Xen issued emergency advisories. These CVEs, including CVE-2018-12130, CVE-2019-11135, and CVE-2020-0548, demonstrated that speculative-execution attacks were not a one-time problem but an ongoing vulnerability class.

AMD Radeon pixel shader escape (2019)

CVE-2019-5124, CVE-2019-5146, and CVE-2019-5147 showed that on Windows 10 with VMware and AMD Radeon hardware, an attacker in a guest could supply a crafted pixel shader that corrupted host memory and executed code on the host system. Strictly speaking this is a breakout rather than a side channel, but it belongs in this discussion as a reminder that GPU passthrough and graphics emulation are underappreciated attack surfaces.

Side-channel attacks do not break the wall. They listen through it — and what they hear can include encryption keys, session tokens, and plaintext data belonging to a neighboring tenant.

3. Control plane and misconfiguration failures

Not all cross-tenant exposure comes from hypervisor exploits. Misconfigurations in storage accounts, overly permissive IAM policies, and inadequate API access controls have caused practical data exposure incidents affecting multiple customers of shared infrastructure. In 2024, AT&T paid a $13 million FCC fine after a breach at a third-party cloud vendor exposed information on nearly nine million customers — a reminder that the blast radius of a shared-infrastructure failure extends well beyond the provider’s own systems.

Forensic investigation is also complicated in multitenant settings: log aggregation across shared components often lacks clear tenant boundaries, making it difficult to determine the precise scope of compromise for any individual customer after an incident.


Why the risk profile is asymmetric

IaaS multitenancy failures are low-frequency events. True cross-tenant exploits in production cloud environments are rare — most disclosed vulnerabilities are found by security researchers before active exploitation. But the severity distribution is sharply asymmetric. A single hypervisor breakout can simultaneously compromise every tenant on that physical host, violating confidentiality, triggering regulatory notification obligations, and requiring providers to evacuate and rebuild entire host machines.

Co-residency is not random. An attacker who can influence VM placement — through timing analysis, network latency measurements, or knowledge of provider allocation patterns — can increase the probability of landing on the same physical host as a target before attempting a cross-tenant attack.

The risk is compounded by the fact that many IaaS workloads are themselves multi-tenant. A SaaS provider running on shared cloud infrastructure introduces a second tier of shared-infrastructure exposure for their own customers — and a breach at the IaaS layer can cascade upward through the stack.


Recommendations for cloud customers

The following recommendations are written for security and infrastructure teams consuming IaaS services. They address what customers can control within the shared responsibility model.

01 — Classify workloads by isolation requirement

Not every workload warrants dedicated tenancy. Identify which applications process regulated data (PII, PHI, PCI) or hold high-value secrets. For those, evaluate dedicated host options — AWS Dedicated Hosts, Azure Isolated VMs, GCP sole-tenant nodes — that provide physical-level separation from other customers.
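As a starting point, the classification can be as simple as a lookup over each workload's declared data classes. The sketch below is a hypothetical triage rule; the workload schema and the "dedicated"/"shared" tier names are our own illustrative assumptions, not any provider's API:

```python
# Minimal workload-triage sketch. The workload dict schema and the
# "dedicated"/"shared" tier names are illustrative assumptions.
REGULATED_CLASSES = {"PII", "PHI", "PCI"}

def tenancy_tier(workload: dict) -> str:
    """Recommend a tenancy tier for a workload description.

    A workload that processes regulated data or holds high-value
    secrets is routed to dedicated (single-tenant) hosts; everything
    else can stay on shared infrastructure.
    """
    data_classes = set(workload.get("data_classes", []))
    if data_classes & REGULATED_CLASSES or workload.get("holds_secrets", False):
        return "dedicated"
    return "shared"
```

For example, `tenancy_tier({"name": "billing", "data_classes": ["PCI"]})` routes the workload to dedicated tenancy, while a workload with only public data stays on shared hosts.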

02 — Apply microcode and kernel patches urgently

Spectre, Meltdown, and MDS variants required microcode updates that cloud providers applied at the hypervisor layer — but guest OS patches remain the customer’s responsibility. Establish a patching SLA that treats hypervisor-class CVEs as P0 incidents requiring action within 24–72 hours of public disclosure.
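Inside a guest, the Linux kernel reports per-vulnerability mitigation status under sysfs, which makes this SLA auditable. A minimal sketch, assuming a Linux guest (the helper names are ours; the sysfs path is the kernel's standard interface):

```python
from pathlib import Path

def vulnerability_report(sysfs_dir="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability_name: status_string} as reported by the
    guest kernel, e.g. {"meltdown": "Mitigation: PTI", ...}."""
    report = {}
    for entry in sorted(Path(sysfs_dir).glob("*")):
        report[entry.name] = entry.read_text().strip()
    return report

def unmitigated(report):
    # The kernel prefixes the status with "Vulnerable" when no
    # mitigation is active for that CPU flaw.
    return {name: status for name, status in report.items()
            if status.startswith("Vulnerable")}
```

Wiring `unmitigated(vulnerability_report())` into fleet monitoring turns "patch hypervisor-class CVEs within 24–72 hours" from a policy statement into a checkable condition.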

03 — Minimize the emulation attack surface

Many VM escape CVEs have exploited emulated virtual devices: shared folders, legacy NICs, floppy controllers, USB passthrough, and 3D acceleration. Audit guest VM configurations and disable every virtual device that is not operationally required. Prefer paravirtualized drivers over emulated hardware wherever available.
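For libvirt-managed guests, the domain XML makes such an audit scriptable. The sketch below is a hypothetical check covering two common cases, emulated legacy NICs and IDE disks, in favor of their virtio equivalents; the model list is illustrative, not exhaustive:

```python
import xml.etree.ElementTree as ET

# Emulated legacy NIC models commonly seen in libvirt domains.
# Illustrative, not exhaustive; virtio is the preferred alternative.
LEGACY_NIC_MODELS = {"e1000", "rtl8139", "ne2k_pci", "pcnet"}

def flag_legacy_devices(domain_xml: str):
    """Return (device_kind, model) pairs in a libvirt domain definition
    that use emulated legacy hardware instead of paravirtualized virtio."""
    root = ET.fromstring(domain_xml)
    findings = []
    for iface in root.iter("interface"):
        model = iface.find("model")
        if model is not None and model.get("type") in LEGACY_NIC_MODELS:
            findings.append(("interface", model.get("type")))
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is not None and target.get("bus") == "ide":
            findings.append(("disk", "ide"))
    return findings
```

Run against `virsh dumpxml <domain>` output, a non-empty result is a remediation candidate: switch the NIC model to virtio and the disk bus to virtio, then remove shared folders, USB controllers, and 3D acceleration if unused.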

04 — Encrypt data independently of provider-managed keys

If a hypervisor compromise gives an attacker access to a guest VM’s memory, provider-managed encryption keys stored in the same environment offer limited protection. Use customer-managed keys (CMKs) in hardware security modules (HSMs), and where available, evaluate confidential computing options — AMD SEV, Intel TDX — that encrypt VM memory against the hypervisor itself.

05 — Assume the hypervisor host may be compromised

Design applications so that a compromised host does not give an attacker lateral movement into the control plane or other hosts. Secrets must be scoped: disk images, encryption keys, and service credentials should only be exposed to the hosts that operationally need them. Treat host-level compromise as a scenario to design against, not merely to prevent.

06 — Instrument for anomalous cross-tenant signals

Co-residency detection and cache-timing attacks have known network and performance signatures. Monitor for unusual latency patterns, cache behavior anomalies, and unexpected API calls. In environments processing sensitive data, consider disabling simultaneous multithreading (SMT / Hyperthreading) on sensitive instance types — a trade-off against performance that eliminates an entire class of side-channel attack vectors.
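On Linux, the kernel exposes a runtime SMT control under sysfs, so the trade-off can be made without a reboot (`echo off | sudo tee /sys/devices/system/cpu/smt/control`) and audited from configuration management. A minimal status check, with the path parameterized so it can be pointed at test data:

```python
from pathlib import Path

SMT_CONTROL = "/sys/devices/system/cpu/smt/control"

def smt_status(control_file: str = SMT_CONTROL) -> str:
    """Return the kernel-reported SMT state: one of "on", "off",
    "forceoff", "notsupported", or "notimplemented"."""
    return Path(control_file).read_text().strip()

def smt_disabled(control_file: str = SMT_CONTROL) -> bool:
    # "notsupported"/"notimplemented" count as disabled: there are
    # no sibling threads sharing core resources in the first place.
    return smt_status(control_file) != "on"
```

An instance bootstrap script for sensitive workloads can assert `smt_disabled()` before admitting the host into the serving pool.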

07 — Verify the provider’s isolation architecture

Before selecting an IaaS provider for sensitive workloads, review their published security architecture documentation, third-party audit reports (SOC 2 Type II, ISO 27001), and CVE response history. Ask specifically how quickly critical hypervisor CVEs are patched in production and whether customers receive notification when hosts running their workloads are evacuated and rebuilt.

08 — Plan incident response for a host-level breach

Most cloud incident response plans assume application-layer or credential-based compromise. Extend your runbook to cover the scenario of a shared host being compromised. This includes rotating all secrets accessible from affected instances, treating instance memory as potentially observed, and coordinating with the provider to confirm whether forensic analysis of the host has established the scope of any cross-tenant access.


The shared responsibility model does not end at the hypervisor. Customers who treat cloud infrastructure as inherently trustworthy below their application layer are accepting risks that are documented, exploitable, and in some cases already exploited.

References: CVE-2008-0923 (VMware); CVE-2015-7835 (Xen); CVE-2019-18420–18425 (Xen/Citrix); CVE-2021-29657 (KVM/Linux, Google Project Zero); Meltdown/Spectre (Intel/AMD/ARM, 2018); ZombieLoad MDS family (Intel, 2019); CVE-2019-5124/5146/5147 (AMD/VMware). AT&T FCC consent decree, 2024.