XXE in Apache Tika PDF/XFA parsing
CVE-2025-66516 is a critical XML External Entity (XXE) injection vulnerability in Apache Tika affecting org.apache.tika:tika-core versions 1.13 through 3.2.1, org.apache.tika:tika-parser-pdf-module versions 2.0.0 through 3.2.1, and org.apache.tika:tika-parsers versions 1.13 through 1.28.5. It is triggered when Tika processes a crafted PDF containing malicious XFA content, which leads to unsafe XML parsing and external entity resolution. The entry point was originally reported in the PDF parser module, but the vulnerable code and the fix are in tika-core. In Tika 1.x, PDFParser lived in tika-parsers, which is why those older parser bundles are also affected. This CVE covers the same underlying flaw as CVE-2025-54988 but widens the affected package scope to reflect which components actually contain the vulnerable code.
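To triage dependencies against the version ranges listed for this CVE, a small check like the following may help. It is a minimal sketch under stated assumptions: the class name AffectedRangeCheck is hypothetical, versions are assumed to be plain dotted numbers (any "-qualifier" suffix such as "-BETA" is ignored), and only the three Maven coordinates named in the advisory are encoded.

```java
import java.util.Map;

/** Hypothetical helper: flags Tika artifact versions inside the CVE-2025-66516 ranges. */
public class AffectedRangeCheck {

    // Inclusive [min, max] ranges taken from the advisory text, keyed by Maven coordinate.
    private static final Map<String, String[]> RANGES = Map.of(
        "org.apache.tika:tika-core", new String[] {"1.13", "3.2.1"},
        "org.apache.tika:tika-parser-pdf-module", new String[] {"2.0.0", "3.2.1"},
        "org.apache.tika:tika-parsers", new String[] {"1.13", "1.28.5"});

    // Compares dotted numeric versions component by component; missing components count as 0.
    // Assumption: qualifiers after the first '-' are dropped, and components are numeric.
    static int compare(String a, String b) {
        String[] x = a.split("-")[0].split("\\.");
        String[] y = b.split("-")[0].split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True when the coordinate is one of the listed artifacts and the version is within its range.
    static boolean isAffected(String coordinate, String version) {
        String[] range = RANGES.get(coordinate);
        if (range == null) {
            return false; // not one of the artifacts named in the advisory
        }
        return compare(version, range[0]) >= 0 && compare(version, range[1]) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isAffected("org.apache.tika:tika-core", "3.2.1"));              // true
        System.out.println(isAffected("org.apache.tika:tika-core", "3.2.2"));              // false
        System.out.println(isAffected("org.apache.tika:tika-parsers", "1.28.5"));          // true
        System.out.println(isAffected("org.apache.tika:tika-parser-pdf-module", "1.28"));  // false
    }
}
```

Both ends of each range are treated as inclusive, matching the "1.13 through 3.2.1" wording. A snapshot such as 3.2.1-SNAPSHOT is therefore reported as 3.2.1.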
Exploits
3 valid exploits remain after Mallory filtered out fakes, detection scripts, and README-only repositories; 4 further repositories were hidden.
This repository provides a proof-of-concept (PoC) exploit for CVE-2025-66516, an XML External Entity (XXE) vulnerability in Apache Tika (versions 3.2.1 and below) that is triggered when parsing PDF files containing XFA forms. The repository consists of two main Java files:
1. ExploitGenerator.java: generates a malicious PDF (poc-xxe.pdf) with an embedded XFA form containing an XXE payload. The payload uses an external entity to reference a sensitive file on the server (/etc/passwd on Linux/macOS, C:/Windows/win.ini on Windows).
2. VulnerabilityVerifier.java: uses Apache Tika to parse the generated PDF. If the target Tika version is vulnerable, the contents of the referenced file are extracted and displayed, demonstrating successful exploitation.
The exploit shows that the XXE flaw can be used to read arbitrary files from the server's filesystem. The repository is a minimal Maven project with dependencies on Tika and PDFBox. No network endpoints are involved; the attack is local to the server processing the malicious PDF. It is a PoC and includes no weaponized or automated attack features.
This repository provides a proof-of-concept (PoC) exploit for CVE-2025-66516, a critical XXE vulnerability in Apache Tika (prior to 3.2.2) affecting PDF XFA parsing. It includes:
- `gen_poc.py`: Python script that generates malicious PDF files with embedded XFA/XXE payloads for local file disclosure.
- `gen_oob_poc.py`: Python script that generates PDFs triggering out-of-band (OOB) XXE, which exfiltrates file contents to an attacker-controlled HTTP server.
- `http_listener.py`: Python HTTP server that serves the malicious DTD for OOB XXE and receives the exfiltrated data.
- `DocumentProcessor.java`: example Java application that uses Apache Tika in a vulnerable configuration, showing how the exploit is triggered during PDF parsing.
- Documentation files (`README.md`, `DISCLAIMER.md`, `SECURITY.md`) covering setup, legal, and ethical guidance.
The exploit demonstrates both local file read and OOB exfiltration. The attacker crafts a PDF with a malicious XFA form; when a vulnerable Tika instance processes it, the server reads arbitrary files and can optionally send their contents to an external HTTP endpoint. The repository is well documented, with clear instructions for setup, testing, and responsible use. It contains no fake or destructive payloads; all code focuses on demonstrating the vulnerability for educational and research purposes.
This repository provides a fully operational exploit and lab environment for CVE-2025-66516, a critical XXE vulnerability in Apache Tika (1.13 through 3.2.1 and related modules). The exploit chain is implemented in Python and includes:
- `poc/exploit.py`: automated exploitation tool that generates malicious PDF files with XFA/XXE payloads, uploads them to a target Tika server, and extracts sensitive data. It supports arbitrary file reads, SSRF against cloud metadata endpoints (AWS, GCP, Azure), Kubernetes secret extraction, and exfiltration to attacker-controlled servers.
- `poc/generate_payload.py`: payload generator for crafting custom malicious PDFs that target specific files or URLs, with support for out-of-band (OOB) exfiltration.
- `docker-compose.yml` and Dockerfiles: a lab environment with both vulnerable and protected Tika server instances, a demo Flask web application (`webapp/app.py`) that uploads documents to Tika, and an attacker listener server.
- The web application models a realistic scenario in which user-uploaded documents are processed by Tika, exposing the XXE vulnerability.
The exploit is highly customizable and operational, and it demonstrates real-world impact. It targets numerous predictable locations, including local files, cloud metadata services, and internal configuration files. The repository is well structured for both research and practical exploitation.
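All three repositories depend on the XML parser honoring a DOCTYPE that declares external entities. For code that parses XFA or other untrusted XML directly, the standard JAXP hardening below shows the defensive counterpart. This is a general sketch using the JDK's built-in parser. It is not a reproduction of the tika-core fix, and the class name HardenedXfaParse and its sample documents are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical example: parse untrusted XML (e.g. extracted XFA) with DTDs and external entities off. */
public class HardenedXfaParse {

    // Benign XDP-style fragment with no DOCTYPE (namespace used only as a label).
    static final String BENIGN =
        "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\"><template/></xdp:xdp>";

    // Classic XXE shape: a DOCTYPE declaring an external entity that points at a local file.
    static final String WITH_ENTITY =
        "<?xml version=\"1.0\"?><!DOCTYPE x [<!ENTITY e SYSTEM \"file:///etc/passwd\">]><x>&e;</x>";

    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        // Any <!DOCTYPE> becomes a fatal parse error (Xerces feature, supported by the JDK parser).
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Fallbacks in case DOCTYPEs ever have to be allowed.
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        f.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        f.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        f.setXIncludeAware(false);
        f.setExpandEntityReferences(false);
        return f;
    }

    // Returns true if the hardened parser accepts the document, false if it rejects it.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows fatal ones,
            // so rejections surface as exceptions instead of being printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false; // rejected, e.g. because a DOCTYPE was present
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parser setup or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses(BENIGN));      // true
        System.out.println(parses(WITH_ENTITY)); // false
    }
}
```

Rejecting every DOCTYPE is the bluntest option. The remaining settings act as a fallback if a deployment must accept DTDs.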
Recent activity
119 sources tracked across advisories, community write-ups, and news.
Critical XXE vulnerability in Apache Tika modules (tika-core, tika-parser-pdf-module, tika-parsers); urgent patching advised.
A high-severity XML External Entity (XXE) vulnerability in Apache Tika (core library) that lets attackers exfiltrate sensitive files, perform SSRF, or cause denial of service (DoS) when Tika processes malicious XFA content embedded in PDFs. The flaw affects tika-core versions 1.13 through 3.2.1 and related parser modules.
Atlassian shipped fixes for CVE-2025-66516, a maximum-severity vulnerability in Apache Tika.