Back in July this year, in Mitigating XPath Injection Attacks in PHP, I wrote about how to properly quote a string in PHP’s XPath 1.0.
The code presented there was based on the assumption that the resulting expression is binary safe.
However, that was too shortsighted, because XPath in PHP can be attacked using null-byte injection. The PHP extension cuts the string off at the first null byte, which allows an attacker to truncate an expression early:
/*/user[name = 'Mirza']/secret<NUL>]/location
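To make the effect concrete, here is a minimal sketch that only simulates the truncation in plain PHP. It does not call the XPath extension; the helper name `truncate_at_null()` is made up for this illustration and mimics how a C string ends at the first null byte.

```php
<?php
// Hypothetical helper: mimics C-string semantics, where the string
// ends at the first null byte.
function truncate_at_null(string $expression): string
{
    $pos = strpos($expression, "\0");
    return $pos === false ? $expression : substr($expression, 0, $pos);
}

// The expression from above, with <NUL> written as a real null byte.
$expression = "/*/user[name = 'Mirza']/secret\0]/location";

// What is left of the expression after the cut-off.
$truncated = truncate_at_null($expression);
echo $truncated, "\n"; // prints: /*/user[name = 'Mirza']/secret
```

Everything after the null byte, including the part of the expression the developer wrote, is gone.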
Technically, XML covers the full Unicode repertoire, excluding the surrogate blocks, U+FFFE and U+FFFF, and most US-ASCII control characters (those below space): only Tab, Line Feed (LF) and Carriage Return (CR) are allowed in XML. The null byte is therefore never a valid XML character.
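As a rough sketch, the allowed character ranges of XML 1.0 can be written as a PCRE character class. The function name `is_xml10_text()` is an assumption for this example, and the check expects UTF-8 input; with the `u` modifier `preg_match()` does not match invalid UTF-8, so such input is rejected as well.

```php
<?php
// Hypothetical helper: true if $text is UTF-8 and consists only of
// characters that XML 1.0 allows (Tab, LF, CR, U+0020..U+D7FF,
// U+E000..U+FFFD, U+10000..U+10FFFF).
function is_xml10_text(string $text): bool
{
    $pattern = '/\A[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*\z/u';
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $text) === 1;
}

var_dump(is_xml10_text("Mirza"));         // bool(true)
var_dump(is_xml10_text("Mirza\0"));       // bool(false), null byte
var_dump(is_xml10_text("\xEF\xBF\xBE"));  // bool(false), U+FFFE
```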
This is also the reason why you need to encode binary data when you want to transport it safely in XML, for example as base64 (see the base64Binary primitive XML datatype): otherwise the XML would be broken, resulting in data loss.
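A small sketch of that round trip, using only PHP's built-in `base64_encode()` and `base64_decode()`. The helper names and the `<secret>` element are made up for this example; the base64 alphabet contains no characters that are special in XML, so the encoded text can be placed in element content as-is.

```php
<?php
// Hypothetical helpers for carrying arbitrary bytes as XML text.
function encode_for_xml(string $binary): string
{
    return base64_encode($binary);
}

function decode_from_xml(string $text): string
{
    // Strict mode: fail on characters outside the base64 alphabet.
    $binary = base64_decode($text, true);
    if ($binary === false) {
        throw new UnexpectedValueException('Not valid base64 data.');
    }
    return $binary;
}

// Binary data including a null byte, which raw XML could not carry.
$binary  = "\0\x01\xFFraw";
$encoded = encode_for_xml($binary);
$xml     = '<secret>' . $encoded . '</secret>';

// Decoding restores the original bytes, null byte included.
$restored = decode_from_xml($encoded);
var_dump($restored === $binary); // bool(true)
```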
Back to the XPath injection attacks mentioned above and how to mitigate them. If an injected string is able to cut the expression off at the first null-byte position, the quoting as described no longer works reliably: an attacker can break out of it by injecting a null byte. The impact is not very high, though. Because of the quoting that xpath_string() applies, injecting a null byte results in an "Unfinished literal" warning.
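The following sketch shows why a quoted value ends up as an unfinished literal. `quote_xpath_literal()` is a hypothetical stand-in written for this example, not the original xpath_string() code, and `cut_at_null()` only simulates the cut-off in plain PHP instead of calling the XPath extension.

```php
<?php
// Hypothetical stand-in for xpath_string(): quotes a value as an
// XPath 1.0 string literal.
function quote_xpath_literal(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types occur: split on single quotes and glue the
    // parts back together with concat().
    return "concat('" . implode("', \"'\", '", explode("'", $value)) . "')";
}

// Simulates the cut-off at the first null byte.
function cut_at_null(string $expression): string
{
    $pos = strpos($expression, "\0");
    return $pos === false ? $expression : substr($expression, 0, $pos);
}

$input      = "Mirza']/secret\0";
$expression = "/*/user[name = " . quote_xpath_literal($input) . "]/location";
$seen       = cut_at_null($expression);

echo $seen, "\n"; // prints: /*/user[name = "Mirza']/secret
```

The opening double quote is never closed in the truncated expression, so the injected `']/secret` stays inside a string literal that does not end.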
However, when data is injected other than as a string with the help of xpath_string(), null bytes still work against you in PHP's XPath. Since they are not valid in XML anyway, and therefore no text or identifier can contain them, you can safely reject or sanitize null bytes further up in the input processing, for example as Suhosin can do.
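If no such filter is in place, the same check is only a few lines of PHP. This is a minimal sketch with made-up function names; whether to reject or to strip depends on the application, and rejecting is the stricter choice.

```php
<?php
// Hypothetical helper: refuses any input that contains a null byte.
function reject_null_bytes(string $input): string
{
    if (strpos($input, "\0") !== false) {
        throw new InvalidArgumentException('Input contains a null byte.');
    }
    return $input;
}

// Hypothetical helper: removes all null bytes instead of refusing.
function strip_null_bytes(string $input): string
{
    return str_replace("\0", '', $input);
}

// Apply the check before the value gets anywhere near an expression.
try {
    $name = reject_null_bytes("secret\0");
} catch (InvalidArgumentException $e) {
    $name = null; // treat the request as invalid
}
```

Running the check once, at the point where input enters the application, covers values that later end up in an expression unquoted, such as element names or numbers.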
So keep in mind to verify the incoming (Unicode) data your application accepts. Even valid Unicode might not always be appropriate.