While playing with a parser experiment that fully supports the CSS selector syntax, and after discovering the Selectors API, I started to think about transforming CSS selectors into XPath. Surely I'm not the only one, so I collected some existing resources to get a broad overview (see the list below). What I missed in those documents is the full picture, so I took the opportunity to add more details about case-sensitivity in the HTML context.
For PHP developers it might be interesting that the XPath examples here have been tested to work with DOMXPath, which is part of PHP's DOM extension (alongside DOMDocument), and that they are meant for HTML use. The CSS examples are an adaptation of the CSS3 selectors summary table. Pseudo-classes have been left out; I might write about them soon. Some combination examples have been added because I thought they were interesting:
CSS | Xpath | Meaning |
---|---|---|
* |
//* |
any element |
P |
//P|//p |
an element of type P |
Remark: CSS syntax is case-insensitive within the ASCII range (i.e., [a-z] and [A-Z] are equivalent). Most examples don't reflect that. Here the XPath node-set union operator | is used to get both p and P tags, which is only one way to achieve case-insensitivity. |
||
BODY |
//*['BODY' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')] |
an element of type BODY |
Remark: Another variant for case-insensitive element name matching. | ||
P[align] |
(//P|//p)[@align] |
a P element with an “align” attribute |
P[align] |
(//P|//p)/@*['ALIGN' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/.. |
a P element with an “align” attribute |
Remark: This example is about case-sensitivity again, which applies to attribute names as well. It’s a more correct variant of the previous example if the CSS is about an HTML document, which is normally the case. | ||
P[class~="intro"] |
(//P|//p)[contains(concat(' ', normalize-space(@class), ' '), concat(' ', 'intro', ' '))] |
a P element whose “class” attribute value is a list of whitespace-separated values, one of which is exactly equal to “intro” |
Remark: This is the famous CSS class selector, P.intro in this specific case. |
||
P.intro |
(//P|//p)[contains(concat(' ', normalize-space(@class), ' '), concat(' ', 'intro', ' '))] |
a P element whose class is “intro” (the document language specifies how class is determined). |
Remark: As this example is for HTML, it’s the same as the previous P[class~="intro"] . Because this is too simple, the next example will add case-insensitivity for all parts. |
||
P.intro |
//*['P' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/@*['CLASS' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and contains(concat(' ', normalize-space(.), ' '), concat(' ', 'intro', ' '))]/.. |
a P element whose class is “intro” (the document language specifies how class is determined). |
Remark: The class attribute name is matched case-insensitively; the class name itself stays case-sensitive. The trailing /.. steps back from the attribute node to the P element. | ||
P[align^="le"] |
(//P|//p)[starts-with(@align, 'le')] |
a P element whose “align” attribute value begins exactly with the string “le” |
P[align$="t"] |
(//P|//p)[substring(@align, string-length(@align), 1) = 't'] |
a P element whose “align” attribute value ends exactly with the string “t” |
Remark: Unlike starts-with in the previous example, XPath 1.0 has no ends-with string function, so the string-length and substring functions are used. The length argument 1 matches the single-character suffix “t”. |
||
P[align$="t"] |
//*['P' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]/@*['ALIGN' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and substring(., string-length(.), 1) = 't']/.. |
a P element whose “align” attribute value ends exactly with the string “t” |
Remark: Case-insensitive variant for both the tag name and the attribute name. This example demonstrates well how much impact the CSS specification has when a simple-looking CSS selector is ported to XPath. | ||
P[align*="igh"] |
(//P|//p)[contains(@align, 'igh')] |
a P element whose “align” attribute value contains the substring “igh” |
P[lang|="en"] |
(//P|//p)[@lang='en' or starts-with(@lang, 'en-')] |
a P element whose “lang” attribute has a hyphen-separated list of values beginning (from the left) with “en” |
Remark: This is not the same as :lang("en"), which to the best of my knowledge cannot be ported to XPath in a single expression. |
||
P * |
(//P|//p)//* |
all descendant elements of a P element (Descendant combinator) |
P > * |
(//P|//p)/* |
all child elements of a P element (Child combinator) |
P > *:first-child |
(//P|//p)/*[1] |
any element, first child of its parent P element |
H1 + P |
(//P|//p)['H1' = translate(name(preceding-sibling::*[1]), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')] |
a P element immediately preceded by an H1 element |
H1 ~ P |
(//P|//p)[preceding-sibling::*['H1' = translate(name(.), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')]] |
a P element preceded by an H1 element |
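The case-insensitive rows in the table above follow one repeating pattern, so they can be generated rather than typed by hand. The following is a minimal Python sketch of that idea, not a full selector converter. All function names are hypothetical, it assumes plain ASCII names and values that contain no single quote, and it only builds the expression string. Evaluating the result still needs an XPath 1.0 engine such as PHP's DOMXPath.

```python
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def _plain(value: str) -> str:
    """Reject values this sketch cannot embed in a single-quoted literal."""
    if "'" in value:
        raise ValueError("single quotes are not handled in this sketch")
    return value


def ci_name_test(name: str) -> str:
    """Condition comparing the context node's name to `name`, ignoring ASCII case."""
    return f"'{_plain(name).upper()}' = translate(name(.), '{LOWER}', '{UPPER}')"


def element(name: str) -> str:
    """Path to every element whose name matches `name` case-insensitively."""
    return f"//*[{ci_name_test(name)}]"


def with_attribute(path: str, attr: str, value_test: str = "") -> str:
    """Elements on `path` that carry attribute `attr` (name matched case-insensitively).

    `value_test` is an optional condition in which `.` is the attribute node.
    The trailing `/..` steps back from the attribute to its element.
    """
    condition = ci_name_test(attr)
    if value_test:
        condition += f" and {value_test}"
    return f"{path}/@*[{condition}]/.."


def class_token_test(token: str) -> str:
    """Condition: the attribute value contains `token` as a whitespace-separated word."""
    return (
        "contains(concat(' ', normalize-space(.), ' '), "
        f"concat(' ', '{_plain(token)}', ' '))"
    )


def ends_with_test(suffix: str) -> str:
    """Condition: the attribute value ends with `suffix` (any length of 1 or more)."""
    n = len(_plain(suffix))
    if n == 0:
        raise ValueError("an empty suffix matches nothing in CSS")
    # substring() is 1-based, so the suffix starts at string-length - n + 1.
    start = "string-length(.)" if n == 1 else f"string-length(.) - {n - 1}"
    return f"substring(., {start}, {n}) = '{suffix}'"


# P.intro and P[align$="t"], both with case-insensitive tag and attribute names
print(with_attribute(element("p"), "class", class_token_test("intro")))
print(with_attribute(element("p"), "align", ends_with_test("t")))
```

Unlike the single-character form in the table, `ends_with_test` also covers longer suffixes, for example `ends_with_test("ght")` for `P[align$="ght"]`.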
This subset of CSS selectors shows that, as far as HTML documents are concerned, the conversion is not as simple as existing documents outline, because of case-sensitivity. Besides case-sensitivity, there are also some XPath string issues. For example, if you search for the string more "of 'this'"
you cannot put that literally into any of the XPath expressions above, because an XPath 1.0 string literal has no way to escape its own quote character. If you plan to write those XPath expressions manually, things become more and more awkward, but it’s still possible.
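One common workaround for the quoting problem is to split the string at one kind of quote and glue the pieces back together with XPath's `concat()` function. Here is a small Python sketch of that approach. The helper name `xpath_literal` is made up for this illustration, and it only produces the expression text.

```python
def xpath_literal(text: str) -> str:
    """Return an XPath 1.0 expression that evaluates to the string `text`."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds occur: wrap the runs between single quotes in single
    # quotes, and emit each single quote as a double-quoted literal.
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


needle = "more \"of 'this'\""
# Build a predicate that searches element text for the awkward string
print(f"//*[contains(., {xpath_literal(needle)})]")
```

Strings with only one kind of quote stay simple literals. The `concat()` form is used only when both kinds occur, in which case there are always at least two arguments, as `concat()` requires.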
Namespaces aren’t reflected in full and this needs additional discussion. CSS can have a default namespace now and can have other namespaces. Luckily as far as HTML documents are concerned, practically there is not much namespacing involved, so probably it’s ok to keep it out of this first table.
However, pseudo-classes are largely missing, and they are quite interesting as well. So there is some room for a follow-up post.
Resources
I’ve used the following blog-posts for conversion suggestions/examples:
- XPath and CSS Selectors (13 Dec 2005; by John Resig)
- How to map CSS selectors to XPath queries (24 Sep 2006; by Aristotle Pagaltzis)
- CSS Selectors And XPath Expressions (07 Apr 2010; by Thomas Weinert)
Related Stack Overflow questions on converting CSS selectors to XPath:
- Problem with upper-case and lower-case xpath functions in selenium IDE
- XPath: How to match attributes that contain a certain string
- Encoding XPath Expressions with both single and double quotes
- New: Cleaning/sanitizing xpath attributes
- New: Select elements with attribute value of self or descendant but not “overruled” (like a lang attribute)