XPath Overnight


A fascinating thing has happened in the world of JavaScript DOM traversal: over the course of a couple of months in 2007, three of the major JavaScript libraries (Prototype, Dojo, and MooTools) all switched their CSS selector engines to use the browser’s native XPath functionality rather than traditional DOM traversal. What’s interesting about this is that the burden of functionality and performance has, literally, flipped overnight from the browser’s pure DOM implementation onto its XPath engine.
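To make the switch concrete, here is a minimal sketch of the general idea: translate a CSS selector into an XPath expression, then hand that expression to the browser. It is not the code any of these libraries actually ship. The names `cssToXPath` and `queryXPath` are hypothetical, and only a tiny subset of CSS is assumed (tag names, `*`, `#id`, `.class`, and space-separated descendant and child combinators).

```javascript
// Hypothetical helper (not taken from Prototype, Dojo, or MooTools):
// translates a small CSS subset into an XPath expression.
// Supported: tag names, *, #id, .class, descendant (space) and child (>)
// combinators. Combinators must be surrounded by spaces.
function cssToXPath(selector) {
  var tokens = selector.replace(/^\s+|\s+$/g, "").split(/\s+/);
  var xpath = "";
  var axis = "//"; // "//" selects descendants; "/" selects direct children

  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i];
    if (token === ">") {
      axis = "/";
      continue;
    }

    var match = /^([\w-]+|\*)?((?:[#.][\w-]+)*)$/.exec(token);
    if (!token || !match) {
      throw new Error("Unsupported selector part: " + token);
    }

    var step = match[1] || "*";
    var filters = match[2].match(/[#.][\w-]+/g) || [];
    for (var j = 0; j < filters.length; j++) {
      var name = filters[j].slice(1);
      if (filters[j].charAt(0) === "#") {
        step += "[@id='" + name + "']";
      } else {
        // Pad with spaces so "item" does not match "items".
        step += "[contains(concat(' ', normalize-space(@class), ' '), ' " +
          name + " ')]";
      }
    }

    xpath += axis + step;
    axis = "//";
  }

  if (axis === "/") {
    throw new Error("Selector ends with a dangling '>' combinator");
  }
  return xpath;
}

// Browser-only: runs the expression through the native XPath engine
// and copies the snapshot into a plain array.
function queryXPath(doc, xpath) {
  var snapshot = doc.evaluate(xpath, doc, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

In a browser with native XPath, `queryXPath(document, cssToXPath("ul > li.item"))` would evaluate `//ul/li[contains(concat(' ', normalize-space(@class), ' '), ' item ')]`. Anything outside the subset (for example `p:empty`) throws, which is exactly where a real library would need another code path.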

There are some really interesting things about this switch:

  • Native XPath is blazing fast. For a majority of CSS selectors it completely trumps using native DOM methods (like getElementsByTagName, for example). Sometimes it pays to special-case your code for selectors like #id, but overwhelmingly XPath is the direction in which JavaScript libraries are heading.
  • Since a large percentage of JavaScript users use JavaScript libraries (and, thus, use the behind-the-scenes XPath, as well), browsers are now spending significantly more time processing XPath queries than they ever were before. As a result, the performance field is now, effectively, split between two areas: traditional DOM querying and XPath.
  • No one is analyzing the performance of browser XPath queries. Or, if they are, it’s certainly not public. I’m working on some new XPath performance tests, in order to bring them some more visibility, and hope to have them released this week.
  • XPath, while incredibly useful, is a black box. The developer has no control over how fast the results come back – or whether they are even correct. Contrast this with traditional DOM scripting (where you can fine-tune your queries to perfection). Browsers are always bound to have some bugs in their implementations. For example, Safari 3 isn’t capable of handling “-of-type” or “:empty” style CSS selectors through XPath, nor is any browser able to access the ‘checked’ property or namespaced attributes that way (all noted in Prototype’s implementation), which means that libraries have to fall back to a traditional DOM scripting model for those cases.
  • Internet Explorer is a dead-end. Most users want a CSS selector implementation that works against HTML documents, and IE provides no native XPath for them, so every CSS selector implementation must ship two side-by-side selector engines in order to handle these cases (not to mention the aforementioned cases where browsers provide unexpected behavior).
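The side-by-side arrangement described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not any library’s real dispatch code: `createSelectorEngine`, `xpathEngine`, and `domEngine` are hypothetical names, the two engines are passed in by the caller, and the list of selectors routed to the DOM path is just the handful of trouble spots mentioned above (“-of-type”, “:empty”, ‘checked’).

```javascript
// A lone "#id" selector: cheaper to special-case than to run through XPath.
var ID_ONLY = /^#([\w-]+)$/;

// Illustrative only: selectors assumed to be unreliable via XPath.
var NEEDS_DOM_FALLBACK = /-of-type|:empty|:checked/;

// Hypothetical factory. xpathEngine and domEngine are caller-supplied
// functions of the form (selector, doc) -> array of elements.
function createSelectorEngine(doc, xpathEngine, domEngine) {
  // Feature-detect native XPath once, up front.
  var hasXPath = typeof doc.evaluate === "function";

  return function select(selector) {
    var idMatch = ID_ONLY.exec(selector);
    if (idMatch) {
      var el = doc.getElementById(idMatch[1]);
      return el ? [el] : [];
    }
    if (hasXPath && !NEEDS_DOM_FALLBACK.test(selector)) {
      return xpathEngine(selector, doc);
    }
    // No native XPath (e.g. IE on HTML documents) or a known trouble spot.
    return domEngine(selector, doc);
  };
}
```

The point of the sketch is the cost it makes visible: both engines have to exist, stay in sync, and be tested, even though any given query only uses one of them.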

A couple of things to take away from all of this:

  1. XPath (and new methods like querySelector) are the way of the future for a lot of JavaScript libraries – and the next frontier for browser optimization.
  2. These implementations are black boxes that the developer cannot modify (leaving them vulnerable to browser bugs).
  3. Libraries must keep providing a second, DOM-only CSS selector engine well into the foreseeable future in order to account for browser mis-implementations.
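The same trade-off applies to native selector methods like `querySelectorAll`. A rough sketch of how a library might lean on the native method while keeping its DOM-only engine as a safety net is shown here. `selectWithFallback` and `domEngine` are hypothetical names, and the sketch assumes the native method throws on selectors it does not support.

```javascript
// Hypothetical wrapper: prefer the browser's native selector method,
// fall back to a caller-supplied DOM-only engine when it is missing
// or rejects the selector.
function selectWithFallback(doc, selector, domEngine) {
  if (typeof doc.querySelectorAll === "function") {
    try {
      // Copy the result into a plain array.
      return Array.prototype.slice.call(doc.querySelectorAll(selector));
    } catch (e) {
      // Unsupported or mis-parsed selector: fall through to the DOM engine.
    }
  }
  return domEngine(selector, doc);
}
```

Note that this only catches selectors the browser rejects outright. A selector that the browser accepts but answers incorrectly would still need an explicit special case, which is the black-box problem from point 2.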

I should probably also answer the inevitable question: “Why doesn’t jQuery have an XPath CSS Selector implementation?” For now, my answer is: I don’t want two selector implementations – it makes the code base significantly harder to maintain, increases the number of possible cross-browser bugs, and drastically increases the filesize of the resulting download. That being said, I’m strongly evaluating XPath for some troublesome selectors where it could, potentially, provide some big performance wins to the end user. In the meantime, we’ve focused on optimizing the actual selectors that most people use. These are poorly represented in speed tests like SlickSpeed, which is something we hope to rectify in the future.

Posted: February 10th, 2008

