Blog


getElementsByClassName Speed Comparison

Mark Finkle suggested that I do some speed testing, now that a native implementation of getElementsByClassName has landed in the Mozilla trunk (destined for Firefox 3).

So I went around and dug up all of the different, existing, implementations that I could find. Currently, implementations fall into one of three categories (with some straddling more than one):

  • Pure DOM
    This usually involves a calls to .getElementsByClassName("*") and traversing through all matched elements, analyzing each element's className attribute along the way. Generally, the fastest method is to use a pre-compiled RegExp to test the value of the className attribute.
  • DOM Tree Walker
    Is a less-popular means of traversing DOM documents by setting some simple parameters, as specified by the DOM Level 2 Spec. For example, you could traverse through all text nodes in a document (something that you can't easily do in any other way).
  • XPath
    The most recent technique, to be popularized, was the use of XPath to find elements by classname. The implementation is generally simple: Building a single expressions and letting the XPath engine traverse through the document, finding all the relevant elements.

I've chosen some implementations that were representative of each of these techniques.

Tree Walker

An implementation using the DOM Level 2 Tree Walker methods. Builds a generic filter function and traverses through all elements.

document.getElementsByClass = function(needle) {
  function acceptNode(node) {
    if (node.hasAttribute("class")) {
      var c = " " + node.className + " ";
       if (c.indexOf(" " + needle + " ") != -1)
         return NodeFilter.FILTER_ACCEPT;
    }
    return NodeFilter.FILTER_SKIP;
  }
  var treeWalker = document.createTreeWalker(document.documentElement,
      NodeFilter.SHOW_ELEMENT, acceptNode, true);
  var outArray = new Array();
  if (treeWalker) {
    var node = treeWalker.nextNode();
    while (node) {
      outArray.push(node);
      node = treeWalker.nextNode();
    }
  }
  return outArray;
}

The Ultimate getElementsByClassName

Uses a pure DOM implementation, tries to make some optimizations for Internet Explorer.

function getElementsByClassName(oElm, strTagName, strClassName){
    var arrElements = (strTagName == "*" && oElm.all)? oElm.all :
        oElm.getElementsByTagName(strTagName);
    var arrReturnElements = new Array();
    strClassName = strClassName.replace(/\-/g, "\\-");
    var oRegExp = new RegExp("(^|\\s)" + strClassName + "(\\s|$)");
    var oElement;
    for(var i=0; i<arrElements.length; i++){
        oElement = arrElements[i];     
        if(oRegExp.test(oElement.className)){
            arrReturnElements.push(oElement);
        }   
    }
    return (arrReturnElements)
}

Dustin Diaz's getElementsByClass

A pure DOM implementation, caches the regexp, and is generally quite simple and easy to use.

function getElementsByClass(searchClass,node,tag) {
        var classElements = new Array();
        if ( node == null )
                node = document;
        if ( tag == null )
                tag = '*';
        var els = node.getElementsByTagName(tag);
        var elsLen = els.length;
        var pattern = new RegExp("(^|\\s)"+searchClass+"(\\s|$)");
        for (i = 0, j = 0; i < elsLen; i++) {
                if ( pattern.test(els[i].className) ) {
                        classElements[j] = els[i];
                        j++;
                }
        }
        return classElements;
}

Prototype 1.5.0 (XPath)

Mixes an XPath and DOM implementation; using XPath wherever possible.

document.getElementsByClassName = function(className, parentElement) {
  if (Prototype.BrowserFeatures.XPath) {
    var q = ".//*[contains(concat(' ', @class, ' '), ' " + className + " ')]";
    return document._getElementsByXPath(q, parentElement);
  } else {
    var children = ($(parentElement) || document.body).getElementsByTagName('*');
    var elements = [], child;
    for (var i = 0, length = children.length; i < length; i++) {
      child = children[i];
      if (Element.hasClassName(child, className))
        elements.push(Element.extend(child));
    }
    return elements;
  }
};

Native, Firefox 3

A native implementation, written in C++; is a part of the current CVS version of Firefox, will be included in Firefox 3.

document.getElementsByClassName

The Speed Results

For the speed tests I copied the Yahoo homepage into a single HTML file and used that as the test bed. They make good use of class names (both single and multiple) and is a considerably large file with lots of elements to consider.

You can find the test files, for each of the implementations, here:
http://ejohn.org/apps/classname/

Note: "XPath" is just Prototype's implementation.

From these figures we can see that the native implementation of getElementsByClassName, in Firefox 3, is a full 8x faster than the XPath implementation. Additionally, it's a stunning 77x faster than the fastest DOM implementation.

Note: These numbers have been revised from what was originally posted as the lazy-loading nature of document.getElementsByClassName wasn't taken into account. The resulting arrays are completely looped-through now, making sure that all elements are accounted for.

Currently, Prototype has the best general-use implementation: Use XPath selectors wherever possible, fall back to fast DOM parsing.

Interestingly, only Prototype actually tries to implement the document.getElementsByClassName interface (all others do one-off names). However, Prototype doesn't check to see if the document.getElementsByClassName property already exists, and completely overwrites the, incredibly fast, native implementation that Firefox 3 provides (oops!).

In all, the results are quite astounding. The native implementation is absolutely much faster than anything I could've imagined. It completely decimates all the other pieces of code. I can't wait until this hits the general public - users will, absolutely, feel a significant increase in speed.

Tags: javascript, speed, firefox3, firefox, whatwg, xpath, dom

getElementsByClassName in Firefox 3

I've been waiting for this one for a while - and it was just committed to the Mozilla trunk yesterday:
Firefox 3 is going to have support for getElementsByClassName.

Robert Sayre just merged in his changes yesterday, taking this feature live.

If you're curious as to why this feature is being included (or where the reasoning for it originated from) - it's because it's part of the Web Applications 1.0 (HTML 5) specification. The implementation that's in Firefox is slightly different from what's presented in the specification; however, you can expect that the specification will probably be updated to reflect that changes that've been made.

getElementsByClassName has long been a mainstay of web developers everywhere - and by making it official (both in specification and in implementation), web applications are going to see a huge jump in speed.

I've pulled together some simple examples of what you can do with this new element selector.

Get all elements that have a class of 'test'

document.getElementsByClassName('test')

Get all elements that have a class of 'red' and 'test'

document.getElementsByClassName('red test')

Get all elements that have a class of 'test', inside of an element that has the ID of 'main'

document.getElementById('main').getElementsByClassName('test')

And if we go ahead and add in JavaScript 1.6's Array extras, we can do some really-cool matches.

Find all div elements that have a class of 'test'

Array.filter( document.getElementsByClassName('test'), function(elem){
    return elem.nodeName == 'DIV';
});

Find all elements that have a class of 'test' (as do their parent element)

var test = document.getElementsByClassName('test');
Array.filter( test, function(elem){
    return Array.indexOf( test, elem.parentNode ) > -1;
});

Some basic code can be found in the test cases for this feature. You'll need to have a nightly version of Firefox in order to run the code contained within it. I really can't wait until this is live.

Update: This post has been submitted to Digg.

Update: This function is now documented on the MDC Wiki.

Tags: javascript, traversing, whatwg, dom, getelementsbyclassname, traversal, html5

DOM Storage Answers

In a follow-up to my previous post on DOM Storage, here's an attempt to answer some of the questions posed in the comments, and elsewhere.

What’s to stop a script from filling an entire storage area with random data?

As best as I can tell with the implementation in Firefox, every time you save data to a storage area, not only is the name of the storage area saved (e.g. "org" or "ejohn.org") but so is the domain name of the original script. This domain name is what has the 5MB limitation imposed upon it.

This means that a single script could put 2MB of data in globalStorage['org'] and another 3MB in globalStorage['ejohn.org'], but attempting to put any more data in will cause an error to occur.

I may be wrong on this point; but if that's the case, then it will be very hard to stop scripts from filling up "common" areas with nonsense data (and which seems like something that was considered as a fundamental design decision).

(Update: This is, currently, a moot point in Firefox as it does not enable any of the TLD or public storage areas, for fear of an issue like this happening.)

Storing data to globalStorage['localdomain']

I forgot to mention that all LAN addresses receive a TLD of 'localdomain'. For example, if your computer is named 'anthrax', you could store data in the following stores: globalStorage['anthrax.localdomain'], globalStorage['localdomain'], and globalStorage[''].

This means that you can, in fact, store data "globally" across all pages of an intranet.

What's with the name "DOM Storage"?

I've been having trouble pinpointing this one - but I agree, the name is particularly deceptive. What's especially confusing is that "DOM Storage" isn't the official name for this particular feature, "Client-side session and persistent storage" is.

As it happens, Mozilla's internal name of this features is "DOMStorage" (the names "Storage", "mozStorage", and "sessionStorage" were all already in use), I'm beginning to suspect that this naming confusion has stemmed from this, original, feature-naming. (Note: This has been confirmed.)

In reality, the storage area doesn't really have much to do with the Document Object Model - it really only attaches to the WindowHTML Object. The only time it actually interacts with the DOM, as we know it, is to trigger a "storage" event on the document body.

The storage event

I forgot to mention this in my last post, but this is a rather interesting sub-feature of DOM Storage. Whenever a key/value is changed, added, or removed within a storage area that you have permission to access, a 'storage' event is triggered, originating from the HTML document body and bubbling its way up.

The only piece of event metadata that is passed along in the event object is the name of the domain that the changed-key is within - and nothing else. (Providing any additional information would serve to be a security risk)

Update: Firefox does currently support this, and you can view a test case here.

Server-side Data

No data from DOM Storage areas is automatically passed to the server. This differs from cookies, which serve as a two-way form of synchronous communication between a server and a browser.

You could work around this limitation with an XMLHttpRequest, POSTing the contents of your relevant storage areas.

Known Bugs

As I mentioned before, Firefox has the only known implementation of DOM Storage available to any browser. However, there are still some serious limitations with it.

  • sessionStorage - Does not restore data across browser crash/restore.
  • globalStorage['org'] - Storing data to a single TLD fails
  • globalStorage[''] - Storing data to a public area fails

Update: Neil Deakin has gotten back to me and mentioned that the lack of TLD or public storage areas was intentional. When implementing the specification he felt that there was, quite simply, too much of a security risk in leaving those public areas open. I suspect that since Firefox is the first browser to make an attempt an implementation, much of this will trickle back to the specification.

We're still in the rocky first moments of this feature, but once it starts to pick up some solid adoption (especially amongst other browsers), I suspect that we'll see the level of quality really start to take off.

Tags: domstorage, whatwg, javascript, html5, firefox, mozilla

DOM Storage

The first thing that I tasked myself with, when I hopped on board with Mozilla, was to learn everything that I could about the new DOM Storage functionality provided in the HTML 5 specification. Interestingly, it's actually very impressive.

The brief summary: DOM Storage is a way to store meaningful amounts of client-side data in a persistent and secure manner.

I've written up a thorough piece of documentation on DOM Storage, and it can be found on the DevMo Wiki.

If you hadn't guessed already, DOM Storage is most similar, in nature, to HTTP Cookies. I'm going to already assume that you know how Cookies work, so with that in mind, here is how DOM Storage compares.

Storage Space

It is implied that, with DOM Storage, you have considerably more storage space than the typical user agent limitations imposed upon Cookies. However, the amount that is provided is not defined in the specification, nor is it meaningfully broadcast by the user agent.

If you look at the Mozilla source code we can see that 5120KB is the default storage size for an entire domain. This gives you considerably more space to work with than a typical 2KB cookie.

However, the size of this storage area can be customized by the user (so a 5MB storage area is not guaranteed, nor is it implied) and the user agent (Opera, for example, may only provide 3MB - but only time will tell.)

Storage / Retrieval

Where as in cookies you have to encode your key/value pairs into a string, along with other information (such as expiration and path), DOM Storage is a breath of fresh air. It provide a clean, JavaScript-like API for accessing and mutating key/value pairs. For example:

// Save a value
sessionStorage[ 'name' ] = "John";
// Display a value
alert( sessionStorage[ 'name' ] );
=> 'John'

This is absolutely fantastic, in comparison to Cookies. Note that, like with cookies, you can only store strings of data as a value. Additionally, no form of object serialization is provided by the browser, making some storage tasks non-trivial.

At this point, I'd be happy with one of two solutions: 1) For the user agent to take care of object (de)serialization for us, making it completely seamless or 2) To provide us with a really good, really fast, JSON (de)serializer. Honestly, I'd prefer the second solution, simply because it could be used in so many other situations. (And if I've heard right, one is definitely in the works by multiple browser vendors.)

Security

Cookies provide you with the ability to limit the accessibility of your key/value pairs to a certain domain, and even a certain path within that domain. DOM Storage works similarly, but only allows you to restrict access to domain (or TLD, or even public!) or session.

The data stored in a DOM Storage area is much more "public" than what's provided by cookies. For example, let's say you wanted to use DOM Storage on a public site like LiveJournal. There would be no way for you to hide your data from other users -- other than to choose a key that is completely private and un-reproducible; since there is no "path" limiter for DOM Storage.

Additionally, unlike in cookies, where you're immediately provided with all available key/value pairs for your particular domain (and path), in DOM Storage you must request a key by its specific name (you can't iterate through all available keys).

In short: If you are only using DOM Storage within a single domain (and you control that domain), then you have nothing to worry about - your data is very safe. In any other context you must choose a secure key name that cannot be duplicated by other clients on other sites (unless, of course, that's your goal).

Scope

DOM Storage provides you with a greater variety of scope for a piece of data, than for that of Cookies. For example, if I wanted to store a piece of data on this domain I could store it with the sessionStorage, globalStorage['ejohn.org'], globalStorage['org'], or globalStorage[''] DOM Storage areas. Here is how the restrictions break down:

  • sessionStorage is limited to the current browser session, only. No other client can access this data.
  • globalStorage['ejohn.org'] All pages on my web site have access to this area.
  • globalStorage['org'] All web sites with the TLD of 'org' can access this area (including, for example, mozilla.org).
  • globalStorage[''] All pages on all sites can access this area.

I, personally, think that we're going to see the most interesting results come from sessionStorage and globalStorage['']. sessionStorage for its practicality (which I'll discuss in a minute) and globalStorage[''] for its potential deviousness.

Duration

In DOM Storage it is not possible to specify an expiration period for any of your data. All expiration rules are left up to the user. In the case of Mozilla, most of those rules are inherited from the Cookie-related expiration rules. Because of this you can probably expect most of your DOM Storage data to last at least for a meaningful amount of time.

The one storage area that is particularly interesting (and where expiration does not apply) is that of sessionStorage. sessionStorage is seemingly useless at first blush - it only stores data within the context of a single session. However, there are two, very important, qualifiers to that:

  1. Data is persistent across page refreshes. This is nice because you can now help to protect against accidental page refreshes, by temporarily caching a user's unsaved data.
  2. Data is persistent across browser crashes. This is up to the user agent, but in the case of Firefox if you browser crashes, and you restore your previous session, then your sessionStorage area will be restored. (Well, that'll be the case once Firefox actually implements this feature).

This isn't to say that either of these features aren't possible using cookies, but the similar solution wouldn't be nearly as elegant.

Summary

In all, I'm really looking forward to digging in to DOM Storage - I think it has a lot of potential. Currently, the state of affairs is pretty slim (Firefox is the only browser that has an implementation, and even then, sessionStorage isn't terribly useful yet.) - but the future is bright.

As I mentioned before, if you're interested in some basic examples and documentation on the subject, feel free to read through the DOM Storage documentation that I wrote.

Update: Here's the original bug report that detailed its implementation in Gecko.

Tags: mozilla, domstorage, html5, whatwg, dom, javascript, firefox

Current Projects

jQuery JavaScript Library

jQuery

Comprehensive DOM, Event, Animation, and Ajax JavaScript Library.

Recent Projects

Pro JavaScript Techniques

JavaScript Book

The best techniques for professional JavaScript. Published by Apress.