Blog
August 9th, 2008
Decided to start something new with Dion Almaer (of Ajaxian.com fame) and Alex Russell (Dojo frame) - a podcast! We talk about the "Open Web" (the topics break down in to standards, development, JavaScript, tools - all sorts of things). We're working to get it up on iTunes (I'll be sure to update when it that's the case) but in the meantime you can subscribe to the RSS feed.
Dion has written up a nice overview of the first episode, which I've included below:
Welcome to the inaugural episode of a new podcast to cover news, happenings, and our opinions on the Open Web (download the Open Web Podcast episode one directly or subscribe to it). When I say “our” I am talking about the founding podcasters: Alex Russell, John Resig, and myself. It is a pleasure to be able to share air time with two of the real leaders of the Open Web, and specifically the Ajax space thanks to Dojo and jQuery.
What is the state of the Open Web?
That is how we started out the podcast, and we got to see very different opinions. John discusses the decentralization and new openness that we see across the Web. Alex was a little more wary, and talks about how he wants the Open Web to progress faster. He noted that a lot of the good work has been a little away from the client, and instead in the area of identity, transport, and formats.
We then move on to HTML 5, where we discuss items in Mark Pilgrim’s This Week in HTML 5 piece including Web Workers (think: Gears Workers), and the clarification of alt tag usage in the img tag to have you using alt="{diagram}" and the like.
We have a detailed chat about Web Workers, and where we see them being useful. John talks about issues around not being able to talk to the DOM, Alex talks about mashups, and I talk about some tests showing how they can help performance in a few areas. Matthew Russell did a demo using the Dojo 2d code at OSCON, and showed how he doubled the performance by pushing out computation into a Worker. John also talked about a special case for passing DOM fragments or the like to a Worker with special serialization. Of course, security is a concern for all of this.
John brought up the new data- embedding tactic that showed up in the HTML 5 spec. A conversation ensued around how you should separate your data from presentation. Is the DOM there to store data? Isn’t it a good place to keep it? Is “data-” just too long?
It is exciting to think that the W3C Selectors API will soon be implemented in Firefox 3.1, Safari 3, IE 8, and probably Opera 10. That seemed to happen pretty quickly. John and Alex talk about how this is going to mean a lot of chopping code from their frameworks, the increase in performance, and the subtle differences between the spec and how they were doing things.
The discussion leads to a new feature, named scoped CSS, that allows you to say “this CSS only works over here.” This could be huge, especially if you have an application such as a CMS, where people upload their own content that can mess with your application structure itself.
Next, we delve into the world of Firebug. John talks about how Firebug development is being bootstrapped by Mozilla and other contributors, and he discusses the upcoming versions and what you can expect. Stability and performance are top of the list. Don’t forget the Firebug Lite improvements too, which mean that you get more than just console to play with in non-Firefox browsers. I just posted the notes on that meeting, kindly taken by Steve Souders.
We talked about the Open Web Foundation, and Alex discussed what he would like to see come of it. He is optimistic, and thinks that the real test will be if we see the incubation of projects that really push the Web on the client side, as well as the identity side.
Finally, there is news in the Dojo community and Alex spills the beans. After over 4 years of service, Alex is stepping down as the project lead of Dojo, and handing over the reins to Peter Higgins who has shown great chops as both a commiter and an external leader. We wish Pete the best of luck! Alex isn’t sneaking off into the sunset though, as he talks about in his post on the subject, he will see be an active member of the Dojo community for a long time to come.
Finally, thanks again to John and Alex for taking the time to start this up with me. Please let us know what you think, and what you would like us to talk about.
Tags: html5, whatwg, w3c, podcast
10 Comments on 'Open Web Podcast #1'
March 8th, 2007
Mark Finkle suggested that I do some speed testing, now that a native implementation of getElementsByClassName has landed in the Mozilla trunk (destined for Firefox 3).
So I went around and dug up all of the different, existing, implementations that I could find. Currently, implementations fall into one of three categories (with some straddling more than one):
- Pure DOM
This usually involves a calls to .getElementsByClassName("*") and traversing through all matched elements, analyzing each element's className attribute along the way. Generally, the fastest method is to use a pre-compiled RegExp to test the value of the className attribute.
- DOM Tree Walker
Is a less-popular means of traversing DOM documents by setting some simple parameters, as specified by the DOM Level 2 Spec. For example, you could traverse through all text nodes in a document (something that you can't easily do in any other way).
- XPath
The most recent technique, to be popularized, was the use of XPath to find elements by classname. The implementation is generally simple: Building a single expressions and letting the XPath engine traverse through the document, finding all the relevant elements.
I've chosen some implementations that were representative of each of these techniques.
An implementation using the DOM Level 2 Tree Walker methods. Builds a generic filter function and traverses through all elements.
document.getElementsByClass = function(needle) {
function acceptNode(node) {
if (node.hasAttribute("class")) {
var c = " " + node.className + " ";
if (c.indexOf(" " + needle + " ") != -1)
return NodeFilter.FILTER_ACCEPT;
}
return NodeFilter.FILTER_SKIP;
}
var treeWalker = document.createTreeWalker(document.documentElement,
NodeFilter.SHOW_ELEMENT, acceptNode, true);
var outArray = new Array();
if (treeWalker) {
var node = treeWalker.nextNode();
while (node) {
outArray.push(node);
node = treeWalker.nextNode();
}
}
return outArray;
}
Uses a pure DOM implementation, tries to make some optimizations for Internet Explorer.
function getElementsByClassName(oElm, strTagName, strClassName){
var arrElements = (strTagName == "*" && oElm.all)? oElm.all :
oElm.getElementsByTagName(strTagName);
var arrReturnElements = new Array();
strClassName = strClassName.replace(/\-/g, "\\-");
var oRegExp = new RegExp("(^|\\s)" + strClassName + "(\\s|$)");
var oElement;
for(var i=0; i<arrElements.length; i++){
oElement = arrElements[i];
if(oRegExp.test(oElement.className)){
arrReturnElements.push(oElement);
}
}
return (arrReturnElements)
}
A pure DOM implementation, caches the regexp, and is generally quite simple and easy to use.
function getElementsByClass(searchClass,node,tag) {
var classElements = new Array();
if ( node == null )
node = document;
if ( tag == null )
tag = '*';
var els = node.getElementsByTagName(tag);
var elsLen = els.length;
var pattern = new RegExp("(^|\\s)"+searchClass+"(\\s|$)");
for (i = 0, j = 0; i < elsLen; i++) {
if ( pattern.test(els[i].className) ) {
classElements[j] = els[i];
j++;
}
}
return classElements;
}
Mixes an XPath and DOM implementation; using XPath wherever possible.
document.getElementsByClassName = function(className, parentElement) {
if (Prototype.BrowserFeatures.XPath) {
var q = ".//*[contains(concat(' ', @class, ' '), ' " + className + " ')]";
return document._getElementsByXPath(q, parentElement);
} else {
var children = ($(parentElement) || document.body).getElementsByTagName('*');
var elements = [], child;
for (var i = 0, length = children.length; i < length; i++) {
child = children[i];
if (Element.hasClassName(child, className))
elements.push(Element.extend(child));
}
return elements;
}
};
A native implementation, written in C++; is a part of the current CVS version of Firefox, will be included in Firefox 3.
document.getElementsByClassName
The Speed Results
For the speed tests I copied the Yahoo homepage into a single HTML file and used that as the test bed. They make good use of class names (both single and multiple) and is a considerably large file with lots of elements to consider.
You can find the test files, for each of the implementations, here:
http://ejohn.org/apps/classname/

Note: "XPath" is just Prototype's implementation.
From these figures we can see that the native implementation of getElementsByClassName, in Firefox 3, is a full 8x faster than the XPath implementation. Additionally, it's a stunning 77x faster than the fastest DOM implementation.
Note: These numbers have been revised from what was originally posted as the lazy-loading nature of document.getElementsByClassName wasn't taken into account. The resulting arrays are completely looped-through now, making sure that all elements are accounted for.
Currently, Prototype has the best general-use implementation: Use XPath selectors wherever possible, fall back to fast DOM parsing.
Interestingly, only Prototype actually tries to implement the document.getElementsByClassName interface (all others do one-off names). However, Prototype doesn't check to see if the document.getElementsByClassName property already exists, and completely overwrites the, incredibly fast, native implementation that Firefox 3 provides (oops!).
In all, the results are quite astounding. The native implementation is absolutely much faster than anything I could've imagined. It completely decimates all the other pieces of code. I can't wait until this hits the general public - users will, absolutely, feel a significant increase in speed.
Tags: javascript, speed, firefox3, firefox, whatwg, xpath, dom
25 Comments on 'getElementsByClassName Speed Comparison'
February 2nd, 2007
I've been waiting for this one for a while - and it was just committed to the Mozilla trunk yesterday:
Firefox 3 is going to have support for getElementsByClassName.
Robert Sayre just merged in his changes yesterday, taking this feature live.
If you're curious as to why this feature is being included (or where the reasoning for it originated from) - it's because it's part of the Web Applications 1.0 (HTML 5) specification. The implementation that's in Firefox is slightly different from what's presented in the specification; however, you can expect that the specification will probably be updated to reflect that changes that've been made.
getElementsByClassName has long been a mainstay of web developers everywhere - and by making it official (both in specification and in implementation), web applications are going to see a huge jump in speed.
I've pulled together some simple examples of what you can do with this new element selector.
Get all elements that have a class of 'test'
document.getElementsByClassName('test')
Get all elements that have a class of 'red' and 'test'
document.getElementsByClassName('red test')
Get all elements that have a class of 'test', inside of an element that has the ID of 'main'
document.getElementById('main').getElementsByClassName('test')
And if we go ahead and add in JavaScript 1.6's Array extras, we can do some really-cool matches.
Find all div elements that have a class of 'test'
Array.filter( document.getElementsByClassName('test'), function(elem){
return elem.nodeName == 'DIV';
});
Find all elements that have a class of 'test' (as do their parent element)
var test = document.getElementsByClassName('test');
Array.filter( test, function(elem){
return Array.indexOf( test, elem.parentNode ) > -1;
});
Some basic code can be found in the test cases for this feature. You'll need to have a nightly version of Firefox in order to run the code contained within it. I really can't wait until this is live.
Update: This post has been submitted to Digg.
Update: This function is now documented on the MDC Wiki.
Tags: javascript, traversing, whatwg, dom, getelementsbyclassname, traversal, html5
16 Comments on 'getElementsByClassName in Firefox 3'
January 31st, 2007
In a follow-up to my previous post on DOM Storage, here's an attempt to answer some of the questions posed in the comments, and elsewhere.
What’s to stop a script from filling an entire storage area with random data?
As best as I can tell with the implementation in Firefox, every time you save data to a storage area, not only is the name of the storage area saved (e.g. "org" or "ejohn.org") but so is the domain name of the original script. This domain name is what has the 5MB limitation imposed upon it.
This means that a single script could put 2MB of data in globalStorage['org'] and another 3MB in globalStorage['ejohn.org'], but attempting to put any more data in will cause an error to occur.
I may be wrong on this point; but if that's the case, then it will be very hard to stop scripts from filling up "common" areas with nonsense data (and which seems like something that was considered as a fundamental design decision).
(Update: This is, currently, a moot point in Firefox as it does not enable any of the TLD or public storage areas, for fear of an issue like this happening.)
Storing data to globalStorage['localdomain']
I forgot to mention that all LAN addresses receive a TLD of 'localdomain'. For example, if your computer is named 'anthrax', you could store data in the following stores: globalStorage['anthrax.localdomain'], globalStorage['localdomain'], and globalStorage[''].
This means that you can, in fact, store data "globally" across all pages of an intranet.
What's with the name "DOM Storage"?
I've been having trouble pinpointing this one - but I agree, the name is particularly deceptive. What's especially confusing is that "DOM Storage" isn't the official name for this particular feature, "Client-side session and persistent storage" is.
As it happens, Mozilla's internal name of this features is "DOMStorage" (the names "Storage", "mozStorage", and "sessionStorage" were all already in use), I'm beginning to suspect that this naming confusion has stemmed from this, original, feature-naming. (Note: This has been confirmed.)
In reality, the storage area doesn't really have much to do with the Document Object Model - it really only attaches to the WindowHTML Object. The only time it actually interacts with the DOM, as we know it, is to trigger a "storage" event on the document body.
The storage event
I forgot to mention this in my last post, but this is a rather interesting sub-feature of DOM Storage. Whenever a key/value is changed, added, or removed within a storage area that you have permission to access, a 'storage' event is triggered, originating from the HTML document body and bubbling its way up.
The only piece of event metadata that is passed along in the event object is the name of the domain that the changed-key is within - and nothing else. (Providing any additional information would serve to be a security risk)
Update: Firefox does currently support this, and you can view a test case here.
Server-side Data
No data from DOM Storage areas is automatically passed to the server. This differs from cookies, which serve as a two-way form of synchronous communication between a server and a browser.
You could work around this limitation with an XMLHttpRequest, POSTing the contents of your relevant storage areas.
Known Bugs
As I mentioned before, Firefox has the only known implementation of DOM Storage available to any browser. However, there are still some serious limitations with it.
- sessionStorage - Does not restore data across browser crash/restore.
- globalStorage['org'] - Storing data to a single TLD fails
- globalStorage[''] - Storing data to a public area fails
Update: Neil Deakin has gotten back to me and mentioned that the lack of TLD or public storage areas was intentional. When implementing the specification he felt that there was, quite simply, too much of a security risk in leaving those public areas open. I suspect that since Firefox is the first browser to make an attempt an implementation, much of this will trickle back to the specification.
We're still in the rocky first moments of this feature, but once it starts to pick up some solid adoption (especially amongst other browsers), I suspect that we'll see the level of quality really start to take off.
Tags: domstorage, whatwg, javascript, html5, firefox, mozilla
15 Comments on 'DOM Storage Answers'
January 29th, 2007
The first thing that I tasked myself with, when I hopped on board with Mozilla, was to learn everything that I could about the new DOM Storage functionality provided in the HTML 5 specification. Interestingly, it's actually very impressive.
The brief summary: DOM Storage is a way to store meaningful amounts of client-side data in a persistent and secure manner.
I've written up a thorough piece of documentation on DOM Storage, and it can be found on the DevMo Wiki.
If you hadn't guessed already, DOM Storage is most similar, in nature, to HTTP Cookies. I'm going to already assume that you know how Cookies work, so with that in mind, here is how DOM Storage compares.
Storage Space
It is implied that, with DOM Storage, you have considerably more storage space than the typical user agent limitations imposed upon Cookies. However, the amount that is provided is not defined in the specification, nor is it meaningfully broadcast by the user agent.
If you look at the Mozilla source code we can see that 5120KB is the default storage size for an entire domain. This gives you considerably more space to work with than a typical 2KB cookie.
However, the size of this storage area can be customized by the user (so a 5MB storage area is not guaranteed, nor is it implied) and the user agent (Opera, for example, may only provide 3MB - but only time will tell.)
Storage / Retrieval
Where as in cookies you have to encode your key/value pairs into a string, along with other information (such as expiration and path), DOM Storage is a breath of fresh air. It provide a clean, JavaScript-like API for accessing and mutating key/value pairs. For example:
// Save a value
sessionStorage[ 'name' ] = "John";
// Display a value
alert( sessionStorage[ 'name' ] );
=> 'John'
This is absolutely fantastic, in comparison to Cookies. Note that, like with cookies, you can only store strings of data as a value. Additionally, no form of object serialization is provided by the browser, making some storage tasks non-trivial.
At this point, I'd be happy with one of two solutions: 1) For the user agent to take care of object (de)serialization for us, making it completely seamless or 2) To provide us with a really good, really fast, JSON (de)serializer. Honestly, I'd prefer the second solution, simply because it could be used in so many other situations. (And if I've heard right, one is definitely in the works by multiple browser vendors.)
Security
Cookies provide you with the ability to limit the accessibility of your key/value pairs to a certain domain, and even a certain path within that domain. DOM Storage works similarly, but only allows you to restrict access to domain (or TLD, or even public!) or session.
The data stored in a DOM Storage area is much more "public" than what's provided by cookies. For example, let's say you wanted to use DOM Storage on a public site like LiveJournal. There would be no way for you to hide your data from other users -- other than to choose a key that is completely private and un-reproducible; since there is no "path" limiter for DOM Storage.
Additionally, unlike in cookies, where you're immediately provided with all available key/value pairs for your particular domain (and path), in DOM Storage you must request a key by its specific name (you can't iterate through all available keys).
In short: If you are only using DOM Storage within a single domain (and you control that domain), then you have nothing to worry about - your data is very safe. In any other context you must choose a secure key name that cannot be duplicated by other clients on other sites (unless, of course, that's your goal).
Scope
DOM Storage provides you with a greater variety of scope for a piece of data, than for that of Cookies. For example, if I wanted to store a piece of data on this domain I could store it with the sessionStorage, globalStorage['ejohn.org'], globalStorage['org'], or globalStorage[''] DOM Storage areas. Here is how the restrictions break down:
- sessionStorage is limited to the current browser session, only. No other client can access this data.
- globalStorage['ejohn.org'] All pages on my web site have access to this area.
- globalStorage['org'] All web sites with the TLD of 'org' can access this area (including, for example, mozilla.org).
- globalStorage[''] All pages on all sites can access this area.
I, personally, think that we're going to see the most interesting results come from sessionStorage and globalStorage['']. sessionStorage for its practicality (which I'll discuss in a minute) and globalStorage[''] for its potential deviousness.
Duration
In DOM Storage it is not possible to specify an expiration period for any of your data. All expiration rules are left up to the user. In the case of Mozilla, most of those rules are inherited from the Cookie-related expiration rules. Because of this you can probably expect most of your DOM Storage data to last at least for a meaningful amount of time.
The one storage area that is particularly interesting (and where expiration does not apply) is that of sessionStorage. sessionStorage is seemingly useless at first blush - it only stores data within the context of a single session. However, there are two, very important, qualifiers to that:
- Data is persistent across page refreshes. This is nice because you can now help to protect against accidental page refreshes, by temporarily caching a user's unsaved data.
- Data is persistent across browser crashes. This is up to the user agent, but in the case of Firefox if you browser crashes, and you restore your previous session, then your sessionStorage area will be restored. (Well, that'll be the case once Firefox actually implements this feature).
This isn't to say that either of these features aren't possible using cookies, but the similar solution wouldn't be nearly as elegant.
Summary
In all, I'm really looking forward to digging in to DOM Storage - I think it has a lot of potential. Currently, the state of affairs is pretty slim (Firefox is the only browser that has an implementation, and even then, sessionStorage isn't terribly useful yet.) - but the future is bright.
As I mentioned before, if you're interested in some basic examples and documentation on the subject, feel free to read through the DOM Storage documentation that I wrote.
Update: Here's the original bug report that detailed its implementation in Gecko.
Tags: mozilla, domstorage, html5, whatwg, dom, javascript, firefox
10 Comments on 'DOM Storage'
·
« Previous entries