When working with the DOM .nodeName property there are two hard-and-fast rules that most people abide by:
The node names of HTML elements are always uppercase, even if they're explicitly created using lowercase characters. <html> will result in a .nodeName === "HTML" (see the HTML 5 draft).
The node names of XML elements are always in the original case, as specified when they're created. <data> will result in a .nodeName === "data", <DATA> will result in a .nodeName === "DATA".
Knowing these rules can be useful because it allows you to optimize your code. If you know that you're in an HTML document you can avoid having to upper/lowercase your .nodeName checks and you can just always assume that you're dealing with a .nodeName that's uppercase. This results in faster selectors for Internet Explorer and other minor optimizations.
However recently I've been running across two cases that've been especially problematic and have bucked the trend.
Importing Nodes from XML
The first is for browsers that support the adoptNode/importNode DOM methods. These methods allow you to move (or clone) a node from one DOM document to another. In this way you can move an XML node from an XML document and insert it into an HTML document. Normally this shouldn't matter much but, as it turns out, the original .nodeName case sensitivity is preserved from the original XML-ness of the node.
Thus if you have a lowercase XML element (<data>) and you use adoptNode or importNode to bring it into your HTML document the result will be .nodeName === "data" -- which completely bucks the trend for "all HTML elements' node names are always uppercase." I consider this to be a bug, considering that the DOM element is now in an HTML document, not in an XML document, and should behave as such.
Unknown HTML 5 Elements
The second bit of weirdness comes from people attempting to use the new elements from HTML 5 in browsers that don't support it. Most browsers behave perfectly well when using some of the new HTML 5 elements (in that they don't freak out and support some level of styling). For Internet Explorer you must use the HTML 5 Shim technique - this will give unknown HTML 5 elements the ability to be styled and hold contents (such as a <section> element).
However there is an additional gotcha: When Internet Explorer encounters an element that it doesn't recognize it leaves the .nodeName in its original case. Thus if you have a <section> element in your HTML page the result will be .nodeName === "section" -- which directly contradicts the normal case sensitivity of the .nodeName property in HTML documents.
To try and understand all of this I made a bunch of test cases using a number of doctypes and document styles.
I ran the following tests in IE 6, IE 7, IE 8, Firefox 3.5, Safari 4.0.3, Chrome 3.0.195, and Opera 10.10. I also tested .tagName alongside .nodeName and found no discernible difference (you can run your own .tagName tests by appending ?tagName to any test URL).
Note: The HTML 5, XHTML (served as HTML), and no-doctype pages all behaved identically to each other in every browser - thus I'm just going to not display the XHTML (as HTML) and no-doctype results as there wouldn't be anything interesting to show.
Firefox, Safari, and Chrome all yielded the same results here: Bringing in elements from an external document maintains the case sensitive nature of the .nodeName property - which is unexpected.
| Test | <div> | <DIV> | <section> | <SECTION> |
| --- | --- | --- | --- | --- |
| HTML | DIV | DIV | SECTION | SECTION |
| HTML createElement | DIV | DIV | SECTION | SECTION |
| innerHTML | DIV | DIV | SECTION | SECTION |
| XML | div | DIV | section | SECTION |
| XML createElement | div | DIV | section | SECTION |
| HTML via importNode | div | DIV | section | SECTION |
| HTML via adoptNode | div | DIV | section | SECTION |
Internet Explorer fails in a different manner. To start, Internet Explorer doesn't support importNode or adoptNode so those particular tests simply don't run. However we can confirm that the case sensitivity of the unknown HTML 5 element is maintained in HTML, even though it shouldn't be.
| Test | <div> | <DIV> | <section> | <SECTION> |
| --- | --- | --- | --- | --- |
| HTML | DIV | DIV | section | SECTION |
| HTML createElement | DIV | DIV | section | SECTION |
| innerHTML | DIV | DIV | section | SECTION |
| XML | div | DIV | section | SECTION |
| XML createElement | div | DIV | section | SECTION |
| HTML via importNode | Error: Object doesn't support this property or method | | | |
| HTML via adoptNode | Error: Object doesn't support this property or method | | | |
Opera ups the ante one further: Since it attempts to simultaneously follow web standards and implement Internet Explorer's weird quirks, it fails both the importNode/adoptNode and the HTML 5 unknown element cases.
Nearly every browser that supported showing this page (Firefox, Safari, Opera, Chrome) displayed the same, expected, results:
| Test | <div> | <DIV> | <section> | <SECTION> |
| --- | --- | --- | --- | --- |
| HTML | div | DIV | section | SECTION |
| HTML createElement | div | DIV | section | SECTION |
| innerHTML | div | DIV | section | SECTION |
| XML | div | DIV | section | SECTION |
| XML createElement | div | DIV | section | SECTION |
| HTML via importNode | div | DIV | section | SECTION |
| HTML via adoptNode | div | DIV | section | SECTION |
An XHTML page served properly is just an XML document - thus the case of elements is sensitive (as to be expected).
... except in Opera. Opera apparently will treat div elements case insensitively, when injected using .innerHTML, even if it's being served within an XHTML document.
| Test | <div> | <DIV> | <section> | <SECTION> |
| --- | --- | --- | --- | --- |
| HTML | div | DIV | section | SECTION |
| HTML createElement | div | DIV | section | SECTION |
| innerHTML | DIV | DIV | section | SECTION |
| XML | div | DIV | section | SECTION |
| XML createElement | div | DIV | section | SECTION |
| HTML via importNode | div | DIV | section | SECTION |
| HTML via adoptNode | div | DIV | section | SECTION |
Update: XHTML as XML Tests
Based upon some suggestions in the comments I've run some additional tests. Namely I tested the loading of an XML document that has the correct XHTML namespace attached to it (specifically I used the same XHTML test page that I used for the other tests, just appending a .xml extension instead of .xhtml). The results are rather interesting - and promising, at least. (Note: Internet Explorer continues to fail as it doesn't have an adoptNode/importNode method.)
Firefox continues to fail the importing of XML nodes, even when they're coming from an XML document:
| Test | <div> | <DIV> | <section> | <SECTION> |
| --- | --- | --- | --- | --- |
| HTML | DIV | DIV | SECTION | SECTION |
| HTML createElement | DIV | DIV | SECTION | SECTION |
| innerHTML | DIV | DIV | SECTION | SECTION |
| XML | div | DIV | section | SECTION |
| XML createElement | div | DIV | section | SECTION |
| HTML via importNode | div | DIV | section | SECTION |
| HTML via adoptNode | div | DIV | section | SECTION |
| XML (XHTML) | div | DIV | section | SECTION |
| XHTML via importNode | div | DIV | section | SECTION |
As does Opera:
| Test | <div> | <DIV> | <section> | <SECTION> |
| --- | --- | --- | --- | --- |
| HTML | DIV | DIV | section | SECTION |
| HTML createElement | DIV | DIV | section | SECTION |
| innerHTML | DIV | DIV | section | SECTION |
| XML | div | DIV | section | SECTION |
| XML createElement | div | DIV | section | SECTION |
| HTML via importNode | div | DIV | section | SECTION |
| HTML via adoptNode | div | DIV | section | SECTION |
| XML (XHTML) | div | DIV | section | SECTION |
| XHTML via importNode | div | DIV | section | SECTION |
BUT both Safari and Chrome PASS on the importing of XHTML nodes, coming from an XML document:
| Test | <div> | <DIV> | <section> | <SECTION> |
| --- | --- | --- | --- | --- |
| HTML | DIV | DIV | SECTION | SECTION |
| HTML createElement | DIV | DIV | SECTION | SECTION |
| innerHTML | DIV | DIV | SECTION | SECTION |
| XML | div | DIV | section | SECTION |
| XML createElement | div | DIV | section | SECTION |
| HTML via importNode | div | DIV | section | SECTION |
| HTML via adoptNode | div | DIV | section | SECTION |
| XML (XHTML) | div | DIV | section | SECTION |
| XHTML via importNode | DIV | DIV | SECTION | SECTION |
This, in particular, is great news. It means that, at least, one browser understands the concept of loading in external (X)HTML into an HTML document and having it continue to work. It's unfortunate that it doesn't work in all browsers, though.
Conclusion
What can we learn from all of this? Unfortunately it appears as if we can't really trust our "trusted" rules about .nodeName case sensitivity for HTML documents. XML documents are completely safe and work as expected. XHTML (served with the correct mimetype) documents are nearly safe, save for the one bizarre Opera bug.
How will this change the code that we write? In short we can no longer trust the case insensitive nature of HTML documents - we need to assume that BOTH HTML and XML documents will be serving their content in a case sensitive nature - especially as more people start to adopt HTML 5 elements in their pages and expect some level of support in older browsers. This means that a number of selectors and DOM methods will take a performance hit as we can no longer take a case insensitive shortcut in our codebases.
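As a rough illustration of what giving up the shortcut means, a name check has to normalize both sides instead of comparing against an uppercase literal. The helper name `nodeNameIs` is hypothetical (it is not a jQuery function), and the objects below are plain stand-ins for DOM nodes so that the sketch runs outside a browser.

```javascript
// Hypothetical helper: compare a node's name without assuming its case.
function nodeNameIs(node, name) {
  return !!node && typeof node.nodeName === "string" &&
    node.nodeName.toLowerCase() === name.toLowerCase();
}

// Plain objects standing in for DOM nodes in the cases described above.
var htmlDiv = { nodeName: "DIV" };        // ordinary HTML element
var importedDiv = { nodeName: "div" };    // XML node brought in via importNode
var ieSection = { nodeName: "section" };  // unknown HTML 5 element in IE

var results = [
  nodeNameIs(htmlDiv, "div"),
  nodeNameIs(importedDiv, "div"),
  nodeNameIs(ieSection, "SECTION"),
  nodeNameIs(htmlDiv, "span")
];
```

The extra `toLowerCase()` calls on every comparison are exactly the kind of cost the uppercase-only assumption used to avoid.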
There are a few outstanding jQuery tickets that are the result of these issues cropping up and now that I know the reasoning behind why they're happening I can now strip out all the case-insensitive performance improvements from the codebase - which is really quite unfortunate but at least it'll behave more consistently. I continue to stand by the thesis from my earlier talk about the DOM: The DOM is a mess and every DOM method and property is broken in some way, in some browser.
The second half of the post is all about the new NodeIterator API that was just implemented. For those that are familiar with some of the DOM TreeWalker APIs this will look quite familiar.
It's my opinion, though, that this API is, at best, bloated, and at worst incredibly misguided and impractical for day-to-day use.
var nodeIterator = document.createNodeIterator(
  root, // root node for the traversal
  whatToShow, // a set of constants to filter against
  filter, // an object with a function for advanced filtering
  entityReferenceExpansion // whether entity reference children should be expanded
);
This is excessive for what should be, at most, a simple way to traverse DOM nodes.
To start, you must create a NodeIterator using the createNodeIterator method. This is fine except this method only exists on the Document node - which is especially strange since the first argument is the node which should be used as the root of the traversal. The first argument shouldn't exist and you should be able to call the method on any DOM element, document, or fragment.
Second, in order to specify which types of nodes you wish to see you need to provide a number (which is the result of combining various constants) that the results will be filtered against. This is pretty insane so let me break this down. The NodeFilter object contains a number of properties representing the different types of nodes that exist. Each property has a number associated with it (which makes sense, this way the method can uniquely identify which type of node to look for). But then the crazy comes in: In order to select multiple, different, types of nodes you must OR together the properties to create a resulting number that'll be passed in.
For example if you wanted to find all elements, comments, and text nodes you would do:
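Using the NodeFilter constants, that combination would look roughly like the following. The fallback object is an assumption added only so the snippet also runs outside a browser; its numbers are the values the DOM Traversal specification assigns to these constants.

```javascript
// Use the browser's NodeFilter when present; otherwise fall back to the
// numeric values defined for these constants by DOM Traversal.
var Filter = (typeof NodeFilter !== "undefined") ? NodeFilter : {
  SHOW_ELEMENT: 0x1,
  SHOW_TEXT: 0x4,
  SHOW_COMMENT: 0x80
};

// OR the bit flags together to select elements, comments, and text nodes.
var whatToShow = Filter.SHOW_ELEMENT | Filter.SHOW_COMMENT | Filter.SHOW_TEXT;
```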
I'm not sure if you can get a much more counter-intuitive JavaScript API than that (you can certainly expect little, to no, common developer adoption, that's for sure).
Next, the filter argument accepts an object that has a method (called acceptNode) which is capable of further filtering the node results before being returned from the iterator. This means that the function will be called on every applicable node (as specified by the previous whatToShow argument).
Two points to consider:
The filter argument must be an object with a property named 'acceptNode' that has a function as a value. It can't just be a function for filtering, it must be enclosed in a wrapper object. Update: Actually, this isn't true - at least with Mozilla's implementation you can pass in just a function. Thanks for the tip, Neil!
The argument is required (even though you can pass in null, making it equivalent to accepting all nodes).
The last argument, entityReferenceExpansion, comes in to play when dealing with XML entities that also contain sub-nodes (such as elements). For example, with XML entities, it's perfectly valid to have a declaration like <!ENTITY aname "<elem>test</elem>"> and then later in your document have &aname; (which is expanded to represent the element). While this may be useful for XML documents it is way out of the scope of most web content (thus the argument will likely always be false).
So we're left with four arguments:
The first of which can be removed (by making the method available on elements, fragments, and documents).
The second of which is obtuse and should be optional (especially in the case where all nodes are to be matched).
The third of which requires a superfluous object wrapping and should be optional.
The fourth of which should be optional.
None of this actually takes into account the actual iteration process. If you look at the specification you can see that all the examples are in Java - and when seeing this a lot of the API decisions start to make more sense (not that it really applies to the world of web-based development, though). In JavaScript one doesn't really use iterators, more typically an array is used instead. (In fact a number of helpers have been added in ECMAScript 5 which make the iteration and filtering process that much simpler.)
I'd like to propose the following, new, API that would exist in place of the NodeIterator API (dramatically simplifying most common interactions, especially on the web).
// Get all nodes in the document
document.getNodes();
// Get all comment nodes in the document
document.getNodes( Node.COMMENT_NODE);
// Get all element, comment, and text nodes in the document
document.getNodes( Node.ELEMENT_NODE, Node.COMMENT_NODE, Node.TEXT_NODE);
I'd also like to propose the following helper methods:
// Get all comment nodes in the document
document.getCommentNodes();
// Get all text nodes in a document
document.getTextNodes();
Beyond finding elements, finding comments and text nodes are the two most popular query types that I see requested.
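To make the proposal a little more concrete, here is a rough sketch of how `getNodes` could behave, written as a standalone function rather than a method. It is not a real DOM API: the signature, the choice to exclude the root itself, and the plain-object tree used to exercise it are all assumptions for illustration.

```javascript
// Hypothetical stand-in for the proposed document.getNodes(): walks the
// subtree under `root` in document order and returns a plain array.
// Any extra arguments are nodeType values to keep (none means keep all).
function getNodes(root) {
  var types = Array.prototype.slice.call(arguments, 1);
  var found = [];

  function walk(node) {
    var kids = node.childNodes || [];
    for (var i = 0; i < kids.length; i++) {
      var wanted = types.length === 0;
      for (var t = 0; t < types.length && !wanted; t++) {
        wanted = kids[i].nodeType === types[t];
      }
      if (wanted) {
        found.push(kids[i]);
      }
      walk(kids[i]);
    }
  }

  walk(root);
  return found;
}

// nodeType values as defined by the DOM specification.
var ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8;

// A tiny fake tree standing in for: <p>hi<!-- note --></p>
var text = { nodeType: TEXT_NODE, childNodes: [] };
var comment = { nodeType: COMMENT_NODE, childNodes: [] };
var p = { nodeType: ELEMENT_NODE, childNodes: [text, comment] };
var doc = { nodeType: 9, childNodes: [p] };

var everything = getNodes(doc);
var commentsOnly = getNodes(doc, COMMENT_NODE);
```

Because the result is an ordinary array, the usual array helpers can be applied to it directly.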
Consider the code that would be required to recreate the above using NodeIterator:
// Get all nodes in the document
document.createNodeIterator(document, NodeFilter.SHOW_ALL, null, false);
// Get all comment nodes in the document
document.createNodeIterator(document, NodeFilter.SHOW_COMMENT, null, false);
// Get all element, comment, and text nodes in the document
document.createNodeIterator(document,
NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT, null, false );
This proposed API would return an array of DOM nodes as a result (instead of a NodeIterator object). You can compare the difference in results between the two APIs:
NodeIterator API
var nodeIterator = document.createNodeIterator(
document,
NodeFilter.SHOW_COMMENT, null, false );
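Getting an array out of a NodeIterator takes an explicit loop, since results are pulled one at a time with `nextNode()` until it returns `null`. The `collectNodes` helper below is a hypothetical name, and the fake iterator exists only so that the sketch runs without a browser.

```javascript
// Drain any object exposing nextNode() (such as a NodeIterator) into an array.
function collectNodes(iterator) {
  var nodes = [], node;
  while ((node = iterator.nextNode())) {
    nodes.push(node);
  }
  return nodes;
}

// Minimal stand-in that mimics NodeIterator.nextNode() over a fixed list.
function fakeIterator(items) {
  var index = 0;
  return {
    nextNode: function () {
      return index < items.length ? items[index++] : null;
    }
  };
}

var comments = collectNodes(fakeIterator([
  { nodeType: 8, nodeValue: "first" },
  { nodeType: 8, nodeValue: "second" }
]));
```

In a browser the same helper could be handed the `nodeIterator` created above in place of the fake one.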
Almost always, when finding some of the crazy intricacies of the DOM or CSS, you'll find a legacy of XML documents and Java applications - neither of which have a strong application to the web as we know it or to the web as it's progressing. It's time to divorce ourselves from these decrepit APIs and build ones that are better-suited to web developers.
Update: An even better alternative (rather than using constants representing node types) would be something like the following:
document.getNodes( Element, Comment, Text );
Just refer back to the base objects representing each of the types that you want.
Firefox Nightly 99.3% (16 failing - doesn't handle 'undefined' being passed in, correctly)
IE 8 RC 1 45.9% (1171 failing - Major problem areas are lack of whitespace trimming, incorrect exceptions being thrown, and lack of full CSS 3 selector support)
Opera 10a1 99.0% (22 failing - Empty string checking in attributes fails and some disconnected checkbox checks fail)
I gave a talk last week at Google (at the request of the excellent Steve Souders) all about the performance improvements, and new APIs, that are coming in browsers. I cover the new browsers, their JavaScript engines, their JavaScript performance, and then do a whirlwind tour of their new DOM methods and some of their new CSS APIs.
I gave a talk last week at Yahoo (at the request of the YUI team) all about the DOM. I outlined some of the reasons why the current situation is such a mess, outlined some strategies for working around it, and then gave some examples of real world code that's being implemented in libraries today.
While looking for improvements to injecting HTML fragments into a document (which I mentioned, in passing, when I looked at using Document Fragments) I decided to spend some more time with Internet Explorer's insertAdjacentHTML method.
This method has been in Internet Explorer since version 4.0 - and is also in the current release of Opera - and allows you to inject fragments of well-formed HTML into a variety of locations in a document.
The locations work as such (I list the equivalent terminology):
| Call | Equivalent |
| --- | --- |
| .insertAdjacentHTML("beforeBegin", ...) | before |
| .insertAdjacentHTML("afterBegin", ...) | prepend |
| .insertAdjacentHTML("beforeEnd", ...) | append |
| .insertAdjacentHTML("afterEnd", ...) | after |
The method is only available on DOM elements (which makes sense) and is easy to use:
var ul = document.getElementById("list");
ul.insertAdjacentHTML("beforeEnd", "<li>A new li on the list.</li>");
ul.insertAdjacentHTML("beforeEnd", "<li>Another li!</li>");
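The location names listed earlier can be wrapped in a small helper so that calling code uses the more familiar before/prepend/append/after terms. The `insertHTML` function is hypothetical, and the recording object stands in for a real element so that the sketch can run outside a browser.

```javascript
// Map the familiar names onto insertAdjacentHTML's position strings.
var positions = {
  before: "beforeBegin",
  prepend: "afterBegin",
  append: "beforeEnd",
  after: "afterEnd"
};

// Hypothetical wrapper: insertHTML(elem, "append", "<li>...</li>")
function insertHTML(elem, where, html) {
  if (!positions.hasOwnProperty(where)) {
    throw new Error("Unknown position: " + where);
  }
  elem.insertAdjacentHTML(positions[where], html);
}

// Stand-in element that just records the calls it receives.
var calls = [];
var fakeList = {
  insertAdjacentHTML: function (position, html) {
    calls.push(position + ": " + html);
  }
};

insertHTML(fakeList, "append", "<li>A new li on the list.</li>");
insertHTML(fakeList, "before", "<p>Intro</p>");
```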
At first glance the method appeared to work well and seemed to be relatively fast. Two questions remained, though: How fast is it in comparison to using the Document Fragment technique I outlined before and does it work for all the strange use-cases that exist?
I created a test case to compare the three types of injection: The type we've been using in jQuery prior to the upcoming 1.3 release, the new Document Fragment technique we'll be using in jQuery 1.3, and a case using insertAdjacentHTML (where applicable). While both the Document Fragment and insertAdjacentHTML cases were significantly faster than the old techniques used in jQuery the Document Fragment technique ended up being marginally faster in IE 6 (50ms vs. 80ms for insertAdjacentHTML).
There's a huge problem with insertAdjacentHTML: It doesn't work on all HTML elements in IE 6 (specifically it doesn't work on table, tbody, thead, or tr elements). Having gaps in the functionality is very undesirable (attempting to use insertAdjacentHTML on those elements causes an exception to pop up in IE 6).
It doesn't work on XML documents. Of course neither does innerHTML (at least not until browsers start to implement HTML 5 more completely). We're stuck doing the traditional techniques used in libraries like jQuery.
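One way to live with both gaps is to try insertAdjacentHTML and drop back to a slower path when the method is missing or throws. This is only a sketch under assumptions: `appendHTML` and the caller-supplied `fallback` function are hypothetical names, and the fake objects below merely imitate the failure modes described above.

```javascript
// Hypothetical helper: prefer insertAdjacentHTML, but fall back when the
// method is missing (for example on XML nodes) or throws (as described
// above for table elements in IE 6). `fallback` is supplied by the caller.
function appendHTML(elem, html, fallback) {
  if (elem.insertAdjacentHTML) {
    try {
      elem.insertAdjacentHTML("beforeEnd", html);
      return "native";
    } catch (e) {
      // Fall through to the slower path below.
    }
  }
  fallback(elem, html);
  return "fallback";
}

// Stand-ins: a working element, a "table" that throws, and an XML-like node.
var okElem = { insertAdjacentHTML: function () {} };
var tableElem = { insertAdjacentHTML: function () { throw new Error("unsupported"); } };
var xmlNode = {};

var fallbackCount = 0;
function slowPath() { fallbackCount++; }

var outcomes = [
  appendHTML(okElem, "<li>x</li>", slowPath),
  appendHTML(tableElem, "<tr></tr>", slowPath),
  appendHTML(xmlNode, "<item/>", slowPath)
];
```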
So why spend all this time talking about a method that is relatively half-baked in the main browser that implements it? Because it's going to be part of the HTML 5 specification. This means that we're going to see a larger number of browsers start to implement this method (and hopefully it'll encourage existing vendors to implement it more completely and efficiently).
Having browsers implement this method will dramatically reduce the amount of code needed to write a respectable JavaScript library. I'm looking forward to the day in which this method is more-widely available (along with querySelectorAll) so that we can really buckle down and do some serious code simplification.
If I had to rate my least favorite browser bugs I'd have to put this one near the top. A holdover from the old DOM0 days, it's a practice where elements with a given name or ID are added as an expando property to another DOM node.
Here are my two favorite examples of this bug in action:
The first is a simple form that does a search on a site. Additionally a link is provided that, when clicked, fills in a search value and submits the form.
The .submit() method (which is available on all Form elements) is overwritten by the input element of the same name. This ends up being a very common problem - with frameworks using id="submit" as a default in their code.
Worst of all this fails in all browsers (preventing you from accessing the overwritten method).
The second example is even more devious. In this case we're going to loop over all the DOM elements in the page and alert out their contents.
<div id="length">12 stories</div>
<div id="makeup">radiation</div>
<script>
var all = document.getElementsByTagName("*");
for ( var i = 0; i < all.length; i++ ) {
  alert( all[i].innerHTML );
}
</script>
This will work in most browsers - but not Internet Explorer. To understand why we return to the address bar.
Oops. All browsers turn elements with specific IDs into expandos of the returned NodeSet. But Internet Explorer goes a step farther and decides to overwrite the built-in .length property as well, breaking current forms of iterating over the DOM elements.
At least within jQuery you'll see a number of cases where, instead of doing the normal array traversal, we do the following in order to work around the issue:
for ( var i = 0; elems[i]; i++ ) {
  // Do stuff with elems[i]
}
It's a little more obtuse but at least it's guaranteed to work against cases of broken NodeSet iteration.
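The difference between the two loop styles can be simulated without a browser. In the sketch below a plain object imitates a collection whose .length has been replaced by the element with id="length"; the object and variable names are made up for illustration, and real collections may differ in detail.

```javascript
// Fake collection imitating the broken case: the element with id="length"
// has replaced the numeric .length property.
var lengthDiv = { id: "length", innerHTML: "12 stories" };
var makeupDiv = { id: "makeup", innerHTML: "radiation" };
var all = { 0: lengthDiv, 1: makeupDiv, length: lengthDiv };

// Conventional loop: `i < all.length` compares a number to an object,
// which is false in this simulation, so the body never runs.
var viaLength = [];
for (var i = 0; i < all.length; i++) {
  viaLength.push(all[i].innerHTML);
}

// Workaround loop: stop at the first missing index instead.
var viaIndex = [];
for (var j = 0; all[j]; j++) {
  viaIndex.push(all[j].innerHTML);
}
```

The index-based loop never consults .length, so it is unaffected by whatever that property has been replaced with.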
Garrett Smith has a highly technical write-up on the variety of issues that stem from this form of expansion. In short: No browser is immune from these problems. It's unfortunate that this whole system couldn't just be done away with (to avoid these types of issues in the first place) but legacy pages will likely necessitate their inclusion for many, many, years to come.
The purpose of this proposal is to make it easier for developers to traverse through DOM elements without having to worry about intermediary text nodes, comment nodes, etc. This has long been a bane of web developers, in particular, with cases like document.documentElement.firstChild yielding different results depending on the whitespace structure of a document.
The Element Traversal API introduces a number of new DOM node properties which can make this traversing much simpler.
Here's a full break-down of the existing DOM node properties and their new counterparts:
| Purpose | All DOM Nodes | Just DOM Elements |
| --- | --- | --- |
| First | .firstChild | .firstElementChild |
| Last | .lastChild | .lastElementChild |
| Previous | .previousSibling | .previousElementSibling |
| Next | .nextSibling | .nextElementSibling |
| Length | .childNodes.length | .childElementCount |
These properties provide a fairly simple addition to the DOM specification (and, honestly, they're something that should've been in the specification to begin with).
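Until the new properties are available everywhere, the same result can be had by skipping non-element nodes by hand. The helper below is a sketch with a hypothetical name; it prefers .firstElementChild when the node exposes it and otherwise walks .firstChild/.nextSibling. The plain objects stand in for DOM nodes so that it runs outside a browser.

```javascript
// Hypothetical helper: use .firstElementChild when present, otherwise walk
// the child nodes and skip anything that is not an element (nodeType 1).
function firstElementChild(node) {
  if (node.firstElementChild !== undefined) {
    return node.firstElementChild;
  }
  var child = node.firstChild;
  while (child && child.nodeType !== 1) {
    child = child.nextSibling;
  }
  return child || null;
}

// Fake children: whitespace text, then a comment, then a <span>.
var span = { nodeType: 1, nodeName: "SPAN", nextSibling: null };
var comment = { nodeType: 8, nextSibling: span };
var whitespace = { nodeType: 3, nextSibling: comment };
var parent = { nodeType: 1, firstChild: whitespace };

var found = firstElementChild(parent);
```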
There is one property that is conspicuously absent, though: .childElements (as a counterpart to .childNodes). This property (which contained a live NodeSet of the child elements of the DOM element) was in previous iterations of the specification but it seems to have gone on the cutting room floor at some point in the interim.
But all is not lost. Right now Internet Explorer, Opera, and Safari all support a .children property which provides a super-set of the functionality that was supposed to have been made possible by .childElements. When support for the Element Traversal API finally landed in Firefox 3.1, support for .children was included. This now means that every major browser will support this property (far in advance of all browsers supporting the rest of the true Element Traversal specification).
I think that the Element Traversal spec is missing a huge opportunity here to specify something that has become a de facto standard amongst browsers. Maybe it'll make the second version of the Element Traversal spec, heh.
There are two big points that need to be explored here:
Now that the .children property is virtually everywhere how can we start to use it to simplify our code?
Can we use .children, or parts of the Element Traversal API, to help speed up existing code?
To answer these questions I mocked up a quick little plugin for jQuery that replaces the internals of the existing .prev(), .next(), .prevAll(), .nextAll(), .siblings(), and .children() methods with .children and the Element Traversal API properties.
The resulting code is absolutely simpler - previously there were numerous checks to see if a Node was, or was not, a DOM element - which resulted in lots of extra kludgy methods to handle those cases. But was the code faster?
I plugged the code into Dromaeo to see if there was any speed up in Firefox 3.1. The result? There is no discernible speed improvement to using the new Element Traversal properties (.firstElementChild, etc.). This isn't, necessarily, a bad thing - we just got the same performance that we see now but with a better API.
However there is a large improvement in speed when using .children (for the .siblings() and .children() jQuery methods). With this addition .siblings() is 84% faster and .children() is 35% faster. Considering that the .children property is now available in all browsers it's making a lot of sense for people to get on board and start using it in their code bases for a definite hot path to extra performance. (Although, this is definitely not a new revelation - with frameworks like Dojo having used .children in their selector code for quite some time now.)
If nothing else the argument for having a simple branch in your code to handle using .children is absolutely becoming more compelling.
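Such a branch might look something like the following sketch. The `childElements` name is hypothetical (it is not the plugin code mentioned above), and the plain objects stand in for DOM nodes so that the snippet runs outside a browser.

```javascript
// Hypothetical helper: return the child elements of a node as an array,
// using .children when it exists and filtering .childNodes otherwise.
function childElements(node) {
  var result = [], list, i;

  if (node.children) {
    list = node.children;
    for (i = 0; i < list.length; i++) {
      result.push(list[i]);
    }
    return result;
  }

  list = node.childNodes || [];
  for (i = 0; i < list.length; i++) {
    if (list[i].nodeType === 1) {
      result.push(list[i]);
    }
  }
  return result;
}

// Stand-ins: one node exposing .children, one exposing only .childNodes.
var li1 = { nodeType: 1 }, li2 = { nodeType: 1 }, gap = { nodeType: 3 };
var withChildren = { children: [li1, li2], childNodes: [li1, gap, li2] };
var withoutChildren = { childNodes: [gap, li1, gap, li2] };

var fast = childElements(withChildren);
var slow = childElements(withoutChildren);
```

Both paths return the same plain array, so the rest of the code does not need to know which branch was taken.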