XHTML, document.write, and Adsense


After some recent discussion concerning the use of document.write() in XHTML documents served with the content type “application/xhtml+xml”, I decided to revisit the problem. An issue with the solutions proposed by Sam and Ajaxian is that they aren’t really solutions, just a lot of hand waving (not that that’s bad; the problem is simply a lot harder than what they propose).

So I sat down and decided to write a semi-complete document.write() replacement for Firefox 1.5+, Opera 9, and Safari 2+ – all handling straight XHTML documents served with an “application/xhtml+xml” content-type.

Note: I completely ignore Internet Explorer. Since IE doesn’t even know how to render XHTML pages (served with the correct MIME type), I’m assuming that you’re doing some form of browser sniffing on the server. If that’s the case, then you are probably already serving IE a different version of the page, and can leave the (at that point unnecessary) document.write() hack out of it. If you want to serve only one version of the code, then I suggest that you use conditional comments, or do some client-side browser sniffing, to serve the hack only to those that need it. (This is mostly because I have yet to find a way to reliably detect a broken document.write() implementation.)
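As a rough illustration of the server-side sniffing mentioned above, here is a minimal sketch that picks a content type from the request's Accept header. The function name pickContentType is hypothetical, and the check is deliberately naive (it ignores q-values), so treat it as a starting point rather than a complete content negotiation implementation.

```javascript
// Hypothetical helper (not from the original post): choose the Content-Type
// to send, based on the Accept header the browser provided.
function pickContentType(acceptHeader) {
  var accept = String(acceptHeader || "").toLowerCase();

  // Browsers that can render real XHTML usually list the type explicitly.
  // Assumption: q-values (e.g. ";q=0") are ignored in this sketch.
  if (accept.indexOf("application/xhtml+xml") !== -1) {
    return "application/xhtml+xml";
  }

  // Everything else gets plain HTML, where the native
  // document.write() can be used and the hack left out.
  return "text/html";
}

console.log(pickContentType("text/html,application/xhtml+xml,*/*;q=0.8"));
console.log(pickContentType("*/*"));
```

The page template can then decide whether to include the document.write() replacement based on the returned value.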

I had a couple of goals for my solution:

  1. It should be as faithful to the normal document.write() as possible. (This means arbitrary injection of XHTML into the DOM)
  2. It should inject the XHTML into the document at the current DOM position.
  3. It should correct for basic weird things that people do (like using write to add invalid XHTML to a document, or writing out closing tags separately). Stuff like this:
      document.write("<iframe src='test.html'>");
      // ... some code ...
      document.write("</iframe>");
  4. It should make Google Adsense work, with no code modification.

I’ll start by saying that solving this problem in Mozilla “isn’t that bad”, nor is it in Opera. Safari is a royal PITA, which I’ll talk about more later.

The vast majority of the cross-browser issues that occur relate to how innerHTML works in XHTML documents. In order to make document.write() work as you would expect it to, you need to write out straight (X)HTML. This topic has been discussed extensively by some of the great JavaScript and standards developers in the industry.

A Solution

So I’ve developed a basic solution to the document.write()/XHTML problem. The full code can be found below, along with a demo of it in action here:
http://ejohn.org/apps/write.xhtml

    document.write = function(str){
        var moz = !window.opera && !/Apple/.test(navigator.vendor);

        // Watch for writing out closing tags, we just
        // ignore these (as we auto-generate our own)
        if ( str.match(/^<\//) ) return;

        // Make sure & are formatted properly, but Opera
        // messes this up and just ignores it
        if ( !window.opera )
            str = str.replace(/&(?![#a-z0-9]+;)/g, "&amp;");

        // Watch for when no closing tag is provided
        // (Only does one element, quite weak)
        // The match is kept inside the final tag, so strings that
        // already end in a closing tag (like "<b>x</b>") are left alone
        str = str.replace(/<([a-z][a-z0-9]*)\b([^>]*[^\/>])?>$/, "<$1$2></$1>");

        // Mozilla assumes that everything in XHTML innerHTML
        // is actually XHTML - Opera and Safari assume that it's XML
        // (tag names may contain digits, e.g. h1)
        if ( !moz )
            str = str.replace(/(<[a-z][a-z0-9]*)/g, "$1 xmlns='http://www.w3.org/1999/xhtml'");

        // The HTML needs to be within a XHTML element
        var div = document.createElementNS("http://www.w3.org/1999/xhtml","div");
        div.innerHTML = str;

        // Find the last element in the document
        var pos;

        // Opera and Safari treat getElementsByTagName("*") accurately
        // always including the last element on the page
        if ( !moz ) {
            pos = document.getElementsByTagName("*");
            pos = pos[pos.length - 1];

        // Mozilla does not, we have to traverse manually
        } else {
            pos = document;
            while ( pos.lastChild && pos.lastChild.nodeType == 1 )
                pos = pos.lastChild;
        }

        // Add all the nodes in that position
        var nodes = div.childNodes;
        while ( nodes.length )
            pos.parentNode.appendChild( nodes[0] );
    };
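The string clean-up steps in the function above can be tried on their own, without a browser. The sketch below pulls them out into three small helpers; the helper names are hypothetical, and the closing-tag and namespace patterns are slightly stricter variants (an assumption of this sketch) that keep the match inside a single tag and allow digits in tag names.

```javascript
// Sketch only: DOM-free versions of the string clean-up steps.
// Helper names are hypothetical, not part of any library.
var XHTML_NS = "http://www.w3.org/1999/xhtml";

// Escape ampersands that do not already begin an entity
// such as &amp; or &#160;.
function escapeBareAmpersands(str) {
  return str.replace(/&(?![#a-z0-9]+;)/g, "&amp;");
}

// If the string ends with an opening tag that is not self-closed,
// append the matching closing tag. Only the final tag is handled.
function closeLastTag(str) {
  return str.replace(/<([a-z][a-z0-9]*)\b([^>]*[^\/>])?>$/, "<$1$2></$1>");
}

// Add an explicit XHTML namespace to every opening tag
// (needed where innerHTML treats the markup as generic XML).
function addXhtmlNamespace(str) {
  return str.replace(/(<[a-z][a-z0-9]*)/g, "$1 xmlns='" + XHTML_NS + "'");
}

var fixed = closeLastTag(escapeBareAmpersands("<iframe src='ad.html?a=1&b=2'>"));
console.log(fixed);
// <iframe src='ad.html?a=1&amp;b=2'></iframe>

console.log(addXhtmlNamespace("<h1>Hi</h1>"));
// <h1 xmlns='http://www.w3.org/1999/xhtml'>Hi</h1>
```

Keeping these steps as pure string functions makes it easy to check edge cases (already-closed tags, existing entities) before the result ever reaches innerHTML.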

It’s important to note what this solution does – and does not – work for.

  • The code will work perfectly for well-formed XHTML markup. This code only does basic “crappy HTML” checks. For example, if you do document.write("<img src='foo.jpg'>"), it’ll correct it to be well-formed XHTML (by appending a closing </img> tag). However, doing document.write("<img src='foo.jpg'> <img src='bar.jpg'>"); will break, as only the last element in the document.write() is “fixed”. (And even then, the fixing isn’t very smart: it just adds a closing tag, which may not always be correct.) Much of this can be fixed with some smarter regular expressions. I took a stab at it, but cross-browser support for variable negative lookaheads seems to be shaky, at best.
  • When using innerHTML in an XML document, Opera and Safari assume that all elements are just XML elements. For this reason the code forcefully puts all elements in the XHTML namespace. Again, this is pretty crude and may break some of your markup, but it’s worked well for me so far.
  • The only extra purification that’s performed is the conversion of ampersands (&) to their entity code (&amp;), where appropriate. If you have other symbols (like < or >), then I can’t make any guarantees.
  • It’s also interesting to note that two completely different methods of traversing the document had to be used. Mozilla-based browsers start acting really strange when you do getElementsByTagName("*") inline in an XHTML document. It will always work fine for the first document.write(), but all subsequent calls will revert back to the position of the last inline <script/>.
  • In the end, this is still not as good as document.write() since with .write() you can write out stuff like table rows, options, partial HTML, script elements, all without blinking an eye. The code to handle all of this is quite significant (having written the code to do it for jQuery, you can take my word for it). I don’t plan on re-writing all of that special-case code, so please only use this solution for simple fixes.
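As one possible take on the “smarter regular expressions” idea from the first point above, the following sketch self-closes every unclosed void element (img, br, and so on) in a string, not just the last one. It avoids lookaheads by doing the “already self-closed?” check in a replacer function. The names VOID_TAGS and selfCloseVoidElements are hypothetical, the tag list is an assumption, and unquoted attribute values containing ">" are not handled.

```javascript
// Sketch only: self-close unclosed void elements anywhere in a string.
// Assumption: this list of void tag names is sufficient for the markup at hand.
var VOID_TAGS = "area|base|br|col|hr|img|input|link|meta|param";

// Matches a whole opening tag for one of the void elements.
// Quoted attribute values may contain ">" safely.
var VOID_TAG_PATTERN = new RegExp(
  "<(" + VOID_TAGS + ")\\b((?:[^>\"']|\"[^\"]*\"|'[^']*')*)>", "g");

function selfCloseVoidElements(str) {
  return str.replace(VOID_TAG_PATTERN, function (tag, name, attrs) {
    // Leave tags that already end in "/" untouched.
    if (/\/\s*$/.test(attrs)) return tag;
    return "<" + name + attrs + " />";
  });
}

console.log(selfCloseVoidElements("<img src='foo.jpg'> <img src='bar.jpg'>"));
// <img src='foo.jpg' /> <img src='bar.jpg' />
```

A step like this could run before the closing-tag fix-up, so that only genuinely open container tags (such as iframe) get a generated closing tag.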

Ok, so now that that’s out of the way – let’s see how well this works in the different browsers.

                        Firefox 1.5+   Opera 9   Safari 2   Webkit (Safari 3)
Simple Text Insertion   Pass           Pass      Fail       Pass
Simple HTML Insertion   Pass           Pass      Fail       Pass
Google Adsense          Pass           Pass      Fail       Sort-of Fail

So here’s the dirt on Safari. I spent many hours banging my head against the keyboard and finally admitted defeat for Adsense in Webkit and for anything in Safari 2.0. Here are the issues:

  • Safari 2.0 completely rejects any attempts to use innerHTML in an XHTML document. It throws exceptions and simply will not let you do it. For this reason, Safari (as it is currently available) is a lost cause.
  • Webkit Nightlies (Safari 3.0), on the other hand, fixed the innerHTML problems, allowing it to work nearly flawlessly. You can see on the demo page (in a Webkit Nightly) that the Google Adsense IFrame is inserted into the page, and a URL is even requested; however, the Adsense script seems to be fundamentally flawed. Looking at the URL generated for Webkit vs. the URL generated for Firefox or Opera, it is apparent that the Adsense script simply isn’t working correctly. So while, technically, Adsense does not (currently) work in the Webkit Nightly with this hack, it seems like that’s through no fault of mine.

In all, this hack was an interesting experience – considering that every browser seems to behave in some sort of nonsensical fashion (in one way or another). I’m glad that there’s, at least, a solution now for two of the major browsers (and possibly the next version of Safari too, after some more tinkering). I was, perhaps, most pleasantly surprised by Firefox’s innerHTML/XHTML implementation. You feed it valid XHTML, it inserts it into the document. Any other value throws an exception. Very simple and logical.

As a side note: I’m going to try and feed some of this code back into jQuery, so that stuff like $(…).append(““) will work as you might expect it to in the major browsers.

It’s pretty obvious that writing XHTML documents with the preferred MIME type is still a ways off from real-world usage; however, I’m more hopeful now than I was before, which is good, to say the least.

Posted: November 12th, 2006

