Projects


XHTML, document.write, and Adsense

After some recent discussion concerning the use of document.write() in XHTML documents served with the content-type "application/xhtml+xml", I decided to revisit the problem. An issue with the solutions proposed by Sam and Ajaxian is that they aren't really solutions, just a lot of hand waving (not that that's bad; it's just that the problem is a lot harder than what they propose).

So I sat down and decided to write a semi-complete document.write() replacement for Firefox 1.5+, Opera 9, and Safari 2+ - all handling straight XHTML documents served with an "application/xhtml+xml" content-type.

Note: I completely ignore Internet Explorer here. Since IE doesn't even know how to render XHTML pages (served with the correct mimetype), I'm assuming that you're doing some form of browser sniffing in your code (on the server). If that's the case, then you may already be serving IE a different version of the page, which can leave out the (at this point) unnecessary document.write() hack. If you want to serve only one version of the code, then I suggest that you use conditional comments, or do some client-side browser sniffing to serve the hack to those that need it. (This is mostly because I have yet to find a way to reliably detect a broken document.write() implementation.)

I had a couple of goals for my solution:

  1. It should be as faithful to the normal document.write() as possible. (This means arbitrary injection of XHTML into the DOM.)
  2. It should inject the XHTML into the document at the current DOM position.
  3. It should correct for basic weird things that people do (like using write to add invalid XHTML to a document - and writing out closing tags). Stuff like this:
    document.write("<iframe src='test.html'>");
    // ... some code ...
    document.write("</iframe>");
  4. It should make Google Adsense work, with no code modification.

I'll start by saying that solving this problem in Mozilla "isn't that bad", nor is it in Opera. Safari is a royal PITA, which I'll talk about more later.

The vast majority of the cross-browser issues that occur relate to how innerHTML works in XHTML documents. In order to make document.write() work as you would expect it to, you need to write out straight (X)HTML. This topic has been discussed extensively by some of the great JavaScript and standards developers in the industry.

A Solution

So I've developed a basic solution to the document.write()/XHTML problem. The full code can be found below, along with a demo of it in action here:
http://ejohn.org/apps/write.xhtml

document.write = function(str){
    var moz = !window.opera && !/Apple/.test(navigator.vendor);
       
    // Watch for writing out closing tags, we just
    // ignore these (as we auto-generate our own)
    if ( str.match(/^<\//) ) return;

    // Make sure & are formatted properly, but Opera
    // messes this up and just ignores it
    if ( !window.opera )
        str = str.replace(/&(?![#a-z0-9]+;)/g, "&amp;");

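    // Watch for when no closing tag is provided
    // (Only fixes the last tag in the string, quite weak;
    // [^>]* stops the match from spanning tags, so markup that
    // already ends in a closing tag isn't closed a second time)
    str = str.replace(/<([a-z]+)([^>]*[^\/])>$/, "<$1$2></$1>");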
       
    // Mozilla assumes that everything in XHTML innerHTML
    // is actually XHTML - Opera and Safari assume that it's XML
    if ( !moz )
        str = str.replace(/(<[a-z]+)/g, "$1 xmlns='http://www.w3.org/1999/xhtml'");
       
    // The HTML needs to be within a XHTML element
    var div = document.createElementNS("http://www.w3.org/1999/xhtml","div");
    div.innerHTML = str;
       
    // Find the last element in the document
    var pos;
       
    // Opera and Safari treat getElementsByTagName("*") accurately
    // always including the last element on the page
    if ( !moz ) {
        pos = document.getElementsByTagName("*");
        pos = pos[pos.length - 1];
               
        // Mozilla does not, we have to traverse manually
    } else {
        pos = document;
        while ( pos.lastChild && pos.lastChild.nodeType == 1 )
            pos = pos.lastChild;
    }
       
    // Add all the nodes in that position
    var nodes = div.childNodes;
    while ( nodes.length )
        pos.parentNode.appendChild( nodes[0] );
};
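The string-rewriting steps are easier to follow, and to test outside a browser, when they are pulled into one pure function. Below is a minimal sketch of that. The name normalizeWriteString and its two boolean flags are hypothetical; the flags stand in for the window.opera and moz checks. The DOM insertion half of the replacement is left out entirely.

```javascript
// Hypothetical helper: reproduces only the string-rewriting steps of the
// document.write() replacement so they can be exercised without a DOM.
// isOpera / isMoz are assumed stand-ins for the browser checks.
function normalizeWriteString(str, isOpera, isMoz) {
  // A bare closing tag such as "</iframe>" is dropped (null = write nothing),
  // because closing tags are generated automatically instead.
  if (/^<\//.test(str)) return null;

  // Escape ampersands that don't already start an entity reference
  // (skipped for Opera, as in the original code).
  if (!isOpera) str = str.replace(/&(?![#a-z0-9]+;)/g, "&amp;");

  // Close a trailing unclosed start tag, e.g. "<iframe src='x'>".
  // [^>]* keeps the match inside a single tag, so "<p>hi</p>" is untouched.
  str = str.replace(/<([a-z]+)([^>]*[^\/])>$/, "<$1$2></$1>");

  // Opera and Safari treat innerHTML content as plain XML, so put every
  // start tag into the XHTML namespace explicitly.
  if (!isMoz) {
    str = str.replace(/(<[a-z]+)/g, "$1 xmlns='http://www.w3.org/1999/xhtml'");
  }
  return str;
}
```

Take the pair of calls from goal 3. With this function, "<iframe src='test.html'>" gains a closing tag, and "</iframe>" comes back as null, so the caller would simply skip it.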

It's important to note what this solution does - and does not - work for.

  • The code will work perfectly for well-formed XHTML markup. It only does basic "crappy HTML" checks. For example, if you do document.write("<img src='foo.jpg'>"), it'll be corrected to become XHTML compliant (by appending a closing tag, giving <img src='foo.jpg'></img>). However, document.write("<img src='foo.jpg'> <img src='bar.jpg'>"); will break, as only the last element in the document.write() is "fixed". (And even then, the fixing isn't very smart: it just adds a closing tag, which may not always be correct.) Much of this can be fixed with some smarter regular expressions. I took a stab at it, but cross-browser support for variable negative lookaheads seems to be shaky, at best.
  • When using innerHTML in an XML document, Opera and Safari assume that all elements are plain XML elements. For this reason the code forcibly puts all elements in the XHTML namespace. Again, this is pretty crude and may break some of your markup, but it's worked well for me so far.
  • The only extra purification that's performed is the conversion of ampersands (&) to their entity code (&amp;), where appropriate. If you have other stray symbols (like < or >), then I can't make any guarantees.
  • It's also interesting to note that two completely different methods of traversing the document had to be used. Mozilla-based browsers start acting really strangely when you call getElementsByTagName("*") inline in an XHTML document: it always works fine for the first document.write(), but all subsequent calls revert to the position of the last inline <script/>.
  • In the end, this is still not as good as document.write() since with .write() you can write out stuff like table rows, options, partial HTML, script elements, all without blinking an eye. The code to handle all of this is quite significant (having written the code to do it for jQuery, you can take my word for it). I don't plan on re-writing all of that special-case code, so please only use this solution for simple fixes.
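The first caveat notes that only the last tag in a write gets fixed, and that smarter regular expressions could help. One possible direction is to rewrite every void element in the string to self-closing form. This sketch assumes only void elements need fixing, and it needs no lookbehind. It does not handle ">" inside attribute values, and non-void elements such as <iframe> still need a real closing tag. The helper name and element list are assumptions, not part of the original code.

```javascript
// Hypothetical helper: self-close every void element in the string,
// not just the last tag. The list covers common HTML void elements only.
const VOID_ELEMENTS = ["area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"];

function closeVoidElements(str) {
  // \b stops "br" from matching inside longer names; the optional "/"
  // means already self-closed tags are rewritten to the same form.
  const pattern = new RegExp("<(" + VOID_ELEMENTS.join("|") + ")\\b([^>]*?)\\s*/?>", "gi");
  return str.replace(pattern, "<$1$2 />");
}

// Both images from the failing example get closed, not just the last one.
console.log(closeVoidElements("<img src='foo.jpg'> <img src='bar.jpg'>"));
```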

Ok, so now that that's out of the way - let's see how well this works in the different browsers.

Test                    Firefox 1.5+  Opera 9  Safari 2  Webkit (Safari 3)
Simple Text Insertion   Pass          Pass     Fail      Pass
Simple HTML Insertion   Pass          Pass     Fail      Pass
Google Adsense          Pass          Pass     Fail      Sort-of Fail

So here's the dirt on Safari. I spent many hours banging my head against the keyboard and finally admitted defeat in Webkit for Adsense, and for anything in Safari 2.0. Here are the issues:

  • Safari 2.0 completely rejects any attempts to use innerHTML in an XHTML document. It throws exceptions and simply will not let you do it. For this reason, Safari (as it is currently available) is a lost cause.
  • Webkit Nightlies (Safari 3.0), on the other hand, fix the innerHTML problems, allowing it to work nearly flawlessly. On the demo page (in a Webkit Nightly) you can see that the Google Adsense IFrame is inserted into the page, and a URL is even requested; however, the Adsense script itself seems to be fundamentally flawed. Comparing the URL generated in Webkit with the one generated in Firefox or Opera, it's apparent that the Adsense script simply isn't working correctly. So while, technically, Adsense does not (currently) work in the Webkit Nightly with this hack, the failure doesn't appear to be the hack's fault.

In all, this hack was an interesting experience - considering that every browser seems to behave in some sort of nonsensical fashion (in one way or another). I'm glad that there's, at least, a solution now for two of the major browsers (and possibly the next version of Safari too, after some more tinkering). I was, perhaps, most pleasantly surprised by Firefox's innerHTML/XHTML implementation. You feed it valid XHTML, it inserts it into the document. Any other value throws an exception. Very simple and logical.

As a side note: I'm going to try and feed some of this code back into jQuery, so that stuff like $(...).append("") will work as you might expect it to in the major browsers.

It's pretty obvious that serving XHTML documents with the preferred mimetype is still a ways off from real-world usage; however, I'm more hopeful now than I was before, which is good, to say the least.

Tags: google, javascript, adsense, xhtml

Google Address Translation

This is a hack that brings the power of address translation (converting a US Postal Address into a Latitude/Longitude) to the Google Maps API - something that wasn't provided in the default distribution.



View the Demo! - Download the Code

This hack, which is completely reusable, is broken down into two parts.

Address Translation Proxy (written in Perl)
This part of the project queries the open API provided by Geocoder, which offers free address translation for any postal address in the United States. (If you live in Canada, you may want to check out Geocoder.ca.) The code is very simple; it's only needed because JavaScript applications can't make requests to services on a different domain. The code is so short, I can show it here:

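#!/usr/bin/perl
use strict;
use warnings;

use CGI;
use LWP::Simple;
use URI::Escape;

my $cgi = CGI->new;

# The address sent by the browser, e.g. gaddress.cgi?a=123+Main+St+Anywhere,+NY
my $addr = $cgi->param('a') || '';

# Re-encode the address before passing it on to the Geocoder service
my $d = get( "http://rpc.geocoder.us/service/rest?address=" . uri_escape($addr) ) || '';

# Extract longitude and latitude (both empty if the lookup failed)
my ($long, $lat) = $d =~ /geo:long>([^< ]*).*?geo:lat>([^< ]*)/is;

print "Content-type: text/plain\n\n";
print join( ',', $long // '', $lat // '' );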

The above code does the following:

  • It gets the address from the browser - a query which looks something like this: gaddress.cgi?a=123+Main+St+Anywhere,+NY.
  • A query is made to the Geocoder service provided at this URL: http://rpc.geocoder.us/service/rest
  • Finally, the latitude and longitude are parsed out of the results and returned to the Javascript client.
JavaScript Address Querying
This simple function, written in JavaScript, makes a query to the Address Translation Proxy asking it to convert an address into a Google GPoint. This function has two parameters that need to be taken into consideration:

function GAddress( String address, Function callback );

The first argument, address, is a string representing the address that you want to translate (for example, "123 Main St. Anywhere, NY"). The second argument, callback, is a reference to a function which will be called once the translation is complete. That function will be called with two arguments:

function callback( GPoint point, String address );

The first argument, point, will either be a GPoint representing the latitude/longitude of an address OR null, if the address does not exist. The second argument is the same address as what was sent when you called GAddress.
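The query/callback contract for GAddress can be sketched as follows. This is not the contents of gaddress.js. The names makeGAddress, requestText, and makePoint are hypothetical, and the HTTP request and point construction are passed in so the parsing logic runs on its own. In a page using the Maps API, requestText would wrap an XMLHttpRequest to gaddress.cgi and makePoint would build a GPoint. The sketch assumes the proxy replies with "longitude,latitude", which is the order the Perl code prints.

```javascript
// Hypothetical sketch: makeGAddress, requestText and makePoint are
// illustrative names, not part of gaddress.js or the Google Maps API.
// requestText(url, done) must call done(responseBody);
// makePoint(lng, lat) builds whatever point object the page uses.
function makeGAddress(requestText, makePoint) {
  return function GAddress(address, callback) {
    // Same query format the proxy reads: gaddress.cgi?a=<encoded address>
    const url = "gaddress.cgi?a=" + encodeURIComponent(address);
    requestText(url, function (body) {
      // The Perl proxy prints "longitude,latitude" ("," if the lookup failed)
      const parts = String(body).trim().split(",");
      const lng = parseFloat(parts[0]);
      const lat = parseFloat(parts[1]);
      const found = parts.length === 2 && isFinite(lng) && isFinite(lat);
      // callback(point-or-null, address), as described for GAddress
      callback(found ? makePoint(lng, lat) : null, address);
    });
  };
}

// Usage with a fake request function standing in for XMLHttpRequest:
const GAddress = makeGAddress(
  (url, done) => done("-73.98,40.75"),
  (lng, lat) => ({ lng: lng, lat: lat })
);
GAddress("123 Main St. Anywhere, NY", (point, address) => {
  console.log(point, address);
});
```

The request and point factory are passed in, which keeps the response parsing independent of the browser. It also makes the "address not found" path (a null point) easy to check.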

Now, using both of these components, it's time to wrap them together and put them to use! If you're interested in seeing what a final result looks like, check out this demo.

If you'd like to put this code to use, feel free to download the code below and give it a try!

Download

  • gaddress.tar.gz - Contains a sample index.html, the JavaScript query function (gaddress.js), and the Address Translation Proxy (gaddress.cgi). To install:
    1. Copy the contents of the archive to your web directory.
    2. Make the proxy executable by running the following command from the command line (or by setting the same permissions in your favorite FTP client): chmod 0755 gaddress.cgi
    3. Go to the Google Maps API signup page and generate an API key for the URL where you uploaded the files.
    4. Finally, open index.html and replace key=CHANGEME with the API key you generated.
    5. You should be good to go! Have fun!

Tags: hacks, google, perl, popular, address, maps, geocoder, geo

Google Search History RSS

This tool goes through your current Google Search History, grabs all of your recent searches, and turns them into an RSS feed. It works best when set up as a nightly or hourly cron job, with its output redirected to a file.

This tool is written in Perl and uses a few slick modules: WWW::Mechanize, XML::LibXML, and XML::RSS. I was influenced by the very nice webscrape tool when building this.
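To illustrate only the feed-building step, here is a rough JavaScript sketch that turns a list of searches into a minimal RSS 2.0 document. The { query, link } shape of each search and the function names are assumptions. The actual tool is written in Perl: it scrapes the history with WWW::Mechanize and writes the feed with XML::RSS, and its sample output is an .rdf file rather than RSS 2.0.

```javascript
// Hypothetical sketch: `searches` is an assumed array of { query, link }
// objects. This is not the Perl tool's code, just the same idea.
function escapeXml(s) {
  // Escape & first so the other entities aren't double-escaped
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function searchesToRss(title, siteLink, searches) {
  // One <item> per search, using the query as the title
  const items = searches.map(s =>
    "<item><title>" + escapeXml(s.query) + "</title>" +
    "<link>" + escapeXml(s.link) + "</link></item>"
  ).join("");
  // RSS 2.0 channels require title, link and description
  return '<?xml version="1.0"?>' +
    '<rss version="2.0"><channel>' +
    "<title>" + escapeXml(title) + "</title>" +
    "<link>" + escapeXml(siteLink) + "</link>" +
    "<description>" + escapeXml(title) + "</description>" +
    items + "</channel></rss>";
}
```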

A sample, from my searches, can be found here:
http://ejohn.org/apps/ghistory/google.rdf

And here's how it looks in my newsreader (Newsgator):
[Screenshot: Google Search History RSS Feed]

Downloads

Tags: perl, google, search, rss, popular
