EtherPad: Real-time Editing with JavaScript

I had the opportunity, last year, to talk with the team behind AppJet. They're building something quite cool: A simple platform for developing reusable server-side applications written completely in JavaScript.

They've come a long way since I originally wrote about them late last year. They now even provide a copy of their server-side software along with the full source. This, together with Aptana's Jaxer, means that there are at least two high-powered, Open Source, JavaScript server platforms.

EtherPad is something new altogether. Building upon their existing platform, and adding in Comet streaming, they've constructed a completely real-time, multi-user, text and JavaScript editor.

I use two editors in my day-to-day work: vim and SubEthaEdit (in fact I'm writing this blog post in SubEthaEdit, at the moment) - and I can say pretty definitively that EtherPad is just like SubEthaEdit.

I had the opportunity to use it last week with four people all simultaneously editing a document. It has the characteristic SubEthaEdit feature: All changes, by any user, occur in near-real-time and are highlighted with that user's chosen color.

Some may wonder how this is different from Google Docs. Let me just say that SubEthaEdit and EtherPad are in a completely different league from Google Docs: I've used all three pieces of software for multi-user editing of a document and the responsiveness that you get from SubEthaEdit/EtherPad makes for an unparalleled experience. It's really common to see users start chat discussions within a document simply because it's so easy to see their response and get a discussion going.

EtherPad does have one major distinction from SubEthaEdit, though: The ability to save and restore page revisions. At any point you can hit a large 'Save Now' button on the page to tag a revision - and then go back and restore from it at any point. In many ways this makes the software more like a real-time, multi-user editable, Wiki.

The most exciting thing for me though, and a point which I think is unparalleled, is that the entire application is built using JavaScript from the bottom up. The server code is in JavaScript, the database is in JavaScript, and the frontend is in JavaScript - it's a complete JavaScript stack. The AppJet team plans on releasing this new server-side software (similar to their previous release but with the addition of Comet functionality and other pieces) completely Open Source as well. I look forward to being able to give it a spin when the time comes.

Tags: rhino, javascript

Pure JavaScript HTML Parser

Recently I was having a little bit of fun and decided to go about writing a pure JavaScript HTML parser. Some might remember one of my projects, env.js, which ported the native browser JavaScript features to the server-side (powered by Rhino). One thing that was lacking from that project was an HTML parser (it parsed strict XML only).

I've been toying with the ability to port env.js to other platforms (Spidermonkey derivatives and the ECMAScript 4 Reference Implementation) and if I were to do so I would need an HTML parser. Because of this fact it became easiest to just write an HTML parser in pure JavaScript.

I did some digging to see what people had previously built, but the landscape was pretty bleak. The only one that I could find was one made by Erik Arvidsson - a simple SAX-style HTML parser. Considering that this contained only the most basic parsing - and none of the actual, complicated, HTML logic - there was still a lot of work left to be done.

(I also contemplated porting the HTML 5 parser, wholesale, but that seemed like a herculean effort.)

However, the result is one that I'm quite pleased with. It won't match the compliance of html5lib, nor the speed of a pure XML parser, but it's able to get the job done with little fuss - while still being highly portable.

htmlparser.js:

4 Libraries in One!

There were four pieces of functionality that I wanted to implement with this library:

A SAX-style API

Handles tag, text, and comments with callbacks. For example, let's say you wanted to implement a simple HTML to XML serialization scheme - you could do so using the following:

var results = "";

HTMLParser("<p id=test>hello <i>world", {
  start: function( tag, attrs, unary ) {
    results += "<" + tag;

    for ( var i = 0; i < attrs.length; i++ )
      results += " " + attrs[i].name + '="' + attrs[i].escaped + '"';

    results += (unary ? "/" : "") + ">";
  },
  end: function( tag ) {
    results += "</" + tag + ">";
  },
  chars: function( text ) {
    results += text;
  },
  comment: function( text ) {
    results += "<!--" + text + "-->";
  }
});

results == '<p id="test">hello <i>world</i></p>'
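
To make the callback contract a little more concrete, here is a minimal sketch that replays the events by hand instead of calling the parser. The makeXmlHandler helper is hypothetical (it is not part of htmlparser.js), and the event order is an assumption inferred from the result shown above: the parser is taken to report the two missing end tags itself, innermost first.

```javascript
// Hypothetical helper: builds a handler object with the same four callbacks
// used above and collects the serialized markup in handler.output.
function makeXmlHandler() {
  var handler = {
    output: "",
    start: function (tag, attrs, unary) {
      handler.output += "<" + tag;
      for (var i = 0; i < attrs.length; i++) {
        handler.output += " " + attrs[i].name + '="' + attrs[i].escaped + '"';
      }
      handler.output += (unary ? "/" : "") + ">";
    },
    end: function (tag) {
      handler.output += "</" + tag + ">";
    },
    chars: function (text) {
      handler.output += text;
    },
    comment: function (text) {
      handler.output += "<!--" + text + "-->";
    }
  };
  return handler;
}

// Assumed event sequence for the input "<p id=test>hello <i>world".
var handler = makeXmlHandler();
handler.start("p", [{ name: "id", escaped: "test" }], false);
handler.chars("hello ");
handler.start("i", [], false);
handler.chars("world");
handler.end("i"); // reported for the unclosed <i>
handler.end("p"); // reported for the unclosed <p>

// handler.output is now '<p id="test">hello <i>world</i></p>'
```

Keeping the handler as a plain object like this also makes it easy to test the serialization logic separately from the parsing logic.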

XML Serializer

Now, there's no need to worry about implementing the above, since it's included directly in the library, as well. Just feed in HTML and it spits back an XML string.

var results = HTMLtoXML("<p>Data: <input disabled>")
results == '<p>Data: <input disabled="disabled"/></p>'

DOM Builder

If you're using the HTML parser to inject into an existing DOM document (or within an existing DOM element) then htmlparser.js provides a simple method for handling that:

// The following is appended into the document body
HTMLtoDOM("<p>Hello <b>World", document)

// The following is appended into the specified element
HTMLtoDOM("<p>Hello <b>World", document.getElementById("test"))

DOM Document Creator

This is a more-advanced version of the DOM builder - it includes logic for handling the overall structure of a web page, returning a new DOM document.

A couple points are enforced by this method:

  • There will always be a html, head, body, and title element.
  • There will only be one html, head, body, and title element (if the user specifies more, they will be moved to the appropriate locations and merged).
  • link and base elements are forced into the head.

You would use the method like so:

var dom = HTMLtoDOM("<p>Data: <input disabled>");
dom.getElementsByTagName("body").length == 1
dom.getElementsByTagName("p").length == 1

While this library doesn't cover the full gamut of possible weirdness that HTML provides, it does handle a lot of the most obvious stuff. All of the following are accounted for:

  • Unclosed Tags:
    HTMLtoXML("<p><b>Hello") == '<p><b>Hello</b></p>'
  • Empty Elements:
    HTMLtoXML("<img src=test.jpg>") == '<img src="test.jpg"/>'
  • Block vs. Inline Elements:
    HTMLtoXML("<b>Hello <p>John") == '<b>Hello </b><p>John</p>'
  • Self-closing Elements:
    HTMLtoXML("<p>Hello<p>World") == '<p>Hello</p><p>World</p>'
  • Attributes Without Values:
    HTMLtoXML("<input disabled>") == '<input disabled="disabled"/>'
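
Most of the cases above fall out of keeping a stack of open elements. The following is a minimal sketch of that idea, not the actual htmlparser.js implementation: balanceTags and the four tag lists are hypothetical, the lists are far shorter than a real parser would need, and only attribute-free tags are handled.

```javascript
// Illustrative tag lists (assumptions; a real parser needs many more entries).
var EMPTY = ["br", "hr", "img", "input"];
var BLOCK = ["div", "p", "ul", "li"];
var INLINE = ["a", "b", "i", "em", "span"];
var CLOSE_SELF = ["p", "li"];

function has(list, tag) {
  return list.indexOf(tag) !== -1;
}

// Hypothetical function: balances attribute-free markup with a stack of open tags.
function balanceTags(html) {
  var stack = [];
  var out = "";
  var token = /<(\/?)(\w+)\s*>|([^<]+)/g;
  var match;

  // Close open elements from the top of the stack down to index pos.
  function closeDownTo(pos) {
    while (stack.length > pos) {
      out += "</" + stack.pop() + ">";
    }
  }

  while ((match = token.exec(html)) !== null) {
    if (match[3] !== undefined) {
      out += match[3]; // plain text passes straight through
      continue;
    }
    var tag = match[2].toLowerCase();
    if (match[1] === "/") {
      // End tag: close everything opened since the matching start tag.
      var pos = stack.lastIndexOf(tag);
      if (pos !== -1) closeDownTo(pos);
      continue;
    }
    // A block element ends any inline elements that are still open.
    if (has(BLOCK, tag)) {
      while (stack.length && has(INLINE, stack[stack.length - 1])) {
        closeDownTo(stack.length - 1);
      }
    }
    // Elements like <p> implicitly close a directly preceding open <p>.
    if (has(CLOSE_SELF, tag) && stack[stack.length - 1] === tag) {
      closeDownTo(stack.length - 1);
    }
    if (has(EMPTY, tag)) {
      out += "<" + tag + "/>";
    } else {
      stack.push(tag);
      out += "<" + tag + ">";
    }
  }
  closeDownTo(0); // close whatever is still open at the end of input
  return out;
}

// balanceTags("<p><b>Hello")      returns '<p><b>Hello</b></p>'
// balanceTags("<b>Hello <p>John") returns '<b>Hello </b><p>John</p>'
// balanceTags("<p>Hello<p>World") returns '<p>Hello</p><p>World</p>'
```

The order of the checks matters: inline elements are closed before the self-closing rule is applied, so a new p first ends any open b or i and only then ends a previous p.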

Note: It does not take into account where in the document an element should exist. Right now you can put block elements in a head or th inside a p and it'll happily accept them. It's not entirely clear how the logic should work for those, but it's something that I'm open to exploring.

You can test a lot of this out in the live demo.

While I doubt this will cover all weird HTML cases - it should handle most of the obvious ones - at least making HTML parsing in JavaScript feasible.

Tags: javascript, html, rhino, parsing

Bringing the Browser to the Server

This weekend I took a big step in upping the ante for JavaScript as a Language. At some point last Friday evening I started coding and didn't stop until sometime mid-Monday. The result is a good-enough browser/DOM environment, written in JavaScript, that runs on top of Rhino; capable of running jQuery, Prototype, and MochiKit (at the very least).

The implications of this are phenomenal, and I'm not the only one who's interested in what this could mean for server-side JS development. More on that in a minute, but first here are some sample results from running jQuery:

jQuery

$ java -jar build/js.jar
Rhino 1.6 release 6 2007 06 28
js> load('build/runtest/env.js');
js> window.location = 'test/index.html';
test/index.html
js> load('dist/jquery.js');
// Add pretty printing to jQuery objects:
js> jQuery.fn.toString = DOMNodeList.prototype.toString;
js> $('span').remove();
[ <span#台北Taibei>, <span#台北>, <span#utf8class1>,
  <span#utf8class2>, <span#foo:bar>, <span#test.foo[5]bar> ]
// Yes - UTF-8 is supported in DOM documents!
js> $('span')
[  ]
js> $('div').append('<span><b>hello!</b> world</span>');
[ <div#main>, <div#foo> ]
js> $('span')
[ <span>, <span> ]
js> $('span').text()
hello! worldhello! world

On a whim, I then plugged in Prototype and MochiKit, both of which appeared to work OK (I haven't done any significant testing with them - so there are probably gaps). Here are some sample results:

Prototype

$ java -jar build/js.jar
Rhino 1.6 release 6 2007 06 28
js> load('build/runtest/env.js');
js> window.location = 'test/index.html';
test/index.html
js> load('prototype.js');
js> $$('div p')
<p#firstp>,<p#ap>,<p#sndp>,<p#en>,<p#sap>,<p#first>
js> Object.toJSON({foo:'bar',baz:true});
{'baz': true, 'foo': 'bar'}
js> var fn = (function(name,msg){
  print(name + ' ' + msg); }).curry('John');
js> fn('hello!');
John hello!

MochiKit

$ java -jar build/js.jar
Rhino 1.6 release 6 2007 06 28
js> load('build/runtest/env.js');
js> window.location = 'test/index.html';
test/index.html
js> load('Mochikit.js');
js> $$('div')
<div#main>,<div#foo>
js> document.body.innerHTML = '';
js> document.body.appendChild( P( 'test',
  A({href:'http://google.com/'}, 'link')) );
js> document.body.innerHTML
<p>test<a href='http://google.com/'>link</a></p>
js> $$('a')
<a>

I just want to emphasize that these are un-modified copies of jQuery, Prototype, and MochiKit - all running perfectly in this un-natural environment.

When I came up with this idea for an environment, I was mulling over a couple ideas: Namely, better ways of automating tests and ways to bring JS-style DOM/HTML interaction to the server-side. Having a way to bring this popular idiom to established problem sets seemed like a lot of fun.

In short, the following (at the very least) can all get a big dose of JavaScript:

  • Automated Testing
  • Screen Scraping
  • Web Application Development

Now, if you think I'm crazy, I'd like to show you a couple quick examples:

Automated Testing

$ java -jar build/js.jar
Rhino 1.6 release 6 2007 06 28
js> load('build/runtest/env.js');
js> window.location = 'test/index.html';
test/index.html
js> load('dist/jquery.js');
js> load('build/runtest/testrunner.js');
js> load('src/jquery/coreTest.js');
PASS (1) [core] Array.push()
PASS (2) [core] Function.apply()
PASS (3) [core] getElementById
PASS (4) [core] getElementsByTagName
PASS (5) [core] RegExp
PASS (6) [core] jQuery
...

Oh yes, that's right - the full jQuery test suite is now automated and capable of running in Rhino (passing all tests). jQuery served as my initial testbed for development, making sure that I was getting all of my code right. So if you import a copy of jQuery into this environment, it should work "just fine".

By the way, you can try out the automated test suite by getting a copy of trunk/jquery out of SVN, then running make runtest - the results are just awesome.
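
For a rough idea of what sits behind output like the PASS lines above, here is a minimal sketch of a console test runner. It is not the actual testrunner.js: createRunner and its methods are hypothetical, and the number in parentheses is assumed to be a running test count.

```javascript
// Hypothetical runner; the log function is injected so output can go to
// print() in Rhino, console.log elsewhere, or an array in tests.
function createRunner(log) {
  var currentModule = "";
  var count = 0;
  var failures = 0;
  return {
    // Sets the label shown in square brackets for the tests that follow.
    module: function (name) {
      currentModule = name;
    },
    // Runs fn; a thrown error or a return value of false counts as a failure.
    test: function (name, fn) {
      var status = "PASS";
      count++;
      try {
        if (fn() === false) status = "FAIL";
      } catch (e) {
        status = "FAIL";
      }
      if (status === "FAIL") failures++;
      log(status + " (" + count + ") [" + currentModule + "] " + name);
    },
    failures: function () {
      return failures;
    }
  };
}

var lines = [];
var runner = createRunner(function (line) {
  lines.push(line);
});
runner.module("core");
runner.test("Array.push()", function () {
  var list = [];
  list.push("a");
  return list.length === 1;
});
runner.test("Function.apply()", function () {
  return Math.max.apply(null, [1, 3, 2]) === 3;
});
// lines[0] is "PASS (1) [core] Array.push()"
// lines[1] is "PASS (2) [core] Function.apply()"
```

Because nothing here touches the DOM directly, the same runner shape works in a browser and in a shell; only the log function changes.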

Screen Scraping

This is one part that works pretty well right now - with the huge caveat that it only works on well-formed XML documents (oops!). I'll be integrating an HTML parser into the code base so that we can make this functionality a little more resilient. In the meantime, here's an example of the sort of scraping that you can do currently:

load("env.js");
window.location = "http://alistapart.com/";
window.onload = function(){
  load("dist/jquery.js");
  print("Newest A List Apart Posts:");
  $("h4.title").each(function(){
    print(" - " + this.textContent);
  });
};

And here's another one that writes the results out to a file:

load("env.js");
window.location = "http://alistapart.com/";
window.onload = function(){
  load("dist/jquery.js");
  var str = "Newest A List Apart Posts:\n";
  $("h4.title").each(function(){
    str += " - " + this.textContent + "\n";
  });
  var out = new XMLHttpRequest();
  out.open("PUT", "file:/tmp/alist.txt");
  out.send( str );
};

Oh yeah, I went there - I made PUT and DELETE requests to local files perform the expected actions. I think the result is hilarious.
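
The idea of treating file: URLs like a tiny REST endpoint can be sketched as follows. This is only an illustration under stated assumptions, not how env.js does it: createFileStore is hypothetical, an in-memory map stands in for the real file system, and the status codes are borrowed from HTTP conventions.

```javascript
// Hypothetical in-memory stand-in for the local file system.
function createFileStore() {
  var files = Object.create(null);

  // Strips the "file:" scheme to get a path key; other URL forms are rejected.
  function toPath(url) {
    if (url.indexOf("file:") !== 0) {
      throw new Error("Not a file: URL: " + url);
    }
    return url.slice("file:".length);
  }

  return {
    // Maps a request method onto a read, write, or delete of the path.
    request: function (method, url, body) {
      var path = toPath(url);
      var exists = path in files;
      switch (method) {
        case "GET":
          return exists
            ? { status: 200, body: files[path] }
            : { status: 404, body: "" };
        case "PUT":
          files[path] = String(body);
          return { status: exists ? 200 : 201, body: "" };
        case "DELETE":
          if (!exists) return { status: 404, body: "" };
          delete files[path];
          return { status: 200, body: "" };
        default:
          return { status: 405, body: "" };
      }
    }
  };
}

var store = createFileStore();
store.request("PUT", "file:/tmp/alist.txt", "Newest A List Apart Posts:\n");
var saved = store.request("GET", "file:/tmp/alist.txt");
// saved.body is "Newest A List Apart Posts:\n"
store.request("DELETE", "file:/tmp/alist.txt");
```

Swapping the map for real file reads and writes would be the obvious next step; the method-to-action mapping stays the same.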

Web Application Development

This is still a work in progress, but some of the initial ideas are already at play here in this environment. When I have some time I plan on making a JavaScript-based web app framework out of this - which should be pretty cool.

Here's some pseudo-code for how I think it could work:

window.onload = function(){
  print("Content-type: text/html\n");
  if ( location.href == "/" )
     show_home();
  print( document.innerHTML );
};
function show_home(){
  document.load("index.html");
  document.getElementById("time").innerHTML = (new Date()).toString();
}
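
The dispatch step in that pseudo-code can be tried out on its own, without any DOM. The sketch below is one possible shape for it and nothing more: createApp, route, and respond are hypothetical names, handlers simply return an HTML string, and the CGI-style Status header for unknown paths is my own addition.

```javascript
// Hypothetical dispatcher: maps a request path to a handler returning HTML.
function createApp() {
  var routes = Object.create(null);
  return {
    route: function (path, handler) {
      routes[path] = handler;
    },
    // Builds the full response text: headers, a blank line, then the body.
    respond: function (path) {
      var handler = routes[path];
      if (!handler) {
        return "Status: 404 Not Found\n" +
          "Content-type: text/html\n\n<h1>Not Found</h1>";
      }
      return "Content-type: text/html\n\n" + handler();
    }
  };
}

var app = createApp();
app.route("/", function show_home() {
  return '<p id="time">' + (new Date()).toString() + "</p>";
});

var homeResponse = app.respond("/");
var missingResponse = app.respond("/nope");
```

In the environment described above, each handler could instead load a document and return its serialized HTML, which is closer to what the pseudo-code has in mind.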

Download!

Check out the code - there's still huuuge gaps of functionality missing - I only implemented the bare minimum to get this environment working (and passing the jQuery test suite). So your mileage may vary.

Download: http://jqueryjs.googlecode.com/svn/trunk/jquery/build/runtest/env.js (Formatted)

NOTE (February 2009): The above code is quite out of date. If you're interested in using it in your project I recommend that you visit the following Google group and download the code from the current working fork:

How to Use

To start with, you'll need to have, at least, Rhino 1.6R6. You can download it from Mozilla FTP.

Now download the env.js script and put it in the same directory as the Rhino js.jar.

In order to use it from the command-line, you'll wanna do something like this:

$ java -jar js.jar
js> load('env.js');
js> window.location = 'some.html';
some.html
js> // Your code here!

It's important that you do window.location = "some file" before loading any DOM-dependent code (as the 'document' object doesn't exist before the location request).

A full list of Rhino-shell-specific commands can be found in the Rhino Shell docs.

If you want to write executable scripts, the contents will look something like this:

load('env.js');
window.location = 'some.html';
window.onload = function(){
  // Your code here
};

Which can then be run like so: java -jar js.jar myscript.js.

Feedback is very much welcome - I've only thought of a couple use-cases thus far, but I'm sure that the surface is just being scratched.

Tags: firefox, ecmascript, java, rhino, mozilla, javascript

JavaScript as a Language

For my work at Mozilla, I'm gearing up to talk more about JavaScript 2.0. This involves a lot of things (from reading up on the specification, looking at non-web-based uses of JavaScript, to teaching myself SML). Perhaps most challenging, however, is the struggle that I've been facing to quantify and understand the shifts being made in the language - and how that relates to JavaScript programming in general.

I think we've seen the JavaScript language move through many individual phases:

  • The "We need scripting for web pages" phase. (Netscape)
  • The "We should standardize this" phase. (ECMAScript)
  • The "JavaScript isn't a toy" phase. (Ajax)
  • The "JavaScript as a programming language" phase.

I'm surmising that there's this new phase that we're starting to enter, one where JavaScript will be treated as a significant programming language - divorced from the concept of web development. Two significant movements lead me to believe that we're at the start of a new era for JavaScript.

JavaScript Speed

A good deal of energy has been put into worrying about JavaScript performance. This is a great sign. It's sort of a natural progression for a language (worry about implementation, then standardization and compliance, and finally speed).

For proof, look at the work that's being done by the different browser vendors:

  • Mozilla is working on Tamarin (JIT JavaScript)
  • Apple is working on Webkit/Safari 3 (Revamped JS Engine)
  • Opera is releasing a new JS Engine in Opera 9.5 (New features and speed improvements)
  • Microsoft is working on Internet Explorer 8.0 (A bunch of new JS work)

Non-Web-based Use

I've been reading a lot about the use of JavaScript in non-"traditional" situations; especially in relation to the use of Rhino (the JavaScript implementation that sits on top of Java and the JVM).

Specifically, two projects have really stood out as having a lot of potential.

JavaScript on Rails - Granted, at this point, this project may as well be pure vaporware, but it's caught the attention of the right people. When one of the most popular software bloggers talks about how there's a "next big language" coming up and then announces his massive re-write of the popular Ruby on Rails framework, in JavaScript, running on Rhino - people tend to pay attention.

Helma - This web application framework is a long-standing stalwart of server-side development with JavaScript (again, using Rhino). Surprisingly, it's managed to fall through the cracks with just about every JavaScript developer that I know. I recently noticed it, and after some startup friends of mine revealed that they're developing an application based on it, I became convinced that we'll be hearing about this little framework in the upcoming months.

All of this leads me up to a point: JavaScript is actively advancing, as a language. While its most popular domain will probably always be in web browsers (with new JavaScript engines pointing in that continued direction), the advancement of server-side uses of JavaScript will only make for a much larger area for possible development in the upcoming years.

This is all a convoluted way of saying that this is the perfect opportunity to introduce some much needed changes into the language - completing the extended transition of JavaScript from a toy to a professional development tool.

Tags: mozilla, ecmascript, programming, javascript, rhino
