TestSwarm, the project that I've been working on over the past 6 months, or so, is now open to the public. Mozilla has been very gracious, allowing me to work on this project exclusively. At the beginning of April I moved from my old position as a JavaScript Evangelist on the Mozilla Evangelism team to that of a JavaScript Tool Developer on the new Developer Tools team (whose other major project is Bespin).
For more information on TestSwarm I've written up a detailed explanation of what TestSwarm provides and where it fits into the landscape of JavaScript developer tools.
TestSwarm ended up being a very challenging project to get to an alpha state (and probably will be even more challenging to get to a final release state). Dealing with cross-browser incompatibilities, cross-domain test suite execution, and asynchronous, distributed, client execution has been more than enough to make for a surprisingly difficult project. It's mostly written in PHP and uses MySQL as a back end (allowing it to run in virtually any environment). Patches will absolutely be appreciated.
This project has been a long time coming: the first inklings started back in 2007. Some of us on the jQuery team were discussing ways to distribute the test suite load to multiple browsers in an automated fashion. Andy Kent came along and proposed a participatory application for testing visual code (such as jQuery UI). We worked on that code base for a while but it didn't get off the ground. Eventually I decided to re-tackle the problem early on in 2009. Even in its rough alpha state we've already been able to make great use of TestSwarm. For example, here's a view of jQuery commits run in TestSwarm:
The vertical axis is SVN commits to jQuery (newer commits at the top); the horizontal axis shows all the different browsers that we target. Using TestSwarm we've been able to easily spot regressions and fix them with a minimum amount of hassle (especially since all the results are logged).
And this is only the beginning. There are so many different directions in which TestSwarm can be taken. For example:
A pastebin-like service where you can drop in code and see the results come back, from many browsers, in real-time.
IDE integration for sending minor changes out for quick testing.
Manual testing of user interface code. Pushing manual tests, with instructions, to users for them to walk through.
Distributing tests to any number of browsers, rather than a specific sub-set. (You could use this to embed a tiny iframe in your site to collect test results from a small sampling of your users.)
The ability to drive and test browser code or extensions.
And the list goes on. I'm definitely curious to see what directions the community is interested in driving the code base. I've gotten it to a level where it's particularly useful for me and the jQuery team - where should we go from here?
I'm in the process of working on, and improving, test suite support in TestSwarm (an upcoming project of mine). However, there isn't a lot of information on which unit testing frameworks developers actually use to test their code (whereas there is more information on which JavaScript libraries are used).
It will be of great help to me if you could quickly fill out the question below. I will release the results of the survey as soon as possible. Thanks!
The poll is now closed. I've received 1853 responses and plan on writing a detailed blog post on the results. Thanks everyone!
It's become increasingly obvious to me that cross-browser JavaScript development and testing, as we know it, does not scale.
jQuery's Test Suites
Take the case of the jQuery core testing environment. Our default test suite is an XHTML page (served with the HTML mimetype) with the correct doctype. It includes a number of tests that cover all aspects of the library (core functionality, DOM traversal, selectors, Ajax, etc.). We have a separate suite that tests offset positioning (integrating this into the main suite would be difficult, at best, since positioning is highly dependent upon the surrounding content). This means that we have, at minimum, two test suites straight out of the gate.
Next, we have a test suite that serves the regular XHTML test suite with the correct mimetype (application/xhtml+xml). We aren't 100% passing this one yet, but we'd like to be able to sometime before jQuery 1.4 is ready. Additionally, we have another version that we're working on that serves the regular test suite but with its doctype stripped (throwing it into quirks mode). This is another one that we would like to make sure we're passing completely in time for 1.4.
Both of those tweaks (one with correct mimetype and one with no doctype) would also need to be done for the offset test suite. We're now up to 6 test suites.
We have another version of the default jQuery test suite that runs with a copy of Prototype and Scriptaculous injected (to make sure that the external library doesn't affect internal jQuery code). And another that does the same with Mootools. And another that does the same for old versions of jQuery. That's three more test suites (up to 9).
Finally, we're working on another version of the suite that manipulates the Object.prototype before running the test suite. This will help us to, eventually, be able to work in that hostile environment. This is another one that we'd like to have done in time for jQuery 1.4 - and brings our test suite total up to 10.
We're in the initial planning stages of developing a pure-XUL test environment (to make sure jQuery works well in Firefox extensions). Eventually we'd like to look at other environments as well (such as in Rhino + Env.js, Rhino + HTMLUnit, and Adobe AIR). I won't count these non-browser/HTML environments, for now.
At minimum that's 10 separate test suites that we need to run for jQuery. Ideally, we should be running every one of them just prior to committing a change, just after committing a change, for every patch that's waiting to be committed, and before a release goes out...
in every browser that we support.
The Browser Problem
And this is where cross-browser JavaScript unit testing goes to crazy town. In the jQuery project we try to support the current version of all major browsers, the last released version, and the upcoming nightlies/betas (we balance this a little bit with how rapidly users upgrade browsers - Safari and Opera users upgrade very quickly).
At the time of this post that includes 12 browsers.
Internet Explorer 6, 7, 8. (Not including 8 in 7 mode.)
Firefox 2, 3, Nightly.
Safari 3.2, 4.
Opera 9.6, 10.
Chrome 1, 2.
Of course, that's just on Windows and doesn't include OS X or Linux. For the sake of sanity in the jQuery project we generally only test on one platform - but ideally we should be testing Firefox, Safari, and Opera (the only multi-platform browsers) on all platforms.
The end result is that we need to run 10 separate test suites in 12 separate browsers before and after every single commit to jQuery core. Cross-Browser JavaScript testing does not scale.
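To make that arithmetic concrete, here is a small sketch that enumerates the matrix described above. The suite labels are shorthand invented for this illustration (they are not names used by the jQuery project); the browser list is the one given earlier.

```javascript
// Labels are illustrative shorthand for the ten suites described above.
const suites = [
  "core", "core (xhtml mimetype)", "core (no doctype)",
  "offset", "offset (xhtml mimetype)", "offset (no doctype)",
  "core + Prototype/Scriptaculous", "core + Mootools", "core + old jQuery",
  "core + modified Object.prototype"
];

const browsers = [
  "IE 6", "IE 7", "IE 8",
  "Firefox 2", "Firefox 3", "Firefox Nightly",
  "Safari 3.2", "Safari 4",
  "Opera 9.6", "Opera 10",
  "Chrome 1", "Chrome 2"
];

// Every suite has to run in every browser.
const runs = [];
for (const suite of suites) {
  for (const browser of browsers) {
    runs.push({ suite, browser });
  }
}

console.log(runs.length + " runs per pass"); // 10 * 12 = 120
```

That is 120 suite runs for a single pass, and a pass is needed both before and after each commit.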
Of course, this is just desktop cross-browser JavaScript testing - we should be testing on some of the popular mobile devices, as well. (MobileSafari, Opera Mobile, and possibly NetFront and Blackberry.)
Manual Testing
All of the above test suites are purely automated. You open them up in a browser, wait for them to finish, and look at the results - they require no human intervention whatsoever (save for the initial loading of the URL). This works for a lot of JavaScript tests (and for all the tests in jQuery core) but it's unable to cover interactive testing.
Some test suites (such as Yahoo UI, jQuery UI, and Selenium) have ways of automating pieces of user interaction (you can write tests like 'Click this button then click this other thing'). For most cases this works pretty well. However all of this is just an approximation of the actual interaction that a user may employ. Nothing beats having real people manually run through some easily-reproducible (and verifiable) tests by hand.
This is the biggest scaling problem of all. Take the previous problem of scaling automated test suites and multiply it by the number of tests that you want to run. 100 tests in 12 browsers run on every commit by a human is just insane. There has to be a better way since it's obvious that Cross-Browser JavaScript testing does not scale.
What currently exists?
The only way to tackle the above problem of scale is to have a massive number of machines dedicated to testing and to somehow automate the process of sending those machines test suites and retrieving their results.
There currently exists an Open Source tool related to this problem space: Selenium Grid. It's able to send out tests to a number of machines and automatically retrieve the results - but there are a couple problems:
As far as I can tell, Selenium Grid requires that you use Selenium to run your tests. Currently no major JavaScript library uses Selenium (and it would be a major shift in order to do so).
It isn't able to test against non-desktop machines. Each server must be running a daemon to handle the batches of jobs - this leaves mobile devices out of the picture.
It can't test against unknown browsers. Each browser needs special code to hook in to triggering the loading of the browser by Selenium, thus an unknown browser (such as IE 8, Opera 10, Firefox Nightly, or Chrome) may not be able to run.
And most importantly: Selenium Grid requires that you actually own and control a number of machines on which you can run your tests. It's not always feasible, especially in the world of distributed Open Source JavaScript development, to have the finances to have dedicated machines running non-stop. A more cost effective solution is required.
Naturally, this solution doesn't tackle the problem of manual testing, either.
A solution: TestSwarm
All of this leads up to a new project that I'm working on: TestSwarm. It's still a work in progress but I hope to open up an alpha test by the end of this month - feel free to sign up on the site if you're interested in participating.
Its construction is very simple. It's a dumb JavaScript client that continually pings a central server looking for more tests to run. The server collects test suites and sends them out to the respective clients.
All the test suites are collected. For example, 1 "commit" can have 10 test suites associated with it (and be distributed to a selection of browsers).
The nice thing about this construction is that it's able to work in a fault-tolerant manner. Clients can come-and-go. At any given time there might be no Firefox 2s connected, at another time there could be thirty. The jobs are queued and divvied out as the load requires it. Additionally, the client is simple enough to be able to run on mobile devices (while being completely test framework agnostic).
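The queueing behavior described above can be sketched roughly as follows. This is not TestSwarm's actual code; the SwarmQueue class, its method names, and the example URLs are all hypothetical. It only illustrates how pending runs can sit in a queue until a matching browser happens to connect.

```javascript
// Hypothetical in-memory sketch of the server-side queue.
class SwarmQueue {
  constructor() {
    this.pending = []; // runs waiting for a client: { suiteUrl, browser }
    this.results = []; // finished runs
  }

  // One "commit" can carry many suites, each fanned out to several browsers.
  addJob(suiteUrls, browsers) {
    for (const suiteUrl of suiteUrls) {
      for (const browser of browsers) {
        this.pending.push({ suiteUrl, browser });
      }
    }
  }

  // Called each time a client pings. Returns the oldest run queued for
  // that browser, or null when there is nothing for it to do right now.
  nextRun(browser) {
    const index = this.pending.findIndex(run => run.browser === browser);
    if (index === -1) return null;
    return this.pending.splice(index, 1)[0];
  }

  // Clients report back when a suite finishes.
  saveResult(run, failures) {
    this.results.push({ suiteUrl: run.suiteUrl, browser: run.browser, failures });
  }
}

const queue = new SwarmQueue();
queue.addJob(["/suites/core.html", "/suites/offset.html"], ["Firefox 2", "Opera 10"]);

// No Firefox 2 clients are connected yet, so those runs simply wait.
const run = queue.nextRun("Opera 10");
queue.saveResult(run, 0);
console.log(queue.pending.length); // 3 runs still queued
```

Because a client only ever asks "is there anything for my browser?", nothing breaks when clients disappear: their runs stay in the queue until another client of the same browser pings.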
Here's how I envision TestSwarm working out: Open Source JavaScript libraries submit their test suite batches to the central server and users join up to help out. Library users can feel like they're participating and helping the project (which they are!) simply by keeping a couple extra browser windows open during their normal day-to-day activity.
The libraries can also push manual tests out to the users. A user will be notified when new manual tests arrive (maybe via an audible cue?) which they can then quickly run through.
All of this help from the users wouldn't be for nothing, though: There'd be high score boards keeping track of the users who participate the most and libraries could award the top participants with prizes (t-shirts, mugs, books, etc.).
The framework developers get the benefit of near-instantaneous test feedback from a seemingly-unlimited number of machines and the users get prizes, recognition, and a sense of accomplishment.
There's already been a lot of interest in a "corporate" version of TestSwarm. While I'm not planning on an immediate solution (other than releasing the software completely Open Source) I would like to have some room in place for future expansion (perhaps users could get paid to run through manual tests - sort of a Mechanical Turk for JavaScript testing - I dunno, but there's a lot of fodder here for growth).
I'm really excited - I think we're finally getting close to a solution for JavaScript testing's scalability problem.
It's safe to say that the biggest tax on a web developer is spending so much time dealing with browser bugs and incompatibilities. Thus it has become the favorite pastime of all web developers to complain about having to deal with them. Browser bugs are annoying, frustrating, and make your job incredibly difficult.
Because browser bugs are so frustrating and such a burden on top of normal development it should be the responsibility of every web developer to make sure that the browsers they develop for are able to find and fix their bugs. Taking responsibility for the bugs that you find - and not assuming that "someone else will find it" - will accelerate the rate at which browsers can improve.
The solution to helping browsers is two-fold: 1) Every time you find a browser bug, file a bug report to the respective browser. 2) Actively test your sites in the latest builds of the major browsers.
The vast majority of web developers have never filed a bug report with a browser vendor - or even used a nightly version of a browser - which is a shame. If you think about it there are few who are more qualified to assess what is going wrong in a browser than those who spend every day developing for them.
I'm especially surprised when I see professional developers not filing bugs with browsers, or testing on nightlies. Since one of the primary tasks of most developers is to paper over cross-browser issues it is in their best interest to see the number of bugs reduced (making their job dramatically simpler).
I've personally filed bug reports with every major browser vendor and I've noted a couple characteristics that make for a good report.
How to File a Good Bug Report
The three points that make for a good bug report are: categorization, test case, and reduction. Any bug that is categorized correctly and provides a reduced test case is guaranteed to be reviewed by a browser developer.
Let's start with where to file the bug itself.
Filing a Bug Report
Frequently when you go to file a bug report you have to wade a couple layers deep before you can get to the actual submission form. I've provided direct URLs to the best forms to use, below.
When you're filing your bug be sure to also test it in the latest nightly of the browser you're filing against (which I'll describe later on). This will be one of the first things that a browser developer asks. If you can show that the bug still exists in the current development version of the browser, that it has not been fixed, then it'll be that much easier for them to get started.
Note: Many of the bug report pages require that you create an account before submitting a report. This is an annoying one-time cost.
Categorizing the Bug
Categorizing a bug properly is an important first step. Frequently the owners of particular modules (such as layout or DOM) watch all new submissions that come in. Assigning a bug to the correct category will instantly put it before the eyes of the very person most capable of fixing the bug.
The categorization of a bug depends on the browser: some browsers (like Opera and Internet Explorer) provide simplified categories for filing a bug while others (WebKit/Safari and Mozilla/Firefox) use complex categories to denote the specific module where a piece of functionality might exist.
Mozilla/Firefox: Choose a Component. Some of the most common ones: DOM, Layout, JavaScript Engine.
WebKit/Safari: Choose a Component. Some of the most common ones: HTML DOM, Layout and Rendering, JavaScriptCore.
Google Chrome: Figuring out if you should file for Chrome is tricky. First test your bug in both the latest release of Safari and in the latest WebKit nightly. If the bug exists only in Chrome then file there, otherwise file the bug with WebKit/Safari. One problem, though: Bugs that only exist with the Chrome JavaScript engine (V8) should be filed in the V8 bug tracker instead of the Chrome one (lest it get lost in the shuffle).
Also, Chrome does not provide an explicit means of categorization. All bugs are reviewed by a developer and then categorized (you have no control over this process).
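The Chrome decision above is easier to follow written out as a function. This is only a sketch of the rule of thumb as described: whereToFile and its three flags are hypothetical names, and the flags stand for things you have to determine by hand.

```javascript
// Hypothetical helper: each flag is the outcome of a manual check.
function whereToFile(bug) {
  if (bug.inSafari || bug.inWebKitNightly) {
    return "WebKit/Safari"; // not Chrome-only, so it goes to WebKit
  }
  if (bug.inJavaScriptEngine) {
    return "V8"; // Chrome-only and in the JavaScript engine
  }
  return "Chrome"; // Chrome-only, outside the JavaScript engine
}

console.log(whereToFile({ inSafari: false, inWebKitNightly: false, inJavaScriptEngine: true }));
// V8
```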
You should also take the quick step of testing on more than one platform (OS X and Windows, Windows and Linux, etc.). Simply determining that the bug exists on more than one platform can dramatically help to reduce the time needed to locate the cause of the bug by the browser developer.
Providing a Test Case
Any type of reproducible test case is better than nothing. A web page that's able to encapsulate the problem is generally a good start. If the web page is able to be attached to the bug report directly, that's even better (it may take a while for the browser developer to get around to your ticket and if your test case no longer exists at the URL that you specified then they'll likely just close your ticket and move on).
That being said there is such a thing as a bad test case. The worst kind is something like: "I have a web site at http://example.com and it doesn't work in browser X, please fix." That will take someone a considerable amount of time to locate the exact reasons for failure and will likely push your bug much farther back on the queue.
The best kind of test case is one that provides a reduction.
Providing a Reduction
Providing a simple test case is absolutely the hardest and most frustrating part of creating a bug report - but it's also the point that'll make your report most likely to be noticed and fixed. Even for the most qualified developers it should take no more than 30 minutes to create a good-enough test case.
The process for creating one is simple: Take a page that has the bug in it and rip out anything that doesn't affect the reproduction of the bug. This includes stylesheets, images, JavaScript files, JavaScript libraries, and HTML.
For example, a while back when I was running the Dromaeo test suite I noticed that WebKit kept crashing when it hit a certain point. I began by ripping out tests, unnecessary HTML, CSS, and images. I eventually worked my way down to a single test: Splitting a string. I then worked to strip out as much of the test suite as possible so that there were no external dependencies required.
What was left was about as simple as you could get while still having the crash occur. This is a good reduction. Based upon this reduction the reason for the problem was quickly identified and resolved just a couple weeks later.
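As an illustration only (this is not the test case that was filed, and the function name and input string are made up), a reduction of that kind tends to end up looking something like this: no library, no markup, no stylesheets, just the one operation under suspicion repeated in a loop.

```javascript
// Illustrative only: not the actual test case from the bug report.
// Everything except the suspect operation has been stripped away.
function splitRepeatedly(iterations) {
  const str = "a,b,c,d,e,f,g,h,i,j";
  let parts = [];
  for (let i = 0; i < iterations; i++) {
    parts = str.split(",");
  }
  return parts.length;
}

console.log(splitRepeatedly(10000)); // 10
```

If the bug still reproduces with something this small, whoever picks up the report can start debugging immediately instead of first untangling a whole site.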
Is my bug being worked on?
This is a challenging point - but one that's easier to determine with Mozilla/Firefox, WebKit/Safari, and Chrome (since they're all relatively open projects). Here's the best way to determine the status of the bug with each of those browsers.
Mozilla/Firefox: Your bug will start out being assigned to the default contact for the component category that you originally selected. This doesn't mean anything, yet, it's simply attracting people in to look at your ticket. People will start to CC themselves in to the ticket (which means that they're interested in its progress). The defining moment, though, is when someone assigns the bug to themselves, effectively stating that they are taking responsibility for the status of this bug. Most contributors have it set up so that any future comments on the bug are automatically emailed to them so if you have any questions about the status of the bug you can feel free to post a comment - but please do so at a reasonable pace (asking for daily, or even weekly, updates would be frustrating).
WebKit/Safari: WebKit uses a very similar setup to Mozilla/Firefox - just look for someone to take control of the bug and drive it to completion. The golden ticket, though, is when your bug "goes to rdar". Radar is Apple's internal (private) bug tracker. Having your bug move to there means that your bug is, effectively, guaranteed to be completed at some point (if not by the person who 'owns' it then by another Apple employee). Since Apple is still the major driving force behind WebKit updates having your bug move to rdar is what you should be hoping for. That being said, since rdar is private to Apple employees you no longer get the benefit of knowing if or when your bug will be completely fixed - it's just a waiting game.
Chrome: Chrome uses a system very similar to Mozilla/Firefox. Be sure to keep up communication with whoever the bug is currently assigned to and to answer any questions that they might have.
If your bug eventually gets resolved then congratulations! You just helped to make the web a better place for everyone.
But that's not always the case.
What happens if my bug gets rejected?
Rejected bugs fall into two categories:
It was rejected because it was not a bug.
It was rejected because the browser vendor does not feel like working on it.
The first point is broken down into two further sub-categories:
It's legitimately not a bug. Congratulations! You've learned something about a standard, or some other browser obscurity, that you weren't familiar with: you're now a better web developer! You should fire up your blog and write about the obscure new bug or API that you discovered and explain it to the world.
Or it is a bug and the owner unnecessarily closed it. At this point you need to argue your case to re-open it.
The second point (the vendor does not feel like working on it) is also handled in two ways:
First, argue your case for the bug. This should help inform the browser developer that they should dedicate valuable resources to fixing this issue.
Or second, if they are truly unwilling to fix the bug: Raise holy hell on your blog, Twitter account, and any other place where other web developers will listen to you. If you can't find anyone else who agrees with your plight then you are probably crazy - but if it's a legitimate problem that the browser vendor is refusing to fix then you should easily find others with whom you can band together and complain openly to. But that's OK: You've earned this right. By doing all the due diligence necessary to bring this bug to light you earn full privileges of bitching about it in every other sentence.
Arguing a Bug
So you're at the point where your bug has been closed and you need to convince the closer that this was a mistake - that what you have is an actual, legitimate, bug.
Here are some of the best arguments that you can use, in the order in which they should be used:
Show that the bug was a regression. Prove that it was something that worked in a previous release that stopped working because of a change. Works great in conjunction with #2.
Show real-world web sites breaking. If you can show that actual users are going to no longer visit X bank in Poland or Y shopping site in Canada then browsers will easily bend over backwards to fix the issue (unless it's Opera, in which case they may use their browser.js to force a site to work - but that's another story).
Show a web standard that is being violated by not fixing this bug. If you're able to show that the W3C DOM specification is not implemented correctly because a certain bug is not fixed then a browser vendor should feel compelled to fix it. If not, this will make for great blog fodder.
Show that not fixing the bug makes a browser incompatible with other browsers. If IE, Safari, and Opera all implement a specific feature or fix a specific bug then Firefox should be compelled to comply with the other implementations (as long as it's not in contradiction with a specification). This is the hardest one to argue - but it becomes easier the more browsers that are on board.
If you can't prove any of those steps then you're probably just scratching your own itch anyway - and should lay off.
Examples
I want to show some representative examples of bugs that I've filed with different browser vendors.
WebKit/Safari
Canvas arc() with radius of 0 throws exception: Calling the Canvas arc() method was throwing an exception. I provided a super-simple test case and pointed to the specification that they were in disagreement with. Was resolved the same day it was posted.
Huge Speed Drop in Array.prototype.sort: Showed a regression with a simple reduction. Included a bunch of very-useful Shark profiling data to help pinpoint the exact issue. Was fixed within a month.
Mozilla/Firefox
Implement .children: Argued that Mozilla needed to implement the .children property (which was in every other major browser). A good debate led to its eventual inclusion (about 6 months later).
Internet Explorer
At this time we do not plan on fixing this issue. We appreciate the report, but unfortunately we are at a stage where need to choose what we work on to maximize the value for customers and web developers.
So while they agree that it is an issue - they do not plan on fixing it. That means that I now get to complain about it!
HTMLElement.prototype.querySelectorAll doesn't exist: As it turns out this was because querySelectorAll only exists in standards-compliant pages (not in quirks mode). This is positively bizarre but I think I understand their rationale behind the decision. I suspect that this will bite a lot of people once IE 8 goes live - we'll see. I now know more about IE 8 and am a better developer for it.
Google Chrome
for in loops don't occur in defined order: This was a compatibility issue (all other browsers behaved in a particular manner). I provided a simple test case. It was accidentally closed as WontFix (which caused confusion) but was actually fixed. I made a mistake here and actually filed this bug against Chrome when it should've been against V8 - here's the bug that dealt with the issue over there.
setTimeout(..., 0) fires too quickly: This was actually due to a structural change made by the Chrome team. Mike Belshe (the author of the change) personally emailed me to explain what happened. I became much more informed as a result and blogged about it.
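A test case for the for in report above can be very small. The sketch below is not the one attached to the bug; the property names are arbitrary, and it checks only the simplest form of the behavior in question: properties with ordinary (non-numeric) names are enumerated in the order in which they were added.

```javascript
// Illustrative test case; the property names are arbitrary.
const obj = {};
obj.zebra = 1;
obj.apple = 2;
obj.mango = 3;

const seen = [];
for (const name in obj) {
  seen.push(name);
}

console.log(seen.join(", ")); // zebra, apple, mango
```

Numeric-looking property names are a separate matter and are deliberately left out of this sketch.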
Opera
(Opera does not provide an official public view for bug reports, that I know of.)
How to Test a Nightly Build
Before we get into the details of testing a nightly build of a browser I should probably answer the most common question: Why should I care about testing a nightly build of a browser? There are a couple reasons.
First, when filing a bug report you're going to need to determine if the bug you're submitting has already been fixed, or not. If it's already been fixed in a nightly then you don't need to worry about submitting it - the bug will be fixed in the next release. If the bug has not been resolved, though, then you should be sure to continue filing the bug.
Second, you should periodically test your site or library in the latest browser nightlies - to make sure that your code isn't going to break when the browser is released. How often you test is up to you - but the more frequently you test the more likely it is that your site or library won't hit a massive regression at some point. I think it's pretty safe to say that no developer likes finding out that there's a new browser version on the market that breaks their site.
Filing a bug for a nightly is just like filing any other bug. Provide a reduced test case and be sure to emphasize that a regression occurred. If you're testing frequently enough this should be sure to get the developers hopping into action.
Getting the Latest Nightly
Browser vendors provide a variety of techniques for getting the latest version of a browser. Some browsers release more frequently than others, as well (for example, Chrome updates multiple times a day, Firefox once a day, and Opera every couple weeks or so).
Mozilla/Firefox
Mozilla provides nightly releases of Firefox. It can be installed, and used, alongside your existing copy of Firefox using Firefox Profiles. Once you download one nightly release it should update itself, automatically, every day.
WebKit/Safari
WebKit nightlies are easy to install on OS X - they can live completely side-by-side with no profile details. However they do not update automatically. I use NightShift to make sure that my WebKit nightly is kept up to date (on OS X).
Doing a nightly install on Windows is much more cumbersome (it involves running some scripts and copying files around) but it works.
Internet Explorer
Internet Explorer installs are "a big deal" - they completely blow away any previously-installed copy of the browser. For this reason you should be sure to use some tricks to keep multiple copies of Internet Explorer installed on your system. There's one installer that can handle IE 6 (and older) and another one for taking care of IE 7. Once you have all those standalone versions installed you can feel safe downloading and installing IE 8.
IE 8 automatically updated from each beta release but it doesn't appear to do it any more. If you sign up for the Microsoft IE Beta Connect program you can get more recent builds to test against. Again, all of these builds will overwrite older versions of the current browser.
Google Chrome
Google provides multiple builds per day (one for each revision). These builds can live side-by-side with one another but they do not update automatically. One user built an automatic update application that you can use to make that happen.
Opera
Multiple versions of Opera can be installed side-by-side and they update automatically. They don't provide nightly builds (they come out every couple weeks, or so) but they should serve as a relatively-current example of the browser.
The importance of taking an active role in the future of web development cannot be overstated. Shifting from a passive position of hoping that other developers will be proactive about filing bugs or hoping that browser vendors will notice every possible regression to one of active diligence gives you an incredible amount of power. The minimal amount of work that you do to improve communication between the web community and browsers does volumes for helping to improve the quality of the entire web.
You should wear every bug that has gotten fixed, because of you, as a badge of honor: You've done your part to make the web a better place.
In my work with the Firebug team over the past couple months I've been working with Jan Odvarko on a way to provide some form of unit testing that we can build off of. The result of my work is a new Firefox/Firebug extension called FireUnit.
FireUnit provides a simple JavaScript API for doing simple test logging and viewing within a new tab of Firebug.
For example, here's some of the API that you can use (we're starting with the basics and looking to expand with more methods, later).
// Simple true-like/false-like testing
fireunit.ok(true, "I'm going to pass!");
fireunit.ok(false, "I'm going to fail!");
// Compare two strings - shows a diff of the
// results if they're different
fireunit.compare( "The lazy fox jumped over the log.", "The lazy brown fox jumped the log.", "Are these two strings the same?" );
// Compare a string using a regular expression
fireunit.reCompare( /The .* fox jumped the log./, "The lazy brown fox jumped the log.", "Compare a string using a RegExp." );
// Display the total results
fireunit.testDone();
The results will appear in a 'Test' tab in Firebug (which must be installed in order for FireUnit to work). Each of the results can be expanded to show additional information including a full stack trace of where the test ran and a comparison with a diff.
FireUnit also provides a couple methods for simulating native browser events:
// You can also simulate browser events
var input = document.getElementsByTagName("input")[0];
fireunit.mouseDown( input );
fireunit.click( input );
fireunit.focus( input );
fireunit.key( input, "a");
And a way of running a batch of test files (each of which would contain a number of individual tests).
// Or run multiple pages of tests:
fireunit.runTests("test2.html", "test3.html");
// Place at the end of every test file in order to continue
fireunit.testDone();
We've been using this test runner to run a number of Firebug tests, especially ones that are network based.
Depending on the suite it's pretty easy to adapt existing test suites to display their results in FireUnit.
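As a rough illustration of that kind of adaptation, here's a minimal sketch that forwards results from an existing suite into FireUnit. The reportToFireUnit function and the { name, passed } result shape are hypothetical, made up for this example; the only FireUnit calls used are fireunit.ok() and fireunit.testDone() from above. A stand-in reporter is included so that the sketch can also run outside of Firefox.

```javascript
// Hypothetical adapter: forwards results from an existing suite to FireUnit.
// The { name, passed } result shape is an assumption for this sketch; only
// ok() and testDone() are taken from the FireUnit API shown earlier.
function reportToFireUnit(results, reporter) {
  // Fall back to the global `fireunit` object when running inside Firebug.
  var target = reporter || (typeof fireunit !== "undefined" ? fireunit : null);
  if (!target) {
    throw new Error("FireUnit is not available");
  }

  var failures = 0;
  results.forEach(function (result) {
    if (!result.passed) {
      failures++;
    }
    // Log one pass/fail line per result in the Test tab.
    target.ok(!!result.passed, result.name);
  });

  // Display the totals once everything has been logged.
  target.testDone();
  return failures;
}

// Stand-in reporter (not part of FireUnit) that records what would be logged.
var logged = [];
var fakeFireUnit = {
  ok: function (pass, message) {
    logged.push((pass ? "PASS " : "FAIL ") + message);
  },
  testDone: function () {
    logged.push("DONE");
  }
};

var failureCount = reportToFireUnit([
  { name: "adds numbers", passed: true },
  { name: "parses dates", passed: false }
], fakeFireUnit);
```

Inside Firebug you would leave off the second argument so that the real fireunit object gets used instead of the stand-in.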
Lately I've been investigating how Mozilla's build and test system works (there are a number of pieces that tend to have a pretty tight integration and I wanted to learn more). I asked developer Ben Hearsum and he kindly obliged. I've included the questions and information here in the hopes that others will be able to learn from it, as well.
There are two critical components to Mozilla's build and test infrastructure: Buildbot and Tinderbox. I was wondering if you could tell me about their relationship and integration.
Ben: I'd like to break this down a little bit more. Tinderbox consists of two parts: Client and Server. The server is essentially just the Waterfall display. It sits on a server somewhere and reacts to incoming e-mail. The client is a set of scripts that knows how to do various interesting things (building, packaging, generating updates, etc) with Mozilla products.
Historically, builds are done on an infinite loop by tinderbox client, reporting back to the server at the start and end of each. It's a completely stateless system; the client sends out a specifically formatted e-mail, and the server acts on it. Because of the simplicity of communicating with the tinderbox server we can use it as a display, with Buildbot driving the builds.
That is the only direct communication between the two. In some cases Buildbot does some post-processing of logs to get Tinderbox to display things directly on the Waterfall (for example, unit test pass/fail numbers).
At this point, Buildbot is responsible for driving almost all of the new build, unit test, and talos infrastructure we bring up. Right now, we'll be sticking with Tinderbox as the main developer frontend. In the future we may want to present developers with the Buildbot Waterfall instead -- but we've got some feature parity to address before that can happen.
How are unit tests integrated into the Buildbot and Tinderbox systems? How are different types of tests handled?
Ben: With the exception of reporting back to the Tinderbox server all of our unit test infrastructure is 100% Buildbot. (I'm not even sure tinderbox client can run unit tests, actually.) We've got custom Buildbot steps that run our various tests and parse the output. These classes deal with the different types of tests - how they run, how to parse the output they generate, etc.
Currently, unit test output is only available to the outside world via Tinderbox. It shows a quick pass/fail/skip for each test. Internally, we mostly watch the Buildbot Waterfall. For the curious, here's what that looks like:
What if a developer wants to test out the implications of a patch before landing it and affecting all other developers?
Ben: The Try Server is exactly what you want here. You can submit a patch to it (or a set of HG repositories, if you're testing Mozilla2 code) - and have it compile, package, and upload a build for you. Recently, we added two new features: rudimentary Talos and a win32 symbol server. For instructions on how to use it swing by the wiki page.
Buildbot appears to be used all throughout the Mozilla infrastructure - is its use applicable to other projects?
Ben: It's true, we use Buildbot a lot here. Off the top of my head we use it for: Release Automation, Try Server, unit tests, Talos, misc. test infrastructure, l10n, and probably some things I'm unaware of. One of the great things about Buildbot is that it's an active project with a healthy community. This is one of the reasons we want to move to it. It's used by many different projects, including: Python, Twisted, KDE, WebKit, Subversion, OpenOffice, Gnome - to name a few.
Buildbot is built in a very extensible and customizable way. It's relatively easy, even for a project like Mozilla (whose build system/process is quite odd) to start driving infrastructure with Buildbot.
Monitoring Tinderbox frequently serves as the heart of Mozilla development. What does this tool provide that makes it so important to developers?
Ben: Because developers still use the Tinderbox server as their source of information there isn't currently much direct benefit to them - other than us Release Engineering folk having more time to do interesting things.
In the future we may want to give developers direct access to the Buildbot waterfall. This is where we can make developers' lives easier. Ideally (depending on what features get implemented) we'd want to provide the following:
The ability to trigger a clobber build from the web interface (no more CLOBBER files).
Lots of different ways to display data ("give me the latest build from each builder", "give me a list of builds from builder X", etc).
Build status via IM/RSS/E-Mail.
Specific to the try server:
Ability to stop a running build (useful when one platform fails quickly - you can save time by canceling the others).
Ability to submit a patch directly from the Buildbot Waterfall.
--
I want to thank Ben for taking the time to answer these questions for me. I've been really impressed with Mozilla's continuous integration setup - especially the use of Buildbot. I suspect that I'd like to have something similar set up for jQuery (doing automated testing, etc.) but a lot of work is still required to make that happen. At least it doesn't look that hard to get started, which is quite important.
Dromaeo is the name that I've given to the JavaScript performance test suite that I've been working on over the past couple months.
I was hoping to hold off on this release for another week or two, while I finished up some final details, but since it's been discovered, and is about to hit the Digg front page, there isn't a whole lot that I can do to stop it.
There are a ton of details concerning how it works, and how to use it, on the Dromaeo wiki page. I won't go through too much of it here, but the wiki page should clarify most questions.
Probably the most pressing question that'll be encountered (outside of what is answered on the wiki page) is "What is the relation of Dromaeo to SunSpider?" (SunSpider being the WebKit team's JavaScript testing suite).
Right now I'm working very closely with all the browser vendors to make sure that we have a common-ground test suite that is both highly usable and statistically sound (not to mention providing results that are universally interesting). There are a number of outstanding concerns that've been raised by users of the suite, along with a number of concerns that've already been rectified - again, all of this is clarified on the Dromaeo wiki page. It's of the utmost concern that this suite be as applicable as possible. It's very likely that the core suite will be moving to a common working ground where all browser vendors can work on it.
I especially want to thank Allan Branch of LessEverything who provided the awesome design for the site. It's like he tapped into my brain and produced exactly what I wanted - without even knowing it. I highly recommend them, if you have design work that needs to be done.
Update: The Acid3 test is now final; I've updated the blog post to reflect this.
The new news on the block is that the upcoming Acid 3 test is in the oven, starting to get baked. Traditionally, the Acid test has served as a way to get browser vendors in line by testing them on really-annoying edge cases. This can, sometimes, get people tied up in knots but it actually serves as a devious way of getting people to meet a large part of a spec.
For example, in order for a browser to have some weird padding/margin test case solved - in CSS - they must also have a working box model. So while an Acid test may not, explicitly, test for a working box model, it will be done implicitly (by testing edge cases that result from it).
With that in mind, it's time to take a look at Acid 3 which primarily focuses on technology that I find to be interesting: ECMAScript and the DOM. Let's dig in and see what exactly is being tested - specifically, relating to ECMAScript.
Array Elisions - Making sure that stuff like [,,] has a length of 2 and [0,,1] has a length of 3.
Array Methods - Doing an unshift with multiple arguments .unshift(0, 1, 2), joining with an undefined argument .join(undefined).
Number Conversion - Banging against .toFixed(), .toExponential(), and .toPrecision() - especially with decimals and negative numbers.
String Operations - Negative indices in substr .substr(-7, 3), character access by index "foo"[1] (part of the ECMAScript 4 spec).
Date - Making sure that certain method calls result in NaN results (like d.setMilliseconds(), with no arguments) and also enforcing +1900 year offsets.
Unicode in Identifiers - You can't use escaped Unicode in identifiers, for example: eval("test.i\\u002b= 1;"); (that should throw an exception).
Regular Expressions - /[]/ matches an empty set, /[])]/ should throw an exception, backreferences to non-existent captures, and negative lookaheads /(?!test)(test)/.exec("test test").
Enumeration - Make sure that object properties are enumerated in the correct order, make sure that you're able to enumerate properties of certain names (toString, hasOwnProperty, etc.).
Function Constructors - The user should be able to set custom constructors on the .constructor property, .constructor should not be enumerable, and .prototype.constructor should be deletable.
Function Expressions - (function test(){ ... })(); You should be able to call the function by name, within the function itself, you can't directly overwrite the function name (only with a function-scoped variable), and 'test' isn't leaked into the parent scope.
Exception Scope - Variables within the catch(){} should interact with the catch arguments primarily, followed by variables in an outer scope.
Assignment Expressions - s = a.length = "123"; - the assignment to a.length must evaluate to the string "123", which is what gets assigned to 's'. The incorrect behavior is for it to evaluate to 123 (the number), the coerced length.
Encoding - encodeURI() and encodeURIComponent() must gracefully handle null bytes.
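A handful of the cases above are easy to try out in a JavaScript console. What follows is a minimal sketch, loosely modelled on the descriptions in this list rather than taken from the Acid 3 source itself; the check names and the acidTest function name are made up for illustration. Each check returns true when the engine behaves the way the list describes.

```javascript
// Each entry pairs a description with a function that returns true on success.
var checks = [
  ["array elisions", function () {
    return [,,].length === 2 && [0,,1].length === 3;
  }],
  ["unshift with multiple arguments", function () {
    var a = [3];
    return a.unshift(0, 1, 2) === 4 && a.join("") === "0123";
  }],
  ["join(undefined) falls back to a comma", function () {
    return [1, 2].join(undefined) === "1,2";
  }],
  ["negative substr index", function () {
    return "abcdefghij".substr(-7, 3) === "def";
  }],
  ["character access by index", function () {
    return "foo"[1] === "o";
  }],
  ["setMilliseconds() with no arguments gives NaN", function () {
    return isNaN(new Date(0).setMilliseconds());
  }],
  ["escaped '+' is rejected in identifiers", function () {
    try {
      eval("test.i\\u002b= 1;");
      return false;
    } catch (e) {
      return e.name === "SyntaxError";
    }
  }],
  ["empty character class never matches", function () {
    return /[]/.test("a") === false;
  }],
  ["unbalanced ')' in a pattern throws", function () {
    try {
      new RegExp("[])]");
      return false;
    } catch (e) {
      return e.name === "SyntaxError";
    }
  }],
  ["negative lookahead prevents the match", function () {
    return /(?!test)(test)/.exec("test test") === null;
  }],
  ["toString and hasOwnProperty are enumerable when set", function () {
    var seen = [];
    for (var key in { toString: 1, hasOwnProperty: 2 }) {
      seen.push(key);
    }
    return seen.join(",") === "toString,hasOwnProperty";
  }],
  ["function expression name stays out of the parent scope", function () {
    var inner = (function acidTest() { return typeof acidTest; })();
    return inner === "function" && typeof acidTest === "undefined";
  }],
  ["assignment expression evaluates to the assigned string", function () {
    var a = [];
    var s = a.length = "123";
    return s === "123" && a.length === 123;
  }],
  ["null bytes survive URI encoding", function () {
    return encodeURI("\0") === "%00" && encodeURIComponent("a\0b") === "a%00b";
  }]
];

// Collect the names of any checks that did not behave as described.
var failed = checks.filter(function (check) {
  return !check[1]();
}).map(function (check) {
  return check[0];
});

console.log(failed.length ? "Failed: " + failed.join(", ") : "All checks passed");
```

Keeping each case as its own small function means one misbehaving check is reported by name instead of stopping the rest from running.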
All-in-all it's a comprehensive smattering of weird ECMAScript edge cases - you're bound to find at least one that fails in your favorite browser-of-choice. I'm sure we'll see many more test cases coming in, in the upcoming days and weeks.
I'm looking forward to seeing the final results - and the competitive heat that's been applied to CSS-spec implementors being applied to ECMAScript implementors.
For kicks, here are the current results in a bunch of major browsers (including the correct reference rendering).
These are the preliminary results of the UNCOMPLETED Acid 3 test in UNCOMPLETED versions of major browsers - take with a grain of salt. Go here to view the final version of this test.