John Resig - Poignant Problems with Perf

The world of performance analysis in JavaScript is a strange land. I’ve had the “pleasure” of being involved in two JavaScript performance-related debates: The speed of JavaScript-implemented CSS Selector libraries (via jQuery) and the speed of native browser implementations of JavaScript (via Mozilla). I want to go over a couple things that I’ve learned – much of which, I’m sure, is universally applicable to all types of speed tests.

It’s damn easy to sculpt tests to make yourself look good – or others look bad. I’d be a liar if I said that I didn’t knowingly write tests that emphasized our best selectors in jQuery. Of course, everyone does this – it’s a game of tactics. The issue, though, is that this sort of strange test writing behavior only arises when you’re competing on stats – you want your code to look the best that it can. It’s only when you divorce yourself from competition, and take a step back, that you can truly start to understand what it is that’s most important to users and what’s a true performance problem.

A couple examples:

In jQuery there’s the CSS 3 ~ selector. A stupid, stupid, worthless selector. No one uses it and I wish it would go away. I had actually removed this selector from jQuery; since no one used it, it wasn’t missed in the slightest. (I’ve removed a huge number of others as well – no one has ever complained about the missing :nth-last-of-type pseudo-selector.) However, once libraries start to compete (I think it was ExtJS that first released a speed test that compared us on ‘~’), users lose. They get bloat (an extra feature that no one used), they lose performance elsewhere (to the overhead of supporting the extra selector), and the work takes developer time away from other things. In the end I re-implemented the ~ selector to meet this non-existent demand – and later on had to dedicate time to improving its speed (again, for the speed-test-suite-induced demand) – all instead of working on more important things like bug fixes.

In Firefox, there’s been a lot of analysis of different speed test suites (including the one produced by yours truly) and attempts to figure out which tests are actually meaningful. One such test sorted an array of integers using .sort(). That test, however, was fundamentally flawed. Observe:

var a = [];
for (var i = 0; i < 6; i++)
  a.push(parseInt(1000 * Math.random()));
a.sort(); // no comparator: elements are compared as strings
// => e.g. [1, 11, 3, 5, 7, 9]

Notice the problem? This test was using the default .sort() method, which sorts the contents of the array as if they were strings. This is a nonsense test. There is virtually no use case for performance testing the comparison of integers as strings. And yet there it is, confusing users and wasting developer time.
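For comparison, here is a minimal sketch of the difference between the default sort and a numeric sort. The fixed input array and the variable names are illustrative assumptions, not taken from the original test suite; passing a comparator to .sort() is what makes it compare numbers rather than strings.

```javascript
// Fixed input (instead of random values) so the result is reproducible.
var values = [9, 1, 11, 5, 3, 7];

// Default sort: elements are converted to strings and compared lexicographically.
var sortedAsStrings = values.slice().sort();

// Numeric sort: the comparator returns a negative, zero, or positive number.
var sortedAsNumbers = values.slice().sort(function (x, y) {
  return x - y;
});

console.log(sortedAsStrings); // [1, 11, 3, 5, 7, 9]
console.log(sortedAsNumbers); // [1, 3, 5, 7, 9, 11]
```

A test that wants to measure integer sorting would time the second form; the first mostly measures number-to-string conversion and string comparison.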

I could go on-and-on talking about really bad JavaScript performance tests (like the one that tested the speed of animating elem.style.x and elem.style.y – neither of which are actual CSS properties). However, it’s probably more important to look towards the future. What exactly is a good test of JavaScript performance? It’s one that encapsulates real-world user activity and analyzes actual application responsiveness to user interaction. This is incredibly difficult to both quantify and analyze.

Now, it’s really easy to test stuff like sorting and looping – that requires no user interaction whatsoever. But how do you objectively test the speed of things like “the user clicks this button and this div appears” or “how smooth is the animation of this div from point A to point B”? It isn’t completely clear how to make this happen – especially in an unbiased way. For example, different forms of event triggering could be used to simulate user clicks, but do those properly simulate an actual user click? Do they happen faster? Slower? What if a browser’s event trigger system is quite slow but its normal UI experience is excellent? It’s not clear what the right answer is, and no one has solved it yet. This is something that I hope to be looking into in the upcoming weeks and months.

The problem really boils down to a matter of “microtests” in comparison to “real world” tests. A microtest effectively runs a single operation hundreds, or thousands, of times to get a good statistical result for analysis. This is ripe for error, cheating, and general unfeasibility. When was the last time you did document.getElementById("test") back-to-back 500 times, or $("#test") for that matter? You didn’t. Any sane person would store the result in a variable and access it again later. The only thing that a test like that does is encourage library, or browser, authors to optimize for unrealistic tests. Is the overhead of a caching system worth it if, in reality, there are virtually no cache hits? It doesn’t matter, though, since competitive testing can lead people to implement unnecessary systems like these, purely for the sake of stats.
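To make the contrast concrete, here is a rough sketch of what such a microtest measures versus what application code typically does. Everything named here (lookup, fakeDocument, timeRuns) is a hypothetical stand-in so the snippet runs anywhere; in a browser the body of lookup would be a call like document.getElementById(id).

```javascript
// Hypothetical stand-in for a lookup such as document.getElementById("test").
// It counts how often it is called.
var lookupCalls = 0;
var fakeDocument = { test: { id: "test" } };
function lookup(id) {
  lookupCalls++;
  return fakeDocument[id];
}

// Run fn `iterations` times and return the elapsed wall-clock milliseconds.
function timeRuns(fn, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    fn();
  }
  return Date.now() - start;
}

// What the microtest measures: 500 back-to-back lookups.
var microtestMs = timeRuns(function () {
  lookup("test");
}, 500);
var callsInMicrotest = lookupCalls;

// What application code usually does: look up once, then reuse the reference.
lookupCalls = 0;
var elem = lookup("test");
var realisticMs = timeRuns(function () {
  elem.touched = true; // reuse the stored reference
}, 500);
var callsInRealisticCode = lookupCalls;

console.log("microtest lookups:", callsInMicrotest, "in", microtestMs, "ms");
console.log("realistic lookups:", callsInRealisticCode, "in", realisticMs, "ms");
```

The call counts are the point: the microtest exercises the lookup 500 times, while the realistic pattern exercises it once, so a cache added to win the first case would almost never be hit in the second.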

Now microtests do have their place – but that place should not be one of public competition but of personal introspection. The first step to competitive performance analysis should always focus on the users. It should be all about what the users are actually doing in their day-to-day browsing. Only after you’ve identified problem tests that you’d like to improve do you move to microtests. Since they don’t serve as a good basis for public comparison (leading to unrealistic cheating, etc.) it’s best to keep tests like those internal. Once you can do that they can become quite useful. User clicks a link and the animation is choppy? How’s our timer performance? CSS property manipulation? Closure speed? DOM element accesses? All of these are the type of things that microtests were designed for – looking into the root cause of the real world problems.

In summary, there are three things that I think are really important about performance testing:

Use competition to light a fire. Competitive performance testing has its place. Personally, seeing poor performance results for my code makes me angry – which is good, because it gets me excited about improving the speed of my code. This is good as, theoretically, the users will benefit in the end.

Test real world code. Competitive, or even public, testing should strive to analyze real world code. Simulating a user’s natural experience as closely as possible should be the ultimate goal. It’s only when you get to this point that you no longer cheat yourself, others, and, especially, your users.

Analyze performance against yourself. You should be your greatest enemy. Remember that last release of your code? It should be your goal to make it look as bad as possible. In the end your users will be able to see actual results and competition will be healthy (no one gets hurt) and clean (there’s no incentive to cheat).

Here’s looking forward to a super-fast JavaScript future.

Posted: January 17th, 2008

9 Comments

Guy Fraser (January 17, 2008 at 1:17 am)

I’d like to see real world tests such as a fairly normal web page with a defined task: “Find the best way for your library to make XYZ happen”

This would do two things:

1. Allow libraries to show off their best way of doing the task

2. Provide excellent examples for people to use in their own sites

Speed would be one aspect of it, but it would also allow comparison of ideas, code legibility, ease of use, etc. It would also generate vital feedback for library developers as to where their library is falling short of real-world needs.

Alexandre Plennevaux (January 17, 2008 at 4:07 am)

I think the problem is inherent to the will to quantify JS performance, when – as you mentioned – what matters is the end user’s feeling of responsiveness of the UI and the interactions.

So I wonder if a good test case would be to set up a specific online application/website dedicated to JavaScript library benchmarking, where users are asked to perform a set of actions and, via a simple drop-down menu, switch the JS library that works in the background.
Performance times could of course be tracked, along with details of the user’s machine (transparently please), and after each test the user would be asked to rate things like feeling of responsiveness on a 5-point scale from ‘apocalyptic’ to ‘paradisiac’. So there you collect human subjective perception alongside machine performance.
Even better, the results could be refined by having users enter their PC hardware config, OS, etc. To make things easy, provide a Java plugin that scans the system so the user doesn’t have to look up the info themselves.

I would first only open this system to developers, and when refined, to the Wild Wild Net.

just my 2 eurocent.

Alexandre

Ryan Breen (January 17, 2008 at 6:02 am)

Guy: I love that idea, and I think John is the right guy to get it off the ground.

John Resig (January 17, 2008 at 10:40 am)

@Guy & Ryan: Absolutely, that would be the best goal. We looked at some of those (at jQuery) in the past; however, it’s incredibly difficult to gauge and can still have a lot of bias. For example, let’s say that a particular library doesn’t have very good element cloning support (as a random example), so the user is forced to use the browser’s native element cloning. Two things happen: 1) This completely circumvents the need for the library and forces the user to interact with browser bugs. 2) That library will then be “faster” as it, literally, has no overhead.

While it’s easy to quantify matters of speed in an absolute number it’s near-impossible to quantify clarity in the same manner (# of lines of code?). Libraries, in this respect, are much harder to test than browsers. Browsers have to do well regardless of what’s thrown at them, whereas with libraries, you can always dip back to the native level to gain extra speed.

@Alexandre: Honestly? I’m not sure I trust user feedback. I like being able to leave things in absolute terms that can’t be influenced by bias. A good example test is the one produced by the PBWiki team. They look at the loading speed of different libraries, which can’t be easily tricked as it’s being run in a cohesive suite, together. I think that would be a better situation for both users and those that are being tested (either libraries or browsers).

Wade Harrell (January 17, 2008 at 12:36 pm)

I feel a good starting point for getting realistic numbers would be a large sample dataset in JSON and XML formats. Perhaps a mock version control log with data about users, departments, checked in files, version numbers, comments, # of checkins, # of lines of code changed, etc. That is sample data most anyone reading this would understand ;)

With data in hand library builders could run an automated test like:

1. Start with a blank page.
2. Load the data file (this time is tossed out as we are not measuring net speed)
3. Generate a tabular display of the data with predetermined columns
4. Sort a string column
5. Sort a date column (displayed for the appropriate locale)
6. Sort a numeric column
7. Window a draggable, closeable, display of totals for each numeric column
8. Window a draggable, closeable, display of the average for each numeric column
9. Highlight each value that is above the average for its column

The numbers from that test, broken down by data source and browser, could be quite telling. The library developer would be free to interpret tabular and windowing however they want; tweaking it to get the best possible performance. This would be a base level test, and in addition to raw time other measurements could be made, such as lines of code needed or file footprint and usage of non-library native JavaScript.
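As a rough illustration of steps 4 through 6 and step 9, here is a minimal sketch using plain JavaScript rather than any particular library. The sample rows, column names, and helper functions (byColumn, average) are all hypothetical; a real test would load the shared data file instead.

```javascript
// Hypothetical sample rows from a mock version control log.
var rows = [
  { user: "carol", date: "2008-01-09", linesChanged: 120 },
  { user: "alice", date: "2007-12-30", linesChanged: 9 },
  { user: "bob",   date: "2008-01-17", linesChanged: 45 }
];

// Build a comparator for a column, given how its values should be compared.
function byColumn(column, toKey) {
  return function (a, b) {
    var ka = toKey(a[column]);
    var kb = toKey(b[column]);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  };
}

var byUser = rows.slice().sort(byColumn("user", String));
var byLines = rows.slice().sort(byColumn("linesChanged", Number));
// Sort dates by timestamp, not by their locale-specific display string.
var byDate = rows.slice().sort(byColumn("date", function (d) {
  return new Date(d).getTime();
}));

// Average of a numeric column, and the rows that exceed it.
function average(list, column) {
  var total = 0;
  for (var i = 0; i < list.length; i++) total += list[i][column];
  return total / list.length;
}
var avgLines = average(rows, "linesChanged");
var aboveAverage = rows.filter(function (row) {
  return row.linesChanged > avgLines;
});

console.log(byUser.map(function (r) { return r.user; }));          // ["alice", "bob", "carol"]
console.log(byLines.map(function (r) { return r.linesChanged; })); // [9, 45, 120]
console.log(byDate.map(function (r) { return r.user; }));          // ["alice", "carol", "bob"]
console.log(avgLines);                                             // 58
console.log(aboveAverage.map(function (r) { return r.user; }));    // ["carol"]
```

Keeping the sort key separate from the displayed value matters here: sorting a date or numeric column by its rendered string would repeat the integers-as-strings mistake from the post.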

Obviously some libraries may be better suited to other ways of viewing the data, and it would be good to allow for other tests to be outlined using the same data sources.

I am sitting here thinking about this, and already thinking about what would be involved in setting up the site for it ;)

timothy (January 17, 2008 at 6:35 pm)

I have a computationally intense JavaScript application that uses a bit of jQuery, some flot charting, but mostly does a lot of statistics, sorting (actual numeric sorting), lots of array manipulation, etc.

Much to my surprise, this app, which was mostly developed in Firefox, is crazy fast in Safari. (Less surprisingly, it’s crazy slow in IE.) I’ve not read anything anywhere that would give me the idea that Safari’s JS interpreter is so frigging fast.

That’s the difference between test cases and real-world cases.

I wish I could run Firebug in Safari so I could compare the results of Profiling my application.

brito (January 17, 2008 at 7:07 pm)

@timothy: it’s somewhat simple to create a cross-browser profiling extension using aspects in javascript.

In a nutshell, you redefine each function that you’re interested in as a wrapper for itself, adding logging, timing and counters. Of course this adds overhead to your code, but now you can compare results for your application across browsers.

I’ll look for code that I wrote for that purpose and follow up with something more useful than my rant.
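A minimal sketch of the wrapper approach described above (not brito’s own code, which isn’t shown here). The profile helper, the stats object, and the example app.sum function are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: replace obj[name] with a wrapper that counts calls
// and accumulates elapsed time, then delegates to the original function.
function profile(obj, name, stats) {
  var original = obj[name];
  stats[name] = { calls: 0, totalMs: 0 };
  obj[name] = function () {
    var start = Date.now();
    try {
      return original.apply(this, arguments);
    } finally {
      // Record the call even if the original function throws.
      stats[name].calls++;
      stats[name].totalMs += Date.now() - start;
    }
  };
}

// Example target: an object with one function we want to measure.
var app = {
  sum: function (list) {
    var total = 0;
    for (var i = 0; i < list.length; i++) total += list[i];
    return total;
  }
};

var stats = {};
profile(app, "sum", stats);

var result = app.sum([1, 2, 3]);
app.sum([4, 5]);

console.log(result);          // 6
console.log(stats.sum.calls); // 2
console.log(stats.sum.totalMs + " ms total");
```

Date.now() only has millisecond resolution, so the timings are meaningful for slow or frequently called functions rather than for single fast calls; the call counters are exact either way.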

timothy (January 17, 2008 at 10:36 pm)

That sounds pretty nifty.

Alexandre Plennevaux (January 22, 2008 at 6:23 am)

Just stumbled on this, may be of interest: the SunSpider JavaScript benchmark. It only tests the core JavaScript engine, but maybe it could be adapted to test libraries?
http://www.codinghorror.com/blog/archives/001023.html
