Blog


Deep Tracing of Internet Explorer

After reading a recent post by Steve Souders concerning a free tool called dynaTrace Ajax, I was intrigued. It claimed to provide full tracing analysis of Internet Explorer 6-8 (including JavaScript, rendering, and network traffic). Giving it a try I was very impressed. I tested against a few web sites but got the most interesting results running against the JavaScript-heavy Gmail in Internet Explorer 8.

I typically don't write about most performance analysis tools because, frankly, most of them are quite bland and don't provide very interesting information or analysis. dynaTrace provides some information that I've never seen before - in any tool on any browser.

dynaTrace Ajax works by sticking low-level instrumentation into Internet Explorer when it launches, capturing any activity that occurs - and I mean virtually any activity that you can imagine. I noticed very little slowdown when running the browser in tracing mode (although it's sometimes hard to tell, considering the browser). All of the tracing is recorded and saved, making it easy to record sessions for later analysis.

dynaTrace Ajax

Above is the result of a recorded session, logging in to Gmail, reading a mail, and logging back out again. All aspects of the session are saved: Network requests, JavaScript source, all DOM events, etc. I had a hard time finding information that wasn't saved by the tool.

This is the full timeline view of loading the Gmail inbox. All network traffic, JavaScript parsing and execution, browser events, and CPU load can be seen.

You can select a segment of the timeline and get a view that looks like the following:

In the above you can see a clearer picture of the exact interactions happening: a phenomenal amount of inline JavaScript execution, followed by page layout calculation, coinciding with loading of some data over the network. You can mouse over the individual blocks on the timeline to get more information (such as whether the JavaScript execution was the result of a timer or what Ajax requests were firing to cause the network traffic). Additionally you can click the blocks to dive in and take a deeper view of the trace.

Digging in to the execution of an XMLHttpRequest on a page we get to see some of the full execution stack trace - and this is where the tool starts to become really interesting. The tool is capable of tracing across JavaScript, through the native XMLHttpRequest, through the network request, and back to the handler that fires when the request is done. This is phenomenal. This is the first tool that I've seen that's capable of tracing through native methods to give you a picture of what activity triggers which actions and the complete ramifications of what happens (in both CPU usage and execution time).

Note that in the stack trace view you can click any piece of code and see its location anywhere inside the source code (and this even works after you've already closed the browser and have moved on - all source code is saved for later analysis).

While it's interesting to trace through code to look for problems the bigger question is usually: Where are slowdowns occurring? This is where the HotPath view comes into play:

This looks like a typical execution count view - like the one that you might see in Internet Explorer's built in tool or in Firebug - except for one important point: This view includes JavaScript parsing and layout rendering times. This is huge! No other tool provides information on how long it takes to parse all the JavaScript code on your site or how long it takes to do all the rendering. Clicking those entries allows you to see a breakdown of every time JavaScript was parsed or a layout was rendered - from which you can trace back to get even more information about what caused those actions. I don't want to seem too excited but I really am: this is just an incredible amount of information - and it gets even better:

Not only can you see the execution count for your defined JavaScript methods but you can also see execution time for the built-in DOM methods! Wondering what native method calls are slowing down your application? Wonder no more. From the HotSpot view you can filter by DOM or regular JavaScript and see exactly where execution time is going and what methods are so slow.
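
To make the idea of a "where is the time going" view concrete, here is a minimal sketch of how raw call records could be rolled up into a per-method table. The record shape ({ name, ms }) and the hotSpots function are assumptions made up for illustration; they are not dynaTrace's data format or API.

```javascript
// Hypothetical call records: one entry per traced call.
// The { name, ms } shape is an assumption for this sketch.
function hotSpots(calls) {
  const byName = new Map();
  for (const call of calls) {
    const entry = byName.get(call.name) || { name: call.name, count: 0, totalMs: 0 };
    entry.count += 1;          // how often the method ran
    entry.totalMs += call.ms;  // cumulative time spent in it
    byName.set(call.name, entry);
  }
  // Slowest methods first, like a typical hot spot listing.
  return Array.from(byName.values()).sort((a, b) => b.totalMs - a.totalMs);
}

const table = hotSpots([
  { name: "appendChild", ms: 4 },
  { name: "getElementById", ms: 1 },
  { name: "appendChild", ms: 6 }
]);
```

Sorting by total time rather than call count is a design choice: a method called rarely but slowly tends to matter more than one called often but cheaply.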

dynaTrace provides an additional view, called PurePath, that attempts to figure out problematic scripts:

Just another way to try and get a full picture as to where your application is slowing down and what may be causing the problem.

In all I'm hugely impressed with this (free!) tool and am already using it to do more testing and performance analysis on my code. I don't think any browser has ever had a tool capable of this type of analysis, let alone Internet Explorer 6 and 7, which are still a very real part of any developer's workflow.

I chatted with some of the dynaTrace guys and asked them to add in memory profiling and to support more browsers. If they can provide this quality of instrumentation for CPU and execution time I hope they can do the same for memory usage, the next un-tapped realm of JavaScript performance analysis.

Tags: analysis, performance, tracing, ie, tools

Browser Page Load Performance

Steve Souders is currently doing more to improve the performance of web pages and web browsers than anyone else out there. When he worked at Yahoo! he was responsible for YSlow (a great tool for measuring ways to improve the performance of your site) and he wrote the book on improving page performance: High Performance Web Sites. Now he works for Google but much of what he's up to is the same: Making web pages load faster.

I've been really excited about one of his recent project releases: UA Profiler. The profiler is a tool that you can run in your browser to determine the status of a number of network-performance-specific features that tie heavily to browser page load performance.

Here's a look at the current breakdown:

We can see Firefox 3.1 taking a lead, fixing 9 out of 11 of the issues tested for. Firefox 3, Chrome, and Safari 4 all come after with 8 fixed. Firefox 2, Safari 3.1, and IE 8 next at 7. Those numbers help to give you an overall feel of the page load performance that you'll see in a browser. (Naturally these tests don't take any rendering or JavaScript performance numbers into account but network performance generally trumps their total runtime anyway.)

Information about network performance is important for two reasons:

  1. It informs browser vendors as to the quality of their browser. A browser fixing any of the points specified by the test will yield faster page loads.
  2. It informs web site developers as to the problems they should be taking into consideration when developing a site. For example if a browser they support doesn't handle simultaneous stylesheet downloading perhaps their page should be re-worked.

The tests themselves can be broken down into a couple categories (Steve explains them all, in detail, in the FAQ):

Network Connections

Two big things are tested here: The number of simultaneous connections that can be opened to a hostname (sub-domains count as different hostnames) and how many connections can be opened to any number of hostnames, simultaneously. These numbers can give you a good indicator of how many parallel downloads can occur (most commonly seen for downloading multiple images, simultaneously).
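
As a rough back-of-the-envelope sketch of why these two limits matter, the function below estimates how many sequential "rounds" of downloads a page needs. The function name and its parameters are hypothetical, and it assumes resources are spread evenly across hostnames and take equally long to download, which real pages rarely satisfy.

```javascript
// Hypothetical estimate, not a browser API.
// perHostLimit: max simultaneous connections to one hostname
// totalLimit:   max simultaneous connections across all hostnames
function downloadRounds(resourceCount, hostnames, perHostLimit, totalLimit) {
  // The browser can never exceed either cap.
  const parallel = Math.min(hostnames * perHostLimit, totalLimit);
  return Math.ceil(resourceCount / parallel);
}

const oneHost = downloadRounds(24, 1, 2, 30);   // 24 images, one hostname
const fourHosts = downloadRounds(24, 4, 2, 30); // same images over four sub-domains
```

With a per-hostname limit of 2, splitting 24 images across four sub-domains drops the estimate from 12 rounds to 3, which is why the sub-domain detail is worth knowing.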

Additionally there is a check to see if the browser supports Gzip compression. The results aren't too exciting here as all modern browsers support Gzip compression at this point.

Parallel Downloads

All browsers are capable of downloading images in parallel (multiple images downloading simultaneously) but what about other resources (like scripts or stylesheets)?

Unfortunately it's much harder to get scripts and stylesheets to load in parallel since their contents may dramatically change the rest of the page. The loading of these resources occurs in three steps:

  1. Downloading (can be parallelized)
  2. Parsing
  3. Execution

The load order breaks down like so (sort of an advanced game of rock-paper-scissors): Scripts prevent other scripts from parsing and executing, stylesheets prevent scripts from parsing and executing.

It's been hard for browsers to implement the parallelization of script downloading since scripts are capable of changing the contents of the page - and possibly removing or adding new scripts or stylesheets to the page. Because of this browsers are starting to get better at opportunistically looking ahead in the document and pre-loading stylesheets and scripts - even if their actual use may be delayed.
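
The difference between blocking and look-ahead loading can be sketched with a toy timeline model. Everything here (the loadTimeline function, the download and execute fields, the millisecond figures) is invented for illustration. The model assumes scripts always execute in document order and that parallel downloads all start at time zero, and it ignores connection limits and shared bandwidth.

```javascript
// Returns the time at which each script finishes executing.
function loadTimeline(scripts, parallelDownloads) {
  let previousDone = 0; // when the previous script finished executing
  return scripts.map(function (script) {
    // Without look-ahead, a download only starts once the previous script ran.
    const downloadStart = parallelDownloads ? 0 : previousDone;
    const downloaded = downloadStart + script.download;
    // Execution still has to wait for the previous script, in document order.
    const executeStart = Math.max(downloaded, previousDone);
    previousDone = executeStart + script.execute;
    return previousDone;
  });
}

const scripts = [
  { download: 100, execute: 10 },
  { download: 50, execute: 10 },
  { download: 80, execute: 10 }
];
const blocking = loadTimeline(scripts, false); // [110, 170, 260]
const lookAhead = loadTimeline(scripts, true); // [110, 120, 130]
```

In this model only the downloading step is parallelized; parsing and execution stay strictly ordered, matching the three steps listed above.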

Changes in this area will yield some of the largest benefits to browser page load performance, going forward, as it's still one of the most untapped areas of improvement.

Caching

While all modern browsers support caching of resources, caching of page redirects is much less common. For example, consider the case where a user types in "http://google.com/" - Google redirects the user to "http://www.google.com/" but only a couple browsers cache that redirect so as not to retry it later.

A similar case of redirect caching occurs for resources, for example with stylesheets, images, or scripts. Since these occur much more frequently it becomes that much more important for browsers to cache every action that they can.
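
Here is a minimal sketch of what caching a redirect saves, counting round trips against a fake lookup function. The createResolver helper, its parameters, and the lookup callback are all hypothetical. A real browser would also have to respect the response's caching headers and the redirect type, which this sketch ignores.

```javascript
// lookup(url) stands in for a network request: it returns the redirect
// target for a URL, or null when the URL serves content directly.
function createResolver(lookup, maxHops) {
  const redirectCache = new Map();
  let roundTrips = 0;

  function resolve(url) {
    let current = url;
    for (let hop = 0; hop <= maxHops; hop++) {
      if (redirectCache.has(current)) {
        current = redirectCache.get(current); // answered locally, no request
        continue;
      }
      roundTrips += 1; // this step goes over the network
      const target = lookup(current);
      if (target === null) return current;
      redirectCache.set(current, target);
      current = target;
    }
    throw new Error("Too many redirects for " + url);
  }

  return { resolve: resolve, getRoundTrips: function () { return roundTrips; } };
}

const redirects = { "http://google.com/": "http://www.google.com/" };
const resolver = createResolver(function (url) { return redirects[url] || null; }, 5);
const firstVisit = resolver.resolve("http://google.com/");  // redirect + page: 2 round trips
const secondVisit = resolver.resolve("http://google.com/"); // page only: 1 more round trip
```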

Prefetching

This is part of the HTML 5 specification and allows for pages to specify resources which should be opportunistically downloaded in case they should be used in the future (the common example of image rollovers could be used here).

There's a full page describing how to use them on the Mozilla developer wiki but it isn't that hard to get started. It's as simple as including a new link element in the top of your site:

<link rel="prefetch" href="/images/big.jpeg">

And that resource will be downloaded preemptively.
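
If you have a list of likely-needed resources, the markup can be generated rather than written by hand. This is a small sketch; prefetchLinks and escapeAttribute are names made up for this example, and the output is plain markup that you would still need to place in the page yourself.

```javascript
// Escape characters that would break out of a double-quoted attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Build one prefetch link element per URL.
function prefetchLinks(urls) {
  return urls.map(function (url) {
    return '<link rel="prefetch" href="' + escapeAttribute(url) + '">';
  }).join("\n");
}

const markup = prefetchLinks(["/images/big.jpeg", "/search?q=a&page=2"]);
```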

Inline Images

The final case that the profiler tests for is the ability of a browser to support inline images using a data: URI. Data URIs give developers the ability to include the image data directly within the page itself. While this saves an extra HTTP request it's important to note that the resource will not be cached (at least not as an external resource - it may be cached as part of the complete page). The use of this technique will vary on a case-by-case basis but having a browser support it is absolutely important.
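
For reference, a data: URI is just a MIME type followed by the (usually base64-encoded) bytes. The sketch below builds one, assuming a Node.js environment where the global Buffer is available; toDataUri is a hypothetical helper name.

```javascript
// Build a base64 data: URI from raw bytes (Node.js, uses the global Buffer).
function toDataUri(bytes, mimeType) {
  return "data:" + mimeType + ";base64," + Buffer.from(bytes).toString("base64");
}

// Base64 emits 4 characters for every 3 input bytes, so inlined
// resources are roughly a third larger than the original file.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const uri = toDataUri(Buffer.from("hi"), "text/plain");
```

The size overhead is one more factor in the case-by-case decision: a saved request is usually worth it for small images, less so for large ones.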


Going forward it will become increasingly important to have publicly-visible tests like the UA Profiler that are able to encourage browser vendors to act quicker at implementing critical browser functionality. Anything that's able to, even indirectly, improve the performance of the browsing experience for users of the web is absolutely critical, in my book.

Tags: browser, performance, network

Fonts, Podcast, Performance

Three tidbits from this week:

I published an article on W3C Web Fonts at Ars Technica the other day.

I did another Open Web Podcast this week, this time with Ryan Stewart of Adobe.

On Thursday I gave a talk at the Web Experience Forum here in Boston talking about Performance Improvements that are coming in new browsers.


Tags: javascript, fonts, podcast, performance

JavaScript Performance Rundown

A new JavaScript Engine has hit the pavement running: The new V8 engine (powering the brand-new Google Chrome browser).

There are now a ton of JavaScript engines on the market (even when you only look at the ones being actively used in browsers):

  1. JavaScriptCore: The engine that powers Safari/WebKit (up until Safari 3.1).
  2. SquirrelFish: The engine used by Safari 4.0. Note: The latest WebKit nightly for Windows crashes on Dromaeo, so it's skipped for now.
  3. V8: The engine used by Google Chrome.
  4. SpiderMonkey: The engine that powers Firefox (up to, and including, Firefox 3.0).
  5. TraceMonkey: The engine that will power Firefox 3.1 and newer (currently in nightlies, but disabled by default).
  6. Futhark: The engine used in Opera 9.5 and newer.
  7. IE JScript: The engine that powers Internet Explorer.

There have, already, been a number of performance tests run on the above browsers - and a few of those runs have also included the new Chrome browser. It's important to look at these numbers and try and gain some perspective on what the tests are testing and how those numbers relate to actual web page performance.

We have three test suites that we're going to look at:

  • SunSpider: The popular JavaScript performance test suite released by the WebKit team. Tests only the performance of the JavaScript engine (no rendering or DOM manipulation). Has a wide variety of tests (objects, function calls, math, recursion, etc.)
  • V8 Benchmark: A benchmark built by the V8 team, only tests JavaScript performance - with a heavy emphasis on testing the performance of recursion.
  • Dromaeo: A test suite built by Mozilla, tests JavaScript, DOM, and JavaScript Library performance. Has a wide variety of tests, with the majority of time spent analyzing DOM and JavaScript library performance.

SunSpider

Let's start by taking a look at some results from WebKit's SunSpider test (which covers a wide selection of pure-JavaScript functionality). Here is the break down:

We see a fairly steady curve, heading down to Chrome (ignoring the Internet Explorer outliers). Chrome is definitely the fastest in these results - although the results from the new TraceMonkey engine aren't included.

Brendan Eich pulled together a comparison, last night, of the latest TraceMonkey code against V8.

We already see TraceMonkey (under development for about 2 months) performing better than V8 (under development for about 2 years).

The biggest thing holding TraceMonkey back, at this point, is its recursion tracing. As of this moment no tracing is done across recursive calls (which puts TraceMonkey as being about 10x slower than V8 at recursion). Once recursion tracing lands for Firefox 3.1 I'll be sure to revisit the above results.
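
To illustrate what "recursion" versus "loop" means for an engine here, the two functions below compute the same values. This is only a sketch of the two code shapes, not a benchmark and not code from any of the test suites: the first does all of its work through recursive calls, while the second does it in a loop, which is the shape of code that tracing is able to target.

```javascript
// Work happens through recursive calls.
function fibRecursive(n) {
  return n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
}

// Same result, but the work happens inside a loop.
function fibIterative(n) {
  let previous = 0;
  let current = 1;
  for (let i = 0; i < n; i++) {
    const next = previous + current;
    previous = current;
    current = next;
  }
  return previous;
}
```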

Google Chrome Benchmark

The Chrome team released their own benchmark for analyzing JavaScript performance. This includes a few new tests (different from the SunSpider ones) and is very recursion-heavy:

We can see Chrome decimating the other browsers on these tests. It's debatable as to how representative these tests are of real browser performance, considering the hyper-specific focus on minute features within JavaScript.

Note TraceMonkey performing poorly: It's unable to benefit from any of the tracing due to the lack of recursion tracing (as explained above).

Dromaeo with DOM

Finally, let's take a more holistic look at JavaScript performance. I've been working on the Dromaeo test suite, adding in a ton of new DOM and JavaScript library tests. This assortment provides a much stronger look at how browsers might perform under a normal web browsing situation.

Considering that most web pages are being held back by the performance of the DOM (think table sorters and the like) and not, necessarily, the performance of JavaScript (games, graphics) it's important to look at these particular details for extended analysis.

The results of a run against the JavaScript, DOM, and library tests (thanks to Asa Dotzler for helping me run the tests):

(No results for IE were provided as the browser crashes when running the tests, unfortunately - also I had trouble getting the WebKit nightlies, with Squirrelfish, to run on Windows, see bug 20626.)

We see a very different picture here. WebKit-based engines are absolutely ahead - but Chrome is lagging behind the latest release of Safari. And while there is a small speed improvement while using TraceMonkey, over regular Firefox, the full potential won't be unlocked until tracing can be performed over DOM structures (which it is currently incapable of - may not be ready until Firefox 3.2 or so).

One thing is clear, though: The game of JavaScript Performance leapfrog is continuing. With another JavaScript engine in the mix that rapid iteration will only have to increase - which is simply fantastic for end users and application developers.

Update: I've posted results for Safari 4.0 wherever I could.

Tags: javascript, performance

Deep Profiling jQuery Apps

This evening I was playing around with the idea of profiling jQuery applications - trying to find a convenient way to completely analyze all the code that is being executed in your application.

I've come up with a plugin that you can inject into a jQuery site that you own and see how the performance breaks down method-by-method.

Here's how you can try this plugin on your own site:

Step 1: Copy site HTML, add base href, add plugin.

For example, Github.com uses jQuery for a few basic effects and pieces of interaction (they use considerably more on pages beyond the homepage).

I took a copy of their page, added a <base href> to the top and injected the profiling plugin giving a resulting test page.

Before:

  <head>
    <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
    ...
    <script src="/javascripts/bundle.js"></script>
    ...
  </head>

After:

  <head>
    <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
    <base href="http://github.com/"/>
    ...
    <script src="/javascripts/bundle.js"></script>
    <script src="http://dev.jquery.com/~john/plugins/profile/jquery-profile.js"></script>
    ...
  </head>

Step 2: Use the site normally.

Use the site as you normally would. Load it up, click around - perform normal interactions. In the case of the Github.com page I let it load, scrolled down, and clicked on one of the demo images - which caused an overlay to appear. I then closed the X on the overlay and let it hide.

Step 3: View data.

In your console type jQuery.displayProfile(); and scroll down to the bottom of the page. You should see something like the following:

and here's a sample data dump:

Event: ready (165ms)
% (ms) Method in out
0.0% 0 jQuery(#document) 1
0.0% 0   .bind("ready", function()) 1 1
3.6% 6 jQuery("a[hotkey]")
0.0% 0   .each(function())
0.0% 0 jQuery(#document) 1
0.0% 0   .bind("keydown.hotkey", function()) 1 1
0.0% 0 jQuery("#triangle")
0.0% 0 jQuery("body") 1
1.2% 2   .append("<div id="triangle" style="position: absolute; display: none;"> </div>") 1 1
0.6% 1 jQuery("#repo_menu .active")
3.6% 6 jQuery(".jshide")
0.0% 0   .hide()
1.2% 2 jQuery(".toggle_link")
0.0% 0   .click(function())
0.6% 1 jQuery("#beta :text")
0.0% 0   .focus(function())
0.6% 1 jQuery("#beta form")
0.0% 0   .ajaxForm(function())
1.2% 2 jQuery(".hide_alert")
0.0% 0   .click(function())
0.0% 0 jQuery("#login_field")
0.0% 0   .focus()
0.0% 0 jQuery("#versions_select")
0.0% 0   .change(function())
1.2% 2 jQuery("a[rel*=facebox]") 3
17.6% 29   .facebox() 3 3
Event: load (1ms)
% (ms) Method in out
Event: click (29ms)
% (ms) Method in out
6.9% 2 jQuery("#facebox .loading")
3.4% 1 jQuery("facebox_overlay")
3.4% 1 jQuery("body") 1
6.9% 2   .append("<div id="facebox_overlay" class="facebox_hide"></div>") 1 1
0.0% 0 jQuery("#facebox_overlay") 1
6.9% 2   .hide() 1 1
3.4% 1   .addClass("facebox_overlayBG") 1 1
0.0% 0   .css("opacity", 0) 1 1
3.4% 1   .click(function()) 1 1
6.9% 2   .fadeIn(200) 1 1
3.4% 1 jQuery("#facebox .content") 1
3.4% 1   .empty() 1 1
3.4% 1 jQuery("#facebox .body") 1
0.0% 0   .children() 1 2
10.3% 3   .hide() 2 2
0.0% 0   .end() 2 1
6.9% 2   .append("<div class="loading"><img src="/facebox/loading.gif"/></div>") 1 1
0.0% 0 jQuery("#facebox") 1
0.0% 0 jQuery({...}) 1
3.4% 1   .width() 1
0.0% 0   .css({...}) 1 1
6.9% 2   .show() 1 1
0.0% 0 jQuery(#document) 1
0.0% 0   .bind("keydown.facebox", function()) 1 1
0.0% 0 jQuery(#document) 1
3.4% 1   .trigger("loading.facebox") 1 1
Event: beforeReveal.facebox (1ms)
% (ms) Method in out
Event: click (6ms)
% (ms) Method in out
16.7% 1 jQuery(#document) 1
66.7% 4   .trigger("close.facebox") 1 1
Event: close.facebox (3ms)
% (ms) Method in out

This quick table of data should be able to provide you with some interesting information about what's happening in your code. The result is still incredibly basic (really only providing the most basic level of jQuery method introspection) but it definitely shows some merit.
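
For the curious, here is a minimal sketch of the general technique behind this kind of introspection: wrapping methods so that each call is timed and logged. It is not the plugin's actual source. The Collection class is a tiny stand-in for a jQuery-like object, instrument is a hypothetical helper, and treating "in" and "out" as the size of the set before and after the call is an assumption based on how the dump above reads.

```javascript
// Stand-in for a jQuery-like collection; NOT jQuery itself.
class Collection {
  constructor(items) {
    this.items = items;
    this.length = items.length;
  }
  filter(fn) { return new Collection(this.items.filter(fn)); }
  first() { return new Collection(this.items.slice(0, 1)); }
}

// Replace each named method with a wrapper that records timing and set sizes.
function instrument(proto, names, log) {
  names.forEach(function (name) {
    const original = proto[name];
    proto[name] = function (...args) {
      const start = Date.now();
      const result = original.apply(this, args);
      log.push({
        method: name,
        ms: Date.now() - start,
        in: this.length, // elements in the set before the call
        out: result && typeof result.length === "number" ? result.length : undefined
      });
      return result; // keep chaining intact
    };
  });
}

const log = [];
instrument(Collection.prototype, ["filter", "first"], log);
new Collection([1, 2, 3, 4]).filter(function (n) { return n % 2 === 0; }).first();
```

Returning the original result from the wrapper is what keeps method chains working while they are being measured.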

If you wish to create a different view for the data you can access the raw data structure by running jQuery.getProfile();.

The next stage of development for this plugin would be to reveal which methods are running inside other jQuery methods - in addition to monitoring other aspects of the application (such as timers, Ajax callbacks, etc.). I'm pleased with even this most-basic result - it gives me the ability to quickly, and easily, learn much more about a jQuery-using application.

Tags: jquery, javascript, performance

Dromaeo: JavaScript Performance Testing

Dromaeo is the name that I've given to the JavaScript performance test suite that I've been working on over the past couple months.

I was hoping to hold off on this release for another week or two, while I finished up some final details, but since it's been discovered, and is about to hit the Digg front page, there isn't a whole lot that I can do to stop it.

There's a ton of details concerning how it works, and how to use it, on the Dromaeo wiki page. I won't go through too much of it here, but the wiki should clarify most questions.

Probably the most pressing question that'll be encountered (outside of what is answered on the wiki page) is "What is the relation of Dromaeo to SunSpider?" (SunSpider being the WebKit team's JavaScript testing suite).

Right now I'm working very closely with all the browser vendors to make sure that we have a common-ground test suite that is both highly usable and statistically sound (not to mention providing results that are universally interesting). There are a number of outstanding concerns that've been raised by users of the suite, along with a number of concerns that've already been rectified - again, all of this is clarified on the Dromaeo wiki page. It's of the utmost concern that this suite be as applicable as possible. It's very likely that the core suite will be moving to a common working ground where all browser vendors can work on it.

I especially want to thank Allan Branch of LessEverything who provided the awesome design for the site. It's like he tapped into my brain and produced exactly what I wanted - without even knowing it. I highly recommend them, if you have design work that needs to be done.

Tags: testing, performance, javascript, browsers, mozilla

Firefox 3 Memory Use

Mozilla developer Stuart 'Pavlov' Parmenter wrote up some extensive details on memory use in Firefox 3. I highly recommend that you check it out.

I borrowed some of his data and created another view of the results. For example, here are the results from Windows Vista of a number of browsers:

Note that both Safari 3 and IE 8 crashed during the test (which was a page runner which automatically opened and closed groups of web pages) so accurate numbers couldn't be obtained for them. Some preliminary numbers show Safari 3 on a similar path to IE 7 (Stuart mentioned that IE 8 showed a similar path, as well).

It's great to see the massively-improved memory use of Firefox 3. It far excels anything that we offered, previously, and seems to best all other browsers on the platform.

Now, obviously, Windows Vista isn't the only platform available. I, personally, use OS X and am interested to see the memory numbers there as well. One portion of Stuart's blog post that I found to be particularly interesting was the discussion of measuring cross-platform browser memory use - and how difficult it can be. Here's how he explains it:

The short summary is Windows Vista (Commit Size) and Linux (RSS) provide pretty accurate memory measurement numbers while Windows XP and MacOS X do not.

...

On Mac, If you look at Activity Monitor it will look like we’re using more memory than we actually are. Mac OS X has a similar, but different, problem to Windows XP. After extensive testing and confirmation from Apple employees we realized that there was no way for an allocator to give unused pages of memory back while keeping the address range reserved.. (You can unmap them and remap them, but that causes some race conditions and isn’t as performant.) There are APIs that claim to do it (both madvise() and msync()) but they don’t actually do anything. It does appear that pages mapped in that haven’t been written to won’t be accounted for in memory stats, but you’ve written to them they’re going to show as taking up space until you unmap them. Since allocators will reuse space, you generally won’t have that many pages mapped in that haven’t been written to. Our application can and will reuse the free pages, so you should see Firefox hit a peak number and generally not grow a lot higher than that.

I think it's great to see these numbers come in. In many ways Firefox 3 is going to be a very different browser from its previous instantiations. I've, personally, been using it as my primary browser for a while now and have enjoyed the increased performance. It really does feel - and really even look - like a whole new browser.

If you'd like to try Firefox 3, especially without disturbing your current setup, it's really easy - and I've even written up instructions to help you out.

Tags: firefox, performance, browsers, memory

JavaScript Performance Stack

Something that's frequently befuddled is the differentiation between where JavaScript is executing and where performance hits are taking place. The difficulty is related to the fact that many aspects of a browser engine are reliant upon many others causing their performance issues to be constantly intertwined. To attempt to explain this particular inter-relationship I've created a simplified diagram:

To break it down, there are a few key areas:

  • JavaScript - This represents the core JavaScript engine. This contains only the most basic primitives (functions, objects, arrays, regular expressions, etc.) for performing operations. Unto itself it isn't terribly useful. Speed improvements here have the ability to affect all the various object models.
  • Object Models - Collectively these are the objects introduced into the JavaScript runtime which give the user something to work with. These objects are generally implemented in C++ and are imported into the JavaScript environment (for example XPCOM is frequently used by Mozilla to achieve this). There are numerous security checks in place to prevent malicious script from accessing these objects in unintended ways (which produces an unfortunate performance hit). Speed improvements generally come in the way of improving the connecting layer or from removing the connecting layer altogether.
    • XMLHttpRequest and Timers - These are implemented in C++ and introduced into the JavaScript engine at runtime. The performance of these elements only indirectly affects rendering performance.
    • Browser - This represents objects like 'window', 'window.location', and the like. Improvements here also indirectly affect rendering performance.
    • DOM and CSS - These are the object representations of the site's HTML and CSS. When creating a web application everything will have to pass through these representations. Improving the performance of the DOM will affect how quickly rendering changes can propagate.
  • Parsing - This is the process of reading, analyzing, and converting HTML, CSS, XML, etc. into their native object models. Improvements in speed can affect the load time of a page (with the initial creation of the page's contents).
  • Rendering - The final painting of the page (or any subsequent updates). This is the final bottleneck for the performance of interactive applications.

What's interesting about all of this is that a lot of attention is being paid to the performance of a single layer within the browser: JavaScript. The reality is that the situation is much more complicated. For starters, improving the performance of JavaScript has the potential to drastically improve the overall performance of a site. However there still remain bottlenecks at the DOM, CSS, and rendering layers. Having a slow DOM representation will do little to show off the improved JavaScript performance. For this reason when optimization is done it's frequently handled throughout the entire browser stack.

Now what's also interesting is that the analysis of JavaScript performance can, also, be affected by any of these layers. Here are some interesting issues that arise:

  • JavaScript performance outside of a browser (in a shell) is drastically faster than inside of it. The overhead of the object models and their associated security checks is enough to make a noticeable difference.
  • An improperly coded JavaScript performance test could be affected by a change to the rendering engine. If the test were analyzing the total runtime of a script a degree of accidental rendering overhead could be introduced as well. Care needs to be taken to factor this out.
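
One way to guard against the second issue is to keep the timed region as small as possible and only report results once all runs are finished. The harness below is a minimal sketch under that assumption; benchmark and median are hypothetical helpers, not code from SunSpider or Dromaeo.

```javascript
// Median is less sensitive than the mean to a single slow run.
function median(values) {
  const sorted = values.slice().sort(function (a, b) { return a - b; });
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function benchmark(fn, runs) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn(); // only the script under test sits inside the timed region
    samples.push(Date.now() - start);
  }
  // Any reporting (and, in a browser, any page updates) happens after
  // all of the timing is done, so it cannot leak into the samples.
  return { samples: samples, median: median(samples) };
}

const result = benchmark(function () {
  let total = 0;
  for (let i = 0; i < 10000; i++) total += i;
  return total;
}, 5);
```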

So while the improvement of JavaScript performance is certainly a critical step for browser vendors to take (as much of the rest of the browser depends upon it) it is only the beginning. Improving the speed of the full browser stack is inevitable.

Tags: javascript, browsers, performance
