Blog
November 24th, 2008
Steve Souders is currently doing more to improve the performance of web pages and web browsers than anyone else out there. When he worked at Yahoo! he was responsible for YSlow (a great tool for measuring ways to improve the performance of your site) and he wrote the book on improving page performance: High Performance Web Sites. Now he works for Google but much of what he's up to is the same: Making web pages load faster.
I've been really excited about one of his recent project releases: UA Profiler. The profiler is a tool that you can run in your browser to determine the status of a number of network-performance-specific features that tie heavily to browser page load performance.
Here's a look at the current breakdown:
We can see Firefox 3.1 taking a lead, fixing 9 out of 11 of the issues tested for. Firefox 3, Chrome, and Safari 4 all come after with 8 fixed. Firefox 2, Safari 3.1, and IE 8 next at 7. Those numbers help to give you an overall feel of the page load performance that you'll see in a browser. (Naturally these tests don't take any rendering or JavaScript performance numbers into account but network performance generally trumps their total runtime anyway.)
Information about network performance is important for two reasons:
- It informs browser vendors as to the quality of their browser. A browser fixing any of the points specified by the test will yield faster page loads.
- It informs web site developers as to the problems they should be taking into consideration when developing a site. For example if a browser they support doesn't handle simultaneous stylesheet downloading perhaps their page should be re-worked.
The tests themselves can be broken down into a few categories (Steve explains them all, in detail, in the FAQ):
Network Connections
Two big things are tested here: The number of simultaneous connections that can be opened to a hostname (sub-domains count as different hostnames) and how many connections can be opened to any number of hostnames, simultaneously. These numbers can give you a good indicator of how many parallel downloads can occur (most commonly seen for downloading multiple images, simultaneously).
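To get a feel for why these limits matter, here is a rough sketch that estimates how many sequential "rounds" of downloads a page needs. It is a simplification, not anything UA Profiler itself does: it assumes every resource takes one round, and the function name, hostnames, and limits are made up for illustration.

```javascript
// Rough estimate of how many sequential rounds of downloads a page needs.
// Assumptions (not from UA Profiler): every resource takes exactly one
// round, and the browser fills each hostname's slots on every round.
function estimateDownloadRounds(resourcesPerHost, perHostLimit, totalLimit) {
  if (perHostLimit < 1 || totalLimit < 1) {
    throw new RangeError("connection limits must be at least 1");
  }
  const remaining = Object.values(resourcesPerHost);
  let rounds = 0;
  while (remaining.some((count) => count > 0)) {
    let budget = totalLimit; // connections left across all hostnames
    for (let i = 0; i < remaining.length && budget > 0; i++) {
      const started = Math.min(remaining[i], perHostLimit, budget);
      remaining[i] -= started;
      budget -= started;
    }
    rounds++;
  }
  return rounds;
}

// Hypothetical hostnames: 12 images on one host vs. split across two.
const oneHost = estimateDownloadRounds({ "example.com": 12 }, 2, 30);
const twoHosts = estimateDownloadRounds(
  { "img1.example.com": 6, "img2.example.com": 6 }, 2, 30);
```

With a limit of two connections per hostname, splitting the same twelve images across two sub-domains halves the number of rounds in this model, which is why the per-hostname number is worth knowing.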
Additionally there is a check to see if the browser supports Gzip compression. The results aren't too exciting here as all modern browsers support Gzip compression at this point.
Parallel Downloads
All browsers are capable of downloading images in parallel (multiple images downloading simultaneously) but what about other resources (like scripts or stylesheets)?
Unfortunately it's much harder to get scripts and stylesheets to load in parallel since their contents may dramatically change the rest of the page. The loading of these resources occurs in three steps:
- Downloading (can be parallelized)
- Parsing
- Execution
The load order breaks down like so (sort of an advanced game of rock-paper-scissors): Scripts prevent other scripts from parsing and executing, stylesheets prevent scripts from parsing and executing.
It's been hard for browsers to implement the parallelization of script downloading since scripts are capable of changing the contents of the page - and possibly removing or adding new scripts or stylesheets. To work around this, browsers are starting to get better at opportunistically looking ahead in the document and pre-loading stylesheets and scripts - even if their actual use may be delayed.
Changes in this area will yield some of the largest benefits to browser page load performance, going forward, as it's still one of the most untapped areas of improvement.
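The blocking rules above can be sketched as a tiny simulation. This is a toy model, not how any particular browser schedules work: it assumes all downloads start at time 0 in parallel, that a script waits for every earlier script and stylesheet to finish, and that stylesheets wait for nothing. The function name and the `downloadMs`/`runMs` fields are made up for illustration.

```javascript
// Toy model of the load order: downloads run in parallel from time 0,
// but a script may only parse/execute once its own download is done AND
// every earlier script and stylesheet has finished.
function scheduleResources(resources) {
  let blockedUntil = 0; // earliest time the next script may start
  return resources.map((res) => {
    const start = res.type === "script"
      ? Math.max(res.downloadMs, blockedUntil)
      : res.downloadMs; // stylesheets are not blocked in this model
    const finish = start + res.runMs;
    blockedUntil = Math.max(blockedUntil, finish);
    return { name: res.name, start, finish };
  });
}

// Hypothetical page: a slow stylesheet followed by two quick scripts.
const timeline = scheduleResources([
  { name: "a.css", type: "stylesheet", downloadMs: 300, runMs: 10 },
  { name: "b.js", type: "script", downloadMs: 100, runMs: 50 },
  { name: "c.js", type: "script", downloadMs: 50, runMs: 20 },
]);
```

In this model both scripts finish downloading long before the stylesheet does, yet neither can run until it is done, so the parallel download only helps if nothing slow sits in front of the scripts.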
Caching
While all modern browsers support caching of resources, caching of page redirects is much less common. For example, consider the case where a user types in "http://google.com/" - Google redirects the user to "http://www.google.com/" but only a couple of browsers cache that redirect so as not to retry it later.
A similar case of redirect caching occurs for resources, for example with stylesheets, images, or scripts. Since these occur much more frequently it becomes that much more important for browsers to cache every action that they can.
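As a minimal sketch of what caching a redirect buys you, the code below remembers where a URL redirected to so the second lookup never touches the network. It is deliberately naive: `fetchRedirect` is a hypothetical stand-in for a real request, and the sketch ignores status codes and cache headers, which a real browser has to honor.

```javascript
// A minimal in-memory redirect cache. `fetchRedirect` is a hypothetical
// callback standing in for a real network request: it returns the URL a
// request is redirected to, or null when there is no redirect.
function createRedirectCache(fetchRedirect) {
  const cache = new Map();
  return function resolve(url) {
    if (cache.has(url)) {
      return cache.get(url); // served from the cache, no round trip
    }
    const target = fetchRedirect(url) || url;
    cache.set(url, target);
    return target;
  };
}

// Fake "network" that counts how often it is hit.
let networkHits = 0;
const resolveUrl = createRedirectCache((url) => {
  networkHits++;
  return url === "http://google.com/" ? "http://www.google.com/" : null;
});
const firstVisit = resolveUrl("http://google.com/");
const secondVisit = resolveUrl("http://google.com/");
```

After both calls the fake network has been hit only once, which is exactly the saving a browser gets from caching the redirect.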
Prefetching
This is part of the HTML 5 specification and allows for pages to specify resources which should be opportunistically downloaded in case they should be used in the future (the common example of image rollovers could be used here).
There's a full page describing how to use them on the Mozilla developer wiki but it isn't that hard to get started. It's as simple as including a new link element in the top of your site:
<link rel="prefetch" href="/images/big.jpeg">
And that resource will be downloaded preemptively.
Inline Images
The final case that the profiler tests for is the ability of a browser to support inline images using a data: URI. Data URIs give developers the ability to include the image data directly within the page itself. While this saves an extra HTTP request it's important to note that the resource will not be cached (at least not as an external resource - it may be cached as part of the complete page). The use of this technique will vary on a case-by-case basis but having a browser support it is absolutely important.
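For a sense of what a data: URI actually contains, here is a small sketch that builds one from raw bytes. It assumes a Node.js environment (it uses the global `Buffer`), and the helper name is made up; in a page you would normally generate the URI ahead of time and paste it into the `src` attribute.

```javascript
// Build a base64 data: URI from a MIME type and raw bytes.
// Assumes Node.js, where Buffer is available as a global.
function toDataUri(mimeType, bytes) {
  return "data:" + mimeType + ";base64," + Buffer.from(bytes).toString("base64");
}

// The first six bytes of any GIF file ("GIF89a"), just to show the shape.
const gifHeaderUri = toDataUri("image/gif", [71, 73, 70, 56, 57, 97]);
// gifHeaderUri is "data:image/gif;base64,R0lGODlh"
```

Base64 makes the payload roughly a third larger than the original bytes, which is one more reason the technique is a case-by-case call.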
Going forward it will become increasingly important to have publicly-visible tests like the UA Profiler that are able to encourage browser vendors to act quicker at implementing critical browser functionality. Anything that's able to, even indirectly, improve the performance of the browsing experience for users of the web is absolutely critical, in my book.
Tags: browser, performance, network
18 Comments on 'Browser Page Load Performance'
October 18th, 2008
Three tidbits from this week:
I published an article on W3C Web Fonts at Ars Technica the other day.
I did another Open Web Podcast this week, this time with Ryan Stewart of Adobe.
On Thursday I gave a talk at the Web Experience Forum here in Boston talking about Performance Improvements that are coming in new browsers.
Tags: javascript, fonts, podcast, performance
9 Comments on 'Fonts, Podcast, Performance'
September 3rd, 2008
A new JavaScript Engine has hit the pavement running: The new V8 engine (powering the brand-new Google Chrome browser).
There are now a ton of JavaScript engines on the market (even when you only look at the ones being actively used in browsers):
- JavaScriptCore: The engine that powers Safari/WebKit (up until Safari 3.1).
- SquirrelFish: The engine used by Safari 4.0. Note: The latest WebKit nightly for Windows crashes on Dromaeo, so it's been skipped for now.
- V8: The engine used by Google Chrome.
- SpiderMonkey: The engine that powers Firefox (up to, and including, Firefox 3.0).
- TraceMonkey: The engine that will power Firefox 3.1 and newer (currently in nightlies, but disabled by default).
- Futhark: The engine used in Opera 9.5 and newer.
- IE JScript: The engine that powers Internet Explorer.
There have, already, been a number of performance tests run on the above browsers - and a few of those runs have also included the new Chrome browser. It's important to look at these numbers and try and gain some perspective on what the tests are testing and how those numbers relate to actual web page performance.
We have three test suites that we're going to look at:
- SunSpider: The popular JavaScript performance test suite released by the WebKit team. Tests only the performance of the JavaScript engine (no rendering or DOM manipulation). Has a wide variety of tests (objects, function calls, math, recursion, etc.)
- V8 Benchmark: A benchmark built by the V8 team, only tests JavaScript performance - with a heavy emphasis on testing the performance of recursion.
- Dromaeo: A test suite built by Mozilla, tests JavaScript, DOM, and JavaScript Library performance. Has a wide variety of tests, with the majority of time spent analyzing DOM and JavaScript library performance.
SunSpider
Let's start by taking a look at some results from WebKit's SunSpider test (which covers a wide selection of pure-JavaScript functionality). Here is the break down:
We see a fairly steady curve, heading down to Chrome (ignoring the Internet Explorer outliers). Chrome is definitely the fastest in these results - although the results from the new TraceMonkey engine aren't included.
Brendan Eich pulled together a comparison, last night, of the latest TraceMonkey code against V8.
We already see TraceMonkey (under development for about 2 months) performing better than V8 (under development for about 2 years).
The biggest thing holding TraceMonkey back, at this point, is its recursion tracing. As of this moment no tracing is done across recursive calls (which puts TraceMonkey as being about 10x slower than V8 at recursion). Once recursion tracing lands for Firefox 3.1 I'll be sure to revisit the above results.
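To make "recursion" concrete here, the sketch below computes the same value two ways. It is only an illustration of the two code shapes (it isn't taken from any of the benchmarks): the first is the call-heavy style that, as described above, isn't traced yet, and the second is a single tight loop.

```javascript
// Recursive: every step is another function call, the pattern that
// (per the paragraph above) is not yet traced across calls.
function fibRecursive(n) {
  return n < 2 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
}

// Iterative: the same result from one tight loop.
function fibLoop(n) {
  let a = 0;
  let b = 1;
  for (let i = 0; i < n; i++) {
    [a, b] = [b, a + b];
  }
  return a;
}
```

A benchmark dominated by code shaped like the first function will mostly measure how well an engine handles function calls, rather than loops.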
Google Chrome Benchmark
The Chrome team released their own benchmark for analyzing JavaScript performance. This includes a few new tests (different from the SunSpider ones) and is very recursion-heavy:
We can see Chrome decimating the other browsers on these tests. It's debatable how representative these tests are of real browser performance, considering the hyper-specific focus on minute features within JavaScript.
Note TraceMonkey performing poorly: It's unable to benefit from any of the tracing due to the lack of recursion tracing (as explained above).
Dromaeo with DOM
Finally, let's take a more holistic look at JavaScript performance. I've been working on the Dromaeo test suite, adding in a ton of new DOM and JavaScript library tests. This assortment provides a much stronger look at how browsers might perform under a normal web browsing situation.
Considering that most web pages are being held back by the performance of the DOM (think table sorters and the like) and not, necessarily, the performance of JavaScript (games, graphics) it's important to look at these particular details for extended analysis.
The results of a run against the JavaScript, DOM, and library tests (thanks to Asa Dotzler for helping me run the tests):
(No results for IE were provided as the browser crashes when running the tests, unfortunately - also I had trouble getting the WebKit nightlies, with Squirrelfish, to run on Windows, see bug 20626.)
We see a very different picture here. WebKit-based engines are absolutely ahead - but Chrome is lagging behind the latest release of Safari. And while there is a small speed improvement while using TraceMonkey, over regular Firefox, the full potential won't be unlocked until tracing can be performed over DOM structures (which it is currently incapable of - may not be ready until Firefox 3.2 or so).
One thing is clear, though: The game of JavaScript Performance leapfrog is continuing. With another JavaScript engine in the mix that rapid iteration will only have to increase - which is simply fantastic for end users and application developers.
Update: I've posted results for Safari 4.0 wherever I could.
Tags: javascript, performance
85 Comments on 'JavaScript Performance Rundown'
June 16th, 2008
This evening I was playing around with the idea of profiling jQuery applications - trying to find a convenient way to completely analyze all the code that is being executed in your application.
I've come up with a plugin that you can inject into a jQuery site that you own and see how the performance breaks down method-by-method.
Here's how you can try this plugin on your own site:
Step 1: Copy site HTML, add base href, add plugin.
For example, Github.com uses jQuery for a few basic effects and pieces of interaction (they use considerably more on pages beyond the homepage).
I took a copy of their page, added a <base href> to the top and injected the profiling plugin giving a resulting test page.
Before:
<head>
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
...
<script src="/javascripts/bundle.js"></script>
...
</head>
After:
<head>
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
<base href="http://github.com/"/>
...
<script src="/javascripts/bundle.js"></script>
<script src="http://dev.jquery.com/~john/plugins/profile/jquery-profile.js"></script>
...
</head>
Step 2: Use the site normally.
Use the site as you normally would. Load it up, click around - perform normal interactions. In the case of the Github.com page I let it load, scrolled down, and clicked on one of the demo images - which caused an overlay to appear. I then closed the X on the overlay and let it hide.
Step 3: View data.
In your console type jQuery.displayProfile(); and scroll down to the bottom of the page. You should see something like the following:
and here's a sample data dump:
Event: ready (165ms)
| % | (ms) | Method | in | out |
| 0.0% | 0 | jQuery(#document) | | 1 |
| 0.0% | 0 | .bind("ready", function()) | 1 | 1 |
| 3.6% | 6 | jQuery("a[hotkey]") | | |
| 0.0% | 0 | .each(function()) | | |
| 0.0% | 0 | jQuery(#document) | | 1 |
| 0.0% | 0 | .bind("keydown.hotkey", function()) | 1 | 1 |
| 0.0% | 0 | jQuery("#triangle") | | |
| 0.0% | 0 | jQuery("body") | | 1 |
| 1.2% | 2 | .append("<div id="triangle" style="position: absolute; display: none;"> </div>") | 1 | 1 |
| 0.6% | 1 | jQuery("#repo_menu .active") | | |
| 3.6% | 6 | jQuery(".jshide") | | |
| 0.0% | 0 | .hide() | | |
| 1.2% | 2 | jQuery(".toggle_link") | | |
| 0.0% | 0 | .click(function()) | | |
| 0.6% | 1 | jQuery("#beta :text") | | |
| 0.0% | 0 | .focus(function()) | | |
| 0.6% | 1 | jQuery("#beta form") | | |
| 0.0% | 0 | .ajaxForm(function()) | | |
| 1.2% | 2 | jQuery(".hide_alert") | | |
| 0.0% | 0 | .click(function()) | | |
| 0.0% | 0 | jQuery("#login_field") | | |
| 0.0% | 0 | .focus() | | |
| 0.0% | 0 | jQuery("#versions_select") | | |
| 0.0% | 0 | .change(function()) | | |
| 1.2% | 2 | jQuery("a[rel*=facebox]") | | 3 |
| 17.6% | 29 | .facebox() | 3 | 3 |
Event: load (1ms)
Event: click (29ms)
| % | (ms) | Method | in | out |
| 6.9% | 2 | jQuery("#facebox .loading") | | |
| 3.4% | 1 | jQuery("facebox_overlay") | | |
| 3.4% | 1 | jQuery("body") | | 1 |
| 6.9% | 2 | .append("<div id="facebox_overlay" class="facebox_hide"></div>") | 1 | 1 |
| 0.0% | 0 | jQuery("#facebox_overlay") | | 1 |
| 6.9% | 2 | .hide() | 1 | 1 |
| 3.4% | 1 | .addClass("facebox_overlayBG") | 1 | 1 |
| 0.0% | 0 | .css("opacity", 0) | 1 | 1 |
| 3.4% | 1 | .click(function()) | 1 | 1 |
| 6.9% | 2 | .fadeIn(200) | 1 | 1 |
| 3.4% | 1 | jQuery("#facebox .content") | | 1 |
| 3.4% | 1 | .empty() | 1 | 1 |
| 3.4% | 1 | jQuery("#facebox .body") | | 1 |
| 0.0% | 0 | .children() | 1 | 2 |
| 10.3% | 3 | .hide() | 2 | 2 |
| 0.0% | 0 | .end() | 2 | 1 |
| 6.9% | 2 | .append("<div class="loading"><img src="/facebox/loading.gif"/></div>") | 1 | 1 |
| 0.0% | 0 | jQuery("#facebox") | | 1 |
| 0.0% | 0 | jQuery({...}) | | 1 |
| 3.4% | 1 | .width() | 1 | |
| 0.0% | 0 | .css({...}) | 1 | 1 |
| 6.9% | 2 | .show() | 1 | 1 |
| 0.0% | 0 | jQuery(#document) | | 1 |
| 0.0% | 0 | .bind("keydown.facebox", function()) | 1 | 1 |
| 0.0% | 0 | jQuery(#document) | | 1 |
| 3.4% | 1 | .trigger("loading.facebox") | 1 | 1 |
Event: beforeReveal.facebox (1ms)
Event: click (6ms)
| % | (ms) | Method | in | out |
| 16.7% | 1 | jQuery(#document) | | 1 |
| 66.7% | 4 | .trigger("close.facebox") | 1 | 1 |
Event: close.facebox (3ms)
This quick table of data should be able to provide you with some interesting information about what's happening in your code. The result is still incredibly basic (really only providing the most basic level of jQuery method introspection) but it definitely shows some merit.
If you wish to create a different view for the data you can access the raw data structure by running jQuery.getProfile();.
The next stage of development for this plugin would be to reveal which methods are running inside other jQuery methods - in addition to monitoring other aspects of the application (such as timers, Ajax callbacks, etc.). I'm pleased with even this most-basic result - it gives me the ability to quickly, and easily, learn much more about a jQuery-using application.
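For anyone curious how this kind of method-by-method timing can work, here is a minimal sketch of the general idea: replace each method with a wrapper that records how long the original took. This is not the plugin's actual code, and `profileMethods` and the sample object are made up for illustration.

```javascript
// Wrap every function on `target` so each call is timed and logged.
// `now` is injectable so the sketch can be exercised with a fake clock.
function profileMethods(target, now = Date.now) {
  const log = [];
  for (const name of Object.keys(target)) {
    const original = target[name];
    if (typeof original !== "function") continue; // skip plain properties
    target[name] = function (...args) {
      const start = now();
      try {
        return original.apply(this, args); // keep `this` so chaining works
      } finally {
        log.push({ method: name, ms: now() - start });
      }
    };
  }
  return log;
}

// Hypothetical object with two chainable-style methods.
let fakeTime = 0;
const widget = {
  hide() { return this; },
  width() { return 42; },
  version: "1",
};
const calls = profileMethods(widget, () => (fakeTime += 5));
const measuredWidth = widget.hide().width();
```

Because the wrapper passes `this` and the return value straight through, chained calls keep working, and the log ends up with one entry per method call in the order they ran.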
Tags: jquery, javascript, performance
29 Comments on 'Deep Profiling jQuery Apps'
April 11th, 2008
Dromaeo is the name that I've given to the JavaScript performance test suite that I've been working on over the past couple months.
I was hoping to hold off on this release for another week or two, while I finished up some final details, but since it's been discovered, and is about to hit the Digg front page, there isn't a whole lot that I can do to stop it.
There's a ton of details concerning how it works, and how to use it, on the Dromaeo wiki page. I won't go through too much of it here, but the wiki page should answer most questions.
Probably the most pressing question that'll be encountered (outside of what is answered on the wiki page) is "What is the relation of Dromaeo to SunSpider?" (SunSpider being the WebKit team's JavaScript testing suite).
Right now I'm working very closely with all the browser vendors to make sure that we have a common-ground test suite that is both highly usable and statistically sound (not to mention providing results that are universally interesting). There are a number of outstanding concerns that've been raised by users of the suite, along with a number of concerns that've already been rectified - again, all of this is clarified on the Dromaeo wiki page. It's of the utmost concern that this suite be as applicable as possible. It's very likely that the core suite will be moving to a common working ground where all browser vendors can work on it.
I especially want to thank Allan Branch of LessEverything who provided the awesome design for the site. It's like he tapped into my brain and produced exactly what I wanted - without even knowing it. I highly recommend them, if you have design work that needs to be done.
Tags: testing, performance, javascript, browsers, mozilla
22 Comments on 'Dromaeo: JavaScript Performance Testing'
March 13th, 2008
Mozilla developer 'Pavlov' wrote up some extensive details on memory use in Firefox 3. I highly recommend that you check it out.
I borrowed some of his data and created another view of the results. For example, here are the results from Windows Vista for a number of browsers:
Note that both Safari 3 and IE 8 crashed during the test (which was a page runner which automatically opened and closed groups of web pages) so accurate numbers couldn't be obtained for them. Some preliminary numbers show Safari 3 on a similar path to IE 7 (Stuart mentioned that IE 8 showed a similar path, as well).
It's great to see the massively-improved memory use of Firefox 3. It far excels anything that we offered, previously, and seems to best all other browsers on the platform.
Now, obviously, Windows Vista isn't the only platform available. I, personally, use OS X and am interested to see the memory numbers there as well. One portion of Stuart's blog post that I found to be particularly interesting was the discussion of measuring cross-platform browser memory use - and how difficult it can be. Here's how he explains it:
The short summary is Windows Vista (Commit Size) and Linux (RSS) provide pretty accurate memory measurement numbers while Windows XP and MacOS X do not.
...
On Mac, If you look at Activity Monitor it will look like we’re using more memory than we actually are. Mac OS X has a similar, but different, problem to Windows XP. After extensive testing and confirmation from Apple employees we realized that there was no way for an allocator to give unused pages of memory back while keeping the address range reserved.. (You can unmap them and remap them, but that causes some race conditions and isn’t as performant.) There are APIs that claim to do it (both madvise() and msync()) but they don’t actually do anything. It does appear that pages mapped in that haven’t been written to won’t be accounted for in memory stats, but you’ve written to them they’re going to show as taking up space until you unmap them. Since allocators will reuse space, you generally won’t have that many pages mapped in that haven’t been written to. Our application can and will reuse the free pages, so you should see Firefox hit a peak number and generally not grow a lot higher than that.
I think it's great to see these numbers come in. In many ways Firefox 3 is going to be a very different browser from its previous instantiations. I've, personally, been using it as my primary browser for a while now and have enjoyed the increased performance. It really does feel - and really even look - like a whole new browser.
If you'd like to try Firefox 3, especially without disturbing your current setup, it's really easy - and I've even written up instructions to help you out.
Tags: firefox, performance, browsers, memory
26 Comments on 'Firefox 3 Memory Use'
February 28th, 2008
Something that's frequently confused is the distinction between where JavaScript is executing and where performance hits are taking place. The difficulty stems from the fact that many aspects of a browser engine rely upon many others, causing their performance issues to be constantly intertwined. To attempt to explain this inter-relationship I've created a simplified diagram:
To break it down, there's a couple key areas:
- JavaScript - This represents the core JavaScript engine. This contains only the most basic primitives (functions, objects, array, regular expression, etc.) for performing operations. Unto itself it isn't terribly useful. Speed improvements here have the ability to affect all the various object models.
- Object Models - Collectively these are the objects introduced into the JavaScript runtime which give the user something to work with. These objects are generally implemented in C++ and are imported into the JavaScript environment (for example XPCOM is frequently used by Mozilla to achieve this). There are numerous security checks in place to prevent malicious script from accessing these objects in unintended ways (which produces an unfortunate performance hit). Speed improvements generally come in the way of improving the connecting layer or from removing the connecting layer altogether.
- XMLHttpRequest and Timers - These are implemented in C++ and introduced into the JavaScript engine at runtime. The performance of these elements only indirectly affects rendering performance.
- Browser - This represents objects like 'window', 'window.location', and the like. Improvements here also indirectly affect rendering performance.
- DOM and CSS - These are the object representations of the site's HTML and CSS. When creating a web application everything will have to pass through these representations. Improving the performance of the DOM will affect how quickly rendering changes can propagate.
- Parsing - This is the process of reading, analyzing, and converting HTML, CSS, XML, etc. into their native object models. Improvements in speed can affect the load time of a page (with the initial creation of the page's contents).
- Rendering - The final painting of the page (or any subsequent updates). This is the final bottleneck for the performance of interactive applications.
What's interesting about all of this is that a lot of attention is being paid to the performance of a single layer within the browser: JavaScript. The reality is that the situation is much more complicated. For starters, improving the performance of JavaScript has the potential to drastically improve the overall performance of a site. However there still remain bottlenecks at the DOM, CSS, and rendering layers. Having a slow DOM representation will do little to show off the improved JavaScript performance. For this reason when optimization is done it's frequently handled throughout the entire browser stack.
Now what's also interesting is that the analysis of JavaScript performance can, also, be affected by any of these layers. Here are some interesting issues that arise:
- JavaScript performance outside of a browser (in a shell) is drastically faster than inside of it. The overhead of the object models and their associated security checks is enough to make a noticeable difference.
- An improperly coded JavaScript performance test could be affected by a change to the rendering engine. If the test were analyzing the total runtime of a script a degree of accidental rendering overhead could be introduced as well. Care needs to be taken to factor this out.
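As a rough sketch of what "factoring this out" can look like, the harness below times only the call to the function under test, repeats it, and reports the median so that a one-off pause doesn't skew the result. It's an assumption-laden illustration rather than how any of the real suites work; `medianRuntime` and its parameters are made up.

```javascript
// Time only the function under test, several times, and report the
// median. `now` defaults to Date.now; pass a finer clock if one exists.
function medianRuntime(fn, runs = 5, now = Date.now) {
  if (runs < 1) {
    throw new RangeError("runs must be at least 1");
  }
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn(); // only this call sits inside the timed region
    samples.push(now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
}

// Fake clock: three runs that "take" 10ms, 3ms and 100ms.
const ticks = [0, 10, 10, 13, 13, 113];
const median = medianRuntime(() => {}, 3, () => ticks.shift());
```

Anything that updates the page or prints results belongs outside the timed call; otherwise the number starts to include work that has nothing to do with the JavaScript engine.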
So while the improvement of JavaScript performance is certainly a critical step for browser vendors to take (as much of the rest of the browser depends upon it) it is only the beginning. Improving the speed of the full browser stack is inevitable.
Tags: javascript, browsers, performance
10 Comments on 'JavaScript Performance Stack'
February 20th, 2008
When doing DOM-based performance testing you frequently need to pick a sample HTML document to work against. This raises the question: What is a good, representative, HTML document?
For many people a good document seems to fall into one of two categories:
- A large web page with a lot of content. When we did our initial performance testing with jQuery we used Shakespeare's As You Like It (lots of elements, but a very flat structure) - Mootools uses an old draft of the W3C CSS3 Selectors recommendation. This has a lot of content, as well - thousands of elements with a medium depth structure.
- A popular web page. Common recommendations include 'yahoo.com' and 'microsoft.com'.
What's troubling is that there doesn't really seem to be any way to determine what a representative web page actually is. There's a couple things that I'd like to propose as being good indicators:
- Standards-based semantic markup (including strong use of attributes: id, class, etc.).
- Non-trivial file size and element count (testing the scalability of the performance).
- Some use of tables and form elements (frequent inclusions in most web pages).
- Strong use of CSS (frequently implies a deep element structure, in order to accommodate complex layouts).
- Pervasive use of JavaScript. If JavaScript performance analysis is being done it's probably good to start with a page that already has a desire to use it.
I think, out of all of these aspects, one page stands out: CNN.com. Here's why:
- It uses semantic HTML 4 markup with a lot of classnames and ids.
- It's about 92kb in size and has 1648 elements in it.
- It has some tables (seemingly for legacy material) and some forms (search forms, drop-downs).
- Lots of CSS and JavaScript. Makes good use of Prototype which, at least, should show a desire to have a page with performant JavaScript.
- It's, also, imperfect. I would consider this to be desirable. Rarely are pages completely without fault - and the heavy use of embedded JavaScript, ads, and non-semantic tables helps to add a stark dash of reality.
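If you want to compare candidate pages on the element count and depth criteria, a sketch like the following will do. It only assumes that each node exposes a `children` collection (as DOM elements do); the function name and the sample tree are made up for illustration.

```javascript
// Count elements and find the deepest nesting level of a tree whose
// nodes expose a `children` collection (as DOM elements do).
function measureTree(node) {
  let elements = 1; // count this node
  let depth = 1;
  for (const child of Array.from(node.children || [])) {
    const stats = measureTree(child);
    elements += stats.elements;
    depth = Math.max(depth, stats.depth + 1);
  }
  return { elements, depth };
}

// Tiny stand-in for a page: html > (head > title, body > (div > (p, p), table)).
const samplePage = {
  children: [
    { children: [{ children: [] }] },
    { children: [{ children: [{ children: [] }, { children: [] }] }, { children: [] }] },
  ],
};
const pageStats = measureTree(samplePage);
// pageStats is { elements: 8, depth: 4 }
```

In a browser console the same function can be pointed at `document.documentElement` to get the numbers for whatever page you're considering.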
Of course, analysis could always be done against multiple pages and be viewed in aggregate but we need a place to start. So, what do you think; is CNN a good, representative, page for doing performance analysis against?
Tags: javascript, css, html, performance
12 Comments on 'A Typical HTML Page'