Speedometer:
Benchmark for Web App Responsiveness

Today we are pleased to announce Speedometer, a new benchmark that measures the responsiveness of web applications.

Benchmark

Speedometer measures simulated user interactions in web applications. Version 1.0 of Speedometer uses TodoMVC to simulate user actions for adding, completing, and removing to-do items. Speedometer repeats the same actions using DOM APIs — a core set of web platform APIs used extensively in web applications — as well as six popular JavaScript frameworks: Ember.js, Backbone.js, jQueryAngularJS, React, and Flight. Many of these frameworks are used on the most popular websites in the world, such as Facebook and Twitter. The performance of these types of operations depends on the speed of the DOM APIs, the JavaScript engine, CSS style resolution, layout, and other technologies.

Motivation

When we set out to improve the performance of interactive web applications in WebKit last year, we looked for a benchmark to guide our work. However, many browser benchmarks we checked were micro-benchmarks, and didn’t reflect how DOM APIs were used in the real world, or how individual APIs interacted with the rest of the web browser engine.

For example, one popular DOM benchmark assigns the value of element.id to a global variable repeatedly inside a loop:

for (var i = 0; i < count; i++)
     globalVariable = element.id; 

We like these micro-benchmarks for tracking regressions in heavily used DOM APIs like element.id. However, we couldn’t use them to guide our performance work because they don’t tell us the relative importance of each DOM API. With these micro-benchmarks, we could have easily over optimized APIs that don’t matter as much in actual web applications.

These micro-benchmarks can also encourage browser vendors to implement optimizations that don’t translate into any real-world benefit. In the example above for instance, some browser engines detect that element.id doesn’t have any side effect and eliminate the loop entirely; assigning the value exactly once. However, real-world websites rarely access element.id repeatedly without ever using the result or modifying the DOM in between.

Hence we decided to write a new benchmark for the end-to-end performance of a complete web application instead of testing individual DOM calls.

Mechanics

We tried to make Speedometer faithfully simulate a typical workload on a demo application by replaying a sequence of user interactions. We did have to work around certain limitations of the browser, however. For instance, we call click() on each checkbox in order to simulate a mouse click since many browsers don’t allow web content to create fake mouse or keyboard events. To make the run time long enough to measure with the limited precision, we synchronously execute a large number of the operations, such as adding one hundred to-do items.

We also noticed that some browser engines have used an optimization strategy of doing some work asynchronously to reduce the run time of synchronous operations. Returning control back to JavaScript execution as soon as possible is worth pursuing. However, a holistic, accurate measurement of web application performance involves measuring when these related, asynchronous computations actually complete since they could still eat up a big proportion of the 16-millisecond-per-frame budget required to achieve smooth, 60 frames-per-second user interaction. Thus, we measure the time browsers spend executing those asynchronous tasks in Speedometer, estimated as the time between when a zero-second delay timer is scheduled and when it is fired.

It is worth noting that Speedometer is not meant to compare the performance of different JavaScript frameworks. The mechanism by which we simulate user actions is different for each framework, and we’re forcing frameworks to do more work synchronously than needed in some cases to ensure that run time can be measured.

Optimizations in WebKit

Over the past eight months since we introduced the first beta version of Speedometer under a different name, we have been optimizing WebKit’s performance on this benchmark.

One of our biggest improvements on Speedometer came from making render tree creation lazy. In WebKit, we create render objects for each element that gets displayed on screen in order to compute each element’s style and position. To avoid thrash, we changed WebKit to only create an element’s render object when it was needed, giving WebKit a huge performance boost. We also made many DOM APIs not trigger synchronous style resolutions or synchronous layout so that we get further benefit from lazily creating the render tree.

Another area we made substantial improvements was JavaScript bindings, the layer that sits between C++ browser code and JavaScript. We removed redundant layers of abstraction and made more properties and member functions on DOM objects inline cacheable (See Introducing the WebKit FTL JIT). For example, we added a new JavaScriptCore feature to deal with named properties on the document object so that its attributes could be inline cached. We also optimized node lists returned by getElementsByTagName as well as inline caching the length property.

Finally, WebKit’s performance on Speedometer benefited from two major architectural changes. JavaScriptCore’s concurrent and parallel JIT (See Introducing the WebKit FTL JIT), which allows JavaScript to be compiled while the main thread is running other code, reduced the run time of JavaScripts in Speedometer. The CSS JIT, which allows CSS selectors to be compiled up front and rapidly checked against elements, reduced time spent in style resolution and made querySelector and querySelectorAll much faster.

Because Speedometer is an end-to-end benchmark that uses popular JavaScript frameworks, it also helped us catch surprising performance regressions. For instance, when we tried to optimize calls to bound functions in JavaScript by doing more work up front in Function.prototype.bind, we saw a few percent performance regression on Speedometer because many bound functions were called only once before being discarded. We initially doubted that this result reflected the behavior of real web applications, so we collected statistics on popular websites like Facebook and Twitter. To our surprise, we found exactly the same behavior: the average bound function was called only once or twice on the websites we studied.

Future Plans

In future versions, we hope to add more variations of web applications and frameworks. If you know of any good demo web applications distributed under MIT license or BSD license that could be incorporated into Speedometer, please let us know.

With Speedometer, the web browser community now has a benchmark that measures the responsiveness of real-world web applications. We are looking forward to making further performance improvements in WebKit using Speedometer, and we hope other browser vendors will join us.