Announcing SquirrelFish
Posted by ggaren on Monday, June 2nd, 2008 at 5:37 pm
“Hello, Internet!”
WebKit’s core JavaScript engine just got a new interpreter, code-named SquirrelFish.
SquirrelFish is fast—much faster than WebKit’s previous interpreter. Check out the numbers. On the SunSpider JavaScript benchmark, SquirrelFish is 1.6 times faster than WebKit’s previous interpreter.

What Is SquirrelFish?
SquirrelFish is a register-based, direct-threaded, high-level bytecode engine, with a sliding register window calling convention. It lazily generates bytecodes from a syntax tree, using a simple one-pass compiler with built-in copy propagation.
SquirrelFish owes a lot of its design to some of the latest research in the field of efficient virtual machines, including research done by Professor M. Anton Ertl, et al, Professor David Gregg, et al, and the developers of the Lua programming language.
Some great introductory reading on these topics includes:
- The Structure and Performance of Efficient Interpreters (Introduces the fundamentals of virtual machine design and explains the importance of direct threading)
- Virtual Machine Showdown: Stack Versus Registers (Details the benefits of register machines, and the importance of copy propagation)
- The Implementation of Lua 5.0 (Outlines the implementation of a real-world register-based bytecode engine, with a sliding register window calling convention)
I’ve also pored over stacks of terrible books and papers on these topics. I’ll spare you those.
Why It’s Fast
Like the interpreters for many scripting languages, WebKit’s previous JavaScript interpreter was a simple syntax tree walker. To execute a program, it would first parse the program into a tree of statements and expressions. For example, the expression “x + y” might parse to
  +
 / \
x   y
Having created a syntax tree, the interpreter would recursively visit the nodes in the tree, performing their operations and propagating execution state. This execution model incurred a few types of run-time cost.
First, a syntax tree describes a program’s grammatical structure, not the operations needed to execute it. Therefore, during execution, the interpreter would repeatedly visit nodes that did no useful work. For example, for the block “{ x++; }”, the interpreter would first visit the block node “{…}”, which did nothing, and then visit its first child, the increment node “x++”, which incremented x.
Second, even nodes that did useful work were expensive to visit. Each visit required a virtual function call and return, which meant a couple of indirect memory reads to retrieve the function being called, and two indirect branches—one for the call, and one for the return. On modern hardware, “indirect” is a synonym for “slow”, since indirection tends to defeat caching and branch prediction.
Third, to propagate execution state between nodes, the interpreter had to pass around a bunch of data. For example, when processing a subtree involving a local variable, the interpreter would copy the variable’s value between all the nodes in the subtree. So, starting at the “x” part of the expression “f((x) + 1)”, a variable node “x” would return x to a parentheses node “(x)”, which would return x to a plus node “(x) + 1”. Then, the plus node would return (x) + 1 to an argument list node “((x) + 1)”, which would copy that value into an argument list object, which, in turn, it would pass to the function node for f. Sheesh!
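To make these three costs concrete, here is a minimal sketch of a tree-walking evaluator in C++. It is not WebKit's old interpreter: the names Env, Node, NumberNode, VarNode, ParenNode, AddNode, and buildExample are hypothetical, and values are plain doubles rather than JavaScript values. Note that ParenNode does no useful work, that every visit goes through a virtual call, and that the value of x is handed from node to node on the way up.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical environment: variable name -> value.
using Env = std::map<std::string, double>;

// Every syntax tree node is visited through a virtual call.
struct Node {
    virtual ~Node() = default;
    virtual double evaluate(const Env& env) const = 0;
};

struct NumberNode : Node {
    double value;
    explicit NumberNode(double v) : value(v) {}
    double evaluate(const Env&) const override { return value; }
};

struct VarNode : Node {
    std::string name;
    explicit VarNode(std::string n) : name(std::move(n)) {}
    double evaluate(const Env& env) const override {
        auto it = env.find(name);
        assert(it != env.end());  // sketch only: unbound variables are treated as a bug
        return it->second;
    }
};

// Purely grammatical node: it only forwards its child's value.
struct ParenNode : Node {
    std::unique_ptr<Node> inner;
    explicit ParenNode(std::unique_ptr<Node> n) : inner(std::move(n)) {}
    double evaluate(const Env& env) const override { return inner->evaluate(env); }
};

struct AddNode : Node {
    std::unique_ptr<Node> lhs, rhs;
    AddNode(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    double evaluate(const Env& env) const override {
        return lhs->evaluate(env) + rhs->evaluate(env);
    }
};

// Builds the tree for "(x) + 1".
std::unique_ptr<Node> buildExample() {
    auto x = std::make_unique<VarNode>("x");
    auto paren = std::make_unique<ParenNode>(std::move(x));
    return std::make_unique<AddNode>(std::move(paren),
                                     std::make_unique<NumberNode>(1.0));
}
```

Evaluating buildExample() with x bound to 41 makes four virtual calls (AddNode, ParenNode, VarNode, NumberNode) to perform a single addition.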
In our first rounds of optimization, we squeezed out as much performance as we could without changing this underlying architecture. Doing so allowed us to regression test each optimization we wrote. It also set a very high bar for any replacement technology. Finally, having realized the full potential of the syntax tree architecture, we switched to bytecode.
SquirrelFish’s bytecode engine elegantly eliminates almost all of the overhead of a tree-walking interpreter. First, a bytecode stream exactly describes the operations needed to execute a program. Compiling to bytecode implicitly strips away irrelevant grammatical structure. Second, a bytecode dispatch is a single direct memory read, followed by a single indirect branch. Therefore, executing a bytecode instruction is much faster than visiting a syntax tree node. Third, with the syntax tree gone, the interpreter no longer needs to propagate execution state between syntax tree nodes.
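As a rough illustration of the bytecode side, the sketch below runs "(x) + 1" as a flat vector of register instructions. The opcodes (LoadConst, Add, Return), the Instruction layout, and the functions run and compileExample are assumptions made up for this example, not SquirrelFish's real instruction set. For portability it dispatches with a switch inside a loop rather than with direct threading.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for illustration; not SquirrelFish's real instruction set.
enum class Op { LoadConst, Add, Return };

// Three-address instruction: operands name registers or carry a constant.
struct Instruction {
    Op op;
    int dst;     // destination register (unused by Return)
    int a;       // first source register (unused by LoadConst)
    int b;       // second source register (Add only)
    double imm;  // immediate constant (LoadConst only)
};

// Runs a flat instruction vector over a register file and returns the result.
double run(const std::vector<Instruction>& code, std::vector<double> registers) {
    std::size_t pc = 0;
    for (;;) {
        assert(pc < code.size());
        const Instruction& i = code[pc++];
        switch (i.op) {
        case Op::LoadConst:
            registers[i.dst] = i.imm;
            break;
        case Op::Add:
            registers[i.dst] = registers[i.a] + registers[i.b];
            break;
        case Op::Return:
            return registers[i.a];
        }
    }
}

// Bytecode for "(x) + 1", assuming x already lives in register 0.
std::vector<Instruction> compileExample() {
    return {
        {Op::LoadConst, 1, 0, 0, 1.0},  // r1 = 1
        {Op::Add,       2, 0, 1, 0.0},  // r2 = r0 + r1
        {Op::Return,    0, 2, 0, 0.0},  // return r2
    };
}
```

The parentheses have disappeared entirely: they shaped the syntax tree but correspond to no instruction. The variable x is read in place from its register instead of being returned up through a chain of nodes.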
The bytecode’s register representation and calling convention work together to produce other speedups, as well. For example, jumping to the first instruction in a JavaScript function, which used to require two C++ function calls, one of them virtual, now requires just a single bytecode dispatch. At the same time, the bytecode compiler, which knows how to strip away many forms of intermediate copying, can often arrange to pass arguments to a JavaScript function without any copying.
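One way to picture a sliding register window is sketched below. This is an assumption-laden illustration, not SquirrelFish's actual frame layout: RegisterFile, Window, addCallee, and caller are hypothetical names. The caller evaluates its argument expressions directly into the registers at the top of its own window, and the callee's window is then placed so that those same registers become its r0 and r1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous register file shared by all frames (hypothetical layout).
struct RegisterFile {
    std::vector<double> slots;
    explicit RegisterFile(std::size_t size) : slots(size, 0.0) {}
};

// A frame is just a window into the register file, identified by its base index.
struct Window {
    RegisterFile& file;
    std::size_t base;
    double& reg(std::size_t n) {
        assert(base + n < file.slots.size());
        return file.slots[base + n];
    }
};

// Callee: finds its two arguments in its own r0 and r1, leaves the result in r0.
void addCallee(Window callee) {
    callee.reg(0) = callee.reg(0) + callee.reg(1);
}

// Caller: r0..r1 are locals; argument expressions are evaluated straight into r2..r3.
double caller(RegisterFile& file) {
    Window self{file, 0};
    self.reg(0) = 40.0;                        // local
    self.reg(1) = 1.0;                         // local
    self.reg(2) = self.reg(0) + self.reg(1);   // argument 0, computed in place
    self.reg(3) = 1.0;                         // argument 1, computed in place
    Window callee{file, self.base + 2};        // slide the window: caller r2 is callee r0
    addCallee(callee);                         // no argument list object is built
    return self.reg(2);                        // the callee's r0 is the caller's r2
}
```

Because the two windows overlap, making the call only moves the window's base; the arguments themselves are never copied.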
Just the Beginning
In a typical compiler, conversion to bytecode is just a means to an end, not an end in itself. The purpose of the conversion is to “lower” an abstract tree of grammatical constructs to a concrete vector of execution primitives, the latter form being more amenable to well-known optimization techniques.
Therefore, though we’re very happy with SquirrelFish’s current performance, we also believe that it’s just the beginning. Some of the compile-time optimizations we’re looking at, now that we have a bytecode representation, include:
- constant folding
- more aggressive copy propagation
- type inference—both exact and speculative
- specialization based on expression context—especially void and boolean context
- peephole optimization
- escape analysis
This is an interesting problem space. Since many scripts on the web are executed once and then thrown away, we need to invent versions of these optimizations that are simple and efficient. Moreover, since JavaScript is such a dynamic language, we also need to invent versions of these optimizations that are resilient in the context of an unknown environment.
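To give a flavor of what "simple and efficient" could mean for one of these, here is a minimal sketch of constant folding as a single forward pass over a flat instruction vector. It rests on loud assumptions: the opcodes and Instruction layout are hypothetical, and the code is straight-line with no branches and no way for a register to change behind the compiler's back. That is exactly the kind of assumption a dynamic language makes hard to guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for illustration only.
enum class Op { LoadConst, Add };

struct Instruction {
    Op op;
    int dst;     // destination register
    int a;       // first source register (Add only)
    int b;       // second source register (Add only)
    double imm;  // constant (LoadConst only)
};

// One forward pass: an Add whose inputs are both known constants
// is rewritten into a LoadConst of the precomputed sum.
std::vector<Instruction> foldConstants(const std::vector<Instruction>& code,
                                       std::size_t registerCount) {
    std::vector<bool> isConstant(registerCount, false);
    std::vector<double> value(registerCount, 0.0);
    std::vector<Instruction> out;
    out.reserve(code.size());
    for (const Instruction& i : code) {
        assert(i.dst >= 0 && static_cast<std::size_t>(i.dst) < registerCount);
        if (i.op == Op::LoadConst) {
            isConstant[i.dst] = true;
            value[i.dst] = i.imm;
            out.push_back(i);
        } else if (isConstant[i.a] && isConstant[i.b]) {
            const double folded = value[i.a] + value[i.b];
            isConstant[i.dst] = true;
            value[i.dst] = folded;
            out.push_back(Instruction{Op::LoadConst, i.dst, 0, 0, folded});
        } else {
            isConstant[i.dst] = false;  // result depends on a run-time value
            out.push_back(i);
        }
    }
    return out;
}
```

The pass costs one walk over the code and two small side tables, which matters for scripts that run once and are thrown away. It leaves the now-unused LoadConst instructions in place; removing them would be a separate pass.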
We’re also looking at further optimizing the virtual machine, including:
- constant pool instructions
- superinstructions
- instructions with implicit register operands
- advanced dispatch techniques, like instruction duplication and context threading
- getting computed goto working on Windows
Performance on Windows has extra room to grow because the interpreter on Windows is not direct-threaded yet. In place of computed goto, it uses a switch statement inside a loop.
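The difference between the two dispatch styles can be sketched in a few lines of C++. The opcodes and the function run below are hypothetical. The direct-threaded path relies on the GNU "labels as values" extension (computed goto), which GCC and Clang support; other compilers take the switch-in-a-loop path. This toy translates opcodes to handler addresses on every call, which a real engine would do once at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a tiny accumulator machine.
enum Op : int { OpIncrement = 0, OpDouble = 1, OpHalt = 2 };

// Executes the program; both dispatch strategies compute the same result.
long run(const std::vector<int>& program, long accumulator) {
    assert(!program.empty() && program.back() == OpHalt);
    std::size_t pc = 0;
#if defined(__GNUC__)
    // Direct threading: translate opcodes into handler addresses up front,
    // so each handler jumps straight to the next one with no central loop.
    static void* const handlers[] = { &&increment, &&twice, &&halt };
    std::vector<void*> threaded;
    threaded.reserve(program.size());
    for (int op : program) {
        assert(op >= 0 && op <= OpHalt);
        threaded.push_back(handlers[op]);
    }
    #define NEXT() goto *threaded[pc++]
    NEXT();
increment:
    accumulator += 1;
    NEXT();
twice:
    accumulator *= 2;
    NEXT();
halt:
    #undef NEXT
    return accumulator;
#else
    // Fallback: a switch statement inside a loop.
    for (;;) {
        switch (program[pc++]) {
        case OpIncrement: accumulator += 1; break;
        case OpDouble:    accumulator *= 2; break;
        case OpHalt:      return accumulator;
        }
    }
#endif
}
```

In the threaded version every handler ends in its own indirect jump, which gives the branch predictor one prediction site per handler. In the switch version all dispatches funnel through a single shared jump.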
Getting Involved
If you’re interested in compilers or virtual machines, this is a great project to join. We’re moving quickly, so the best way to come up to speed is to log on to our IRC channel.
As always, testing out nightly builds and reporting bugs is also a great help.
Extra Bonus Updates
We’ve got some extra bonus info: very early draft documentation of the SquirrelFish VM’s opcodes. If you know about VMs, you may find this enlightening; if you don’t, you may find it simpler than you expect.
In addition, we have a detailed comparison of Safari 3.1 vs. SquirrelFish. Looking at the individual tests, it is interesting to see which sped up the most. If you look at this comparison to Safari 3.0, you can see that we’ve sped up 4.34x overall since Safari 3, and have improved some kinds of code by over an order of magnitude.
SquirrelFish around the web: There’s lots of interesting discussion in the reddit article about this post. A post from key SquirrelFish developer Cameron Zwarich has performance data and other info, as does one from occasional WebKit contributor Charles Ying.
June 2nd, 2008 at 6:51 pm
Holy Jebus that’s fast!
June 2nd, 2008 at 6:53 pm
This is a topic of great interest to me. I’d appreciate it if you could post the names of some of the “stacks of terrible books and papers” that you pored over.
June 2nd, 2008 at 7:34 pm
@mazdak
So much has been written on these topics. Without knowing what you’re working on, I’m not sure what to recommend.
If you liked the articles I posted, here are some good follow-ups:
* Everything involving Ertl and/or Gregg tends to be well-researched, detailed, and relevant. You can start at Ertl’s ACM bibliography.
* For more on Lua, you can check out The Evolution of Lua.
In general, steer toward articles with detailed benchmarks, and you can’t go wrong!
June 2nd, 2008 at 7:37 pm
Does this have something in common with LLVM? Also known as the revolution of compilers and such
June 2nd, 2008 at 8:31 pm
I have no idea what this is about, but I love the picture.
June 2nd, 2008 at 9:00 pm
It may be fast but, on some gallery pages with sliding images, WebKit pegs the ol CPU meter still. That’s a biggie for me and my customers, CPU usage.
June 2nd, 2008 at 9:03 pm
@webjive
If you have a site that consistently uses excessive system resources, please file a bug on bugs.webkit.org
June 2nd, 2008 at 9:47 pm
Have you ever wondered whether it makes sense, and is even possible, to cache this byte-compiled code instead of caching the original source file? Have you done any experiments in this area? Applications built with huge JavaScript application frameworks (like qooxdoo) may have multiple large JS files of more than 1MB each (un-gzipped). Storing a “byte compiled” version in the cache may make sense for files of that size.
June 2nd, 2008 at 10:24 pm
@wpbasti: An issue with that is that we optimise lookup of global values, which may not be valid if the load order of such files is different. That said it is possible that in the future we may be able to appropriately update such references. Another thing to consider is of course the fact that we don’t actually compile functions until they’re called, and even then the time to compile any given function is typically tiny compared to the time required to execute it.
June 2nd, 2008 at 10:25 pm
It’s great to finally see a post about SquirrelFish on the WebKit blog. I made a short post to kick off a new blog that I will hopefully use to talk about ongoing JavaScriptCore development. The first post includes some SunSpider numbers for the bleeding edge versions of different browsers, which may be of interest to people reading this post.
June 2nd, 2008 at 11:11 pm
[...] The best logo ever (via Simon). [...]
June 2nd, 2008 at 11:45 pm
It’s excellent!
Is it possible to embed SquirrelFish, as with SpiderMonkey?
June 2nd, 2008 at 11:47 pm
[...] Safari Webkit team for calling their new super fast Javascript engine - SquirrelFish. Seriously - you can’t get more hard core geek than that. Can you? Despite the name the [...]
June 3rd, 2008 at 12:14 am
Is it normal that the Web Inspector is not available in the latest nightly build?
June 3rd, 2008 at 12:14 am
@yellowiscool
Yes, it is possible to embed it in your own programs - the JavaScriptCore library has a public API that works cross-platform.
June 3rd, 2008 at 12:23 am
[...] I compared WebKit’s new SquirrelFish bytecode JavaScript interpreter against Tamarin, the JIT JavaScript engine currently in Flash 9 and in development for [...]
June 3rd, 2008 at 12:50 am
@iFrodo: it should be there, have you checked the context menu, or the Develop menu? Or are you referring to Drosera? I ask because Drosera was recently killed off as we have now integrated the debugger with the web inspector.
June 3rd, 2008 at 3:11 am
@ggaren
Thanks for the recommendations. I am looking for good introductory material on VM design for now.
June 3rd, 2008 at 5:06 am
[...] Surfin’ Safari - Blog Archive » Announcing SquirrelFish [...]
June 3rd, 2008 at 5:09 am
[...] the rest of the Mac nerd world, I saw the announcement of SquirrelFish as very promising and inspiring news. The WebKit team has massively [...]
June 3rd, 2008 at 6:32 am
[...] new JavaScript engine: SquirrelFish is a register-based, direct-threaded, high-level bytecode engine, with a sliding [...]
June 3rd, 2008 at 11:20 am
What’s really exciting is that this was started and finished in two months. That is really a testament to the quality of the webkit team.
June 3rd, 2008 at 12:24 pm
[...] lies, today introduced WebKit’s new JavaScript engine, called SquirrelFish, on their blog. The new interpreter runs up to 4 times faster than in Safari 3 and still 1.6 [...]
June 3rd, 2008 at 5:05 pm
Way to go, WebKit team!
If I’m not mistaken, wasn’t WebKit’s JavaScript already head-to-head with or faster than competing browsers? If so, this is yet another mark for them to reach for.
Fantastic!
June 3rd, 2008 at 5:44 pm
[...] you’re reading this, chances are that you already know about SquirrelFish, Appl/WebKit’s new Javascript implementation. Early tests show SquirrelFish to be 60% faster [...]
June 3rd, 2008 at 6:24 pm
[...] an announcement on the Safari blog about SquirrelFish, their new JS interpreter. To sum it up: SquirrelFish is a register-based, direct-threaded, [...]
June 3rd, 2008 at 9:07 pm
I’m running the May 31st WebKit build; I don’t know if it includes SquirrelFish.
Safari is holy cow fast. Damn, I didn’t know browsers of the past were so inefficient.
June 4th, 2008 at 7:03 am
I’m unable to receive a WebKit bugzilla login at GMail (twotwotwo plus webkitbugs at gmail).
Since I figure an incorrectly-submitted bug report beats none: In Safari 3.1.1 (but not Firefox 3 RC 1), trying to set the body of my GMail vacation message to “I’ll be away” makes it save as “I’ll b away”, reproducibly. If I try to set the away message “‘ABCDEFGHIJ” (note the leading quote), the “A” is dropped. (My vacation subject is “I’m away until June 17″ if that’s necessary to repro.)
Hope this is really a WebKit bug and I’m not being silly. It’s a great product.
June 4th, 2008 at 7:07 am
[...] interpreter in Apple’s WebKit, which is the engine for the Safari browser, is very slow. Go here, scroll down to “Why It’s Fast”. They were interpreting the syntax tree rather [...]
June 4th, 2008 at 11:14 am
[...] SquirrelFish’s bytecode engine elegantly eliminates almost all of the overhead of a tree-walking interpreter. First, a bytecode stream exactly describes the operations needed to execute a program. Compiling to bytecode implicitly strips away irrelevant grammatical structure. Second, a bytecode dispatch is a single direct memory read, followed by a single indirect branch. Therefore, executing a bytecode instruction is much faster than visiting a syntax tree node. Third, with the syntax tree gone, the interpreter no longer needs to propagate execution state between syntax tree nodes. - The webkit blog [...]
June 4th, 2008 at 1:33 pm
I must confess I’m very happy about the commotion WebKit in general and SquirrelFish in particular are causing in the JavaScript engine realm. Seemingly, new wine will again not go into old wineskins, and it needed a fairly fresh endeavor to bring JavaScript closer to a level it deserves. In contrast, the Mozilla project seems to suffer from a certain stiffness in that regard, despite Tamarin and all that, and the amount of love Firefox’s engine receives leaves me disappointed. That is even more surprising since all of the browser’s chrome runs on top of it. All experiences from similar runtime environments (e.g. Emacs, Eclipse, basic operating systems,…) seem to be ignored and have to be gathered again. When will they start running multiple interpreter instances (or at least worker threads) in the browser, to isolate chrome and different pages from each other?! Will WebKit do it? - Anyway, way to go, WebKit!
June 4th, 2008 at 1:53 pm
[...] squirrelfish is a JavaScript engine for browsers that is currently being developed by the WebKit open source project. Safari, among others, uses this engine, whereas Firefox is currently working on tamarin. squirrelfish appears to be far faster than Tamarin, though, and that is also explained on the WebKit homepage, albeit all very technically, and at some point you just tune out, but it is interesting all the same…for computer scientists…some of them… [...]
June 4th, 2008 at 2:17 pm
The interpreter speedup is impressive. Here are my SunSpider benchmark results on Linux/Debian:
Jun 4 Webkit trunk Total: 5842.2ms +/- 12.5%
Mar 12 Webkit trunk Total: 7864.6ms +/- 3.1%
But the SunSpider benchmark only reports the interpretation time. For JavaScript, most scripts are compiled on the fly, so the compilation time is also important. The results would be more convincing if they considered both interpretation time and compile time. Although, considering the code cache, speeding up the interpreter is more important.
June 4th, 2008 at 2:27 pm
@gfan: The SunSpider execution time *already* includes parsing, compilation, and execution.
June 4th, 2008 at 2:29 pm
@gfan: SunSpider reports the complete time to compile and run the tests. In JSC it is basically impossible not to, as we lazily compile functions. I’m not sure why you think it does otherwise.
June 4th, 2008 at 2:36 pm
I’m interested in whether this has anything to do with LLVM. Someone mentioned that already; can anyone confirm or deny?
June 4th, 2008 at 3:06 pm
Squirrelfish does not make any use of LLVM.
June 4th, 2008 at 3:13 pm
@Mark Rower and @Oliver: When I traced the SunSpider test cases on SpiderMonkey before, it seemed to me that by the time it executed the first line to record the time, the script was already compiled. I thought it was the same for SquirrelFish. Sorry for my confusion. Thanks.
June 5th, 2008 at 9:09 am
This is really impressive! I’ve seen some comparisons to Tamarin around the web, and in my own testing with apps it seems both solid and fast.
I’ve attempted to use squirrelfish with the latest public beta with a number of existing javascript benchmarks and I’m seeing a number of tests with results of 0 which, when I repeat the tests, either stay at zero or go to 16 ms. Is there a lower limit on measurability using the timing functions?
Are there known functional differences between the prior engine (JavaScriptCore? is that what it was called?) and SquirrelFish? Even things that used to be broken that you fixed.
Are there areas where we should expect dramatic speed increases that should change how JS developers design code? There are certainly costly choices that we now avoid.
Oh, and I hope that we’ll see this on the iPhone. That’s a device whose javascript performance could use a speedup.
June 5th, 2008 at 7:40 pm
[...] Webkit’s JS interpreter has just hit the big time — it’s now a full blown vm instead of a syntax-tree-walker like the other slow-pokes. [...]
June 5th, 2008 at 8:53 pm
@mcroft: based on what you’re seeing, I’m guessing that you’re testing on Windows, which for some reason seems to be limited to only 16ms accuracy in some circumstances :-/
June 6th, 2008 at 5:41 am
[...] is about to get a whole lot faster. The Surfin’ Safari weblog has written about SquirrelFish, the code name (and what a code name it is) for the new interpreter for WebKit’s core [...]
June 6th, 2008 at 5:44 am
[...] SquirrelFish - So awesome. Those webkit guys just make my day every frickin time. Too lazy to click the link? SquirrelFish is a new superfast JS vm runtime. Benchmarks show it faster than Tamarin at the moment even. Not much need for explanation here. The better performance runtimes we get for the open web, the better it can compete against proprietary competition! Ok, I guess that’s enough for now. I really don’t want to turn this into a news aggregation blog, regurgitating things that I think are cool. You can just go to Ajaxian to see where I get MY news from. However, news regurgitation is easy, and I needed to write something. Also, I feel like such a negative nancy sometimes and I thought a positive post would be nice for a change. [...]
June 6th, 2008 at 6:48 am
@Oliver: thanks! Yes, I’m testing on Win XP, so that explains that.
June 8th, 2008 at 2:13 am
[...] Announced on the blog of WebKit, Safari’s engine: the introduction of a new JavaScript interpreter code-named SquirrelFish. The new SquirrelFish interpreter is faster (1.6 times faster) than WebKit’s previous JavaScript interpreter, as shown in a chart on the blog. [...]
June 8th, 2008 at 3:20 am
(off topic)
Hello, the fast fish sounds great. But I don’t need more speed. I need a Safari that can relax.
Right now Safari is using 85% CPU and the only thing I’m doing with the browser is writing these lines. Is it Flash hanging around from a previous page? Or is it bugs in Safari? There is no Flash as far as I can see. I have taken a process sample if anyone is interested.
I am an editor/writer and don’t know much about programming, but I do know that people don’t like pages that start up fans like an old DC-3. I understand a programmer wanting the application to have as much power as possible, but the overall experience is too hot. I have talked with colleagues, and it is a problem when a page triggers heat and fans. We fear it’s a turn-off for readers. One option we discussed is no Flash on the front page. Right now I am on a PowerBook, but I guess the thermal management is similar on other laptops.
So if Safari and/or Flash and other apps can do things slower, but cooler, I will support that.
Lastly, thanks for Safari. A truly great piece of work.
– Erland Flaten, Lillehammer. Norway
June 8th, 2008 at 7:08 am
@eflaten: “Higher speed” when it comes to computers means it uses the CPU less. It is the same thing, so yes you _do_ need higher speed
As for flash: if the flash player is doing something in an inefficient way that is unfortunately outside the scope of what the WebCore team does. If you can indeed show that flash is to blame then you probably need to complain to Adobe.
June 8th, 2008 at 2:05 pm
Wow! Nice fricken work optimizing Javascript. That’s awesome.
I’ve got this little canvas demo where you watch circles smash into smaller circles (or the impatient can grab a circle and manually smash it into the other circles). Here’s the default:
http://tech.no.logi.es/woodshop/momentum6.php
With the latest Webkit build I can totally crank up the smashing into even smaller bits:
http://tech.no.logi.es/woodshop/momentum6.php?webkit=1
Try that link in the new webkit build vs Safari vs Firefox. Pretty apparent difference.
June 8th, 2008 at 7:25 pm
@eflaten
I would love to see profile data for any page where CPU usage is out of control, especially if it does not appear to be Flash. Even if it is Flash, we can pass the data on to Adobe.
June 12th, 2008 at 1:32 am
[...] fastest content rendering around as well as nippy JavaScript execution with the state of the art SquirrelFish VM. The JavaScript SDK is available independently of the web renderer for sandboxed client-side game [...]
June 16th, 2008 at 1:51 pm
There is a bootleg store in China called “Squirrel–shaped Fish”. In China it’s standard procedure for stores to illegally use names and logos from large international brands. With this store, they stole the Lacoste alligator logo, but made their own name. Pretty genius. I don’t know if this was the inspiration for the name squirrel fish, but it should be.
http://chadvonnau.com/china/4/15.IMG_4485_bootlegs.jpg
June 16th, 2008 at 6:04 pm
[...] about SproutCore on its official website. Apple also has more details on the new MobileMe, and SquirrelFish details are on the Webkit project site. Possibly related posts: (automatically generated)Adobe [...]
June 17th, 2008 at 1:34 pm
[...] There’s already a developer seed of Safari 4 released. Which includes the SquirrelFish JavaScript interpreter (renamed from GlassFish to avoid confusion with Apple’s other Java stuff). SquirrelFish is a [...]
June 17th, 2008 at 2:38 pm
[...] SquirrelFish JavaScript interpreter in Safari 4 is a bytecode engine which eliminates almost all of the overhead of a tree-walking [...]
June 18th, 2008 at 1:33 pm
[...] you’ve seen it is not at all slow, I guess with Firefox 3 and newer versions of Safari / Webkit it should get even faster. The point behind this is that if the foundation stands as-is, its just a [...]
June 19th, 2008 at 5:13 pm
The numbers for SquirrelFish look pretty impressive. It seems like only yesterday I was reading about its development, but there was no expectation it would be merged into WebKit anytime soon.
Speaking of other advancements, is there any chance we’ll see a blog post about the new CSS variable support added to build 34666?
June 20th, 2008 at 4:16 am
Hey Apple team, I beg you to develop an updater for Safari for Windows that just updates the changed bits i.e. patches the existing install instead of downloading the whole thing again, uninstalling and reinstalling. As a user, it’s one thing keeping me away from consistently using Safari because when a vulnerability is detected, I can’t continue to use the old version and I am unable to download large files everytime.
June 21st, 2008 at 9:13 am
[...] SquirrelFish. Everyone should look to these folks next time they need inspiration for naming their JavaScript interpreter. And the logo? Spectacular. [...]
June 22nd, 2008 at 10:04 pm
[...] Safari 4. Safari 4 may also get the benefit of an entirely new JavaScript interpreter code-named SquirrelFish. When this gets implemented depends on development progress, but tests so far show good [...]
June 23rd, 2008 at 2:09 am
[...] Leopard Wish List: 2005 How Open will the iPhone Get? Surfin’ Safari » Announcing SquirrelFish Microsoft’s Application Features in Mac OS X, System Wide. Microsoft’s business model [...]
June 24th, 2008 at 7:19 pm
Is it just me, or has WebKit for Windows not worked since June 09???
June 25th, 2008 at 2:15 am
It would be great to also work on memory leaks
http://dotnetperls.com/Content/Browser-Memory.aspx
June 25th, 2008 at 4:41 pm
[...] excited about the SquirrelFish project, which promises to speed up plain old JavaScript running in the browser dramatically — 1.5 [...]
June 28th, 2008 at 7:04 am
Wouldn’t there be a potentially large performance gain by:
1) pre-compiling the bytecode on the server
2) serving the bytecode and byte code interpreter, only, to the client
3) interpreting the bytecode on the client
June 29th, 2008 at 2:19 pm
dicklacara: probably not. The compilation time is pretty minimal, and that would make it so that the bytecode format couldn’t be upgraded in the future, which would make further performance gains harder.
July 11th, 2008 at 9:04 am
I got SunSpider 0.9 to run on MobileSafari 2.
Summary and comparison to WebKit post-SquirrelFish and IE7 on my 6 month old Thinkpad…
Totals:
iPhone: 148752.0ms +/- 3.9%
WebKit: 2152.0ms +/- 1.7%
IE 7 : 35659.8ms +/- 3.4%
iPhone JavaScript is 4.17 times slower than IE7
iPhone Javascript is 69.1 times slower than WebKit Nightly
Internet Explorer 7 is 16.6 times slower than WebKit Nightly
Detailed Results
July 20th, 2008 at 10:13 pm
The secret behind it isn’t LLVM, it’s just Forth. It is the know-how from a programming language that is well known to some people. If you hear about M. Anton Ertl and David Gregg, and about a very fast direct-threaded interpreter (bytecode or not), it is clear: it is Forth know-how. I’m absolutely sure.
Some Forth systems are the fastest threaded-code and even direct-threaded-code interpreters (as virtual stack machines) available. Know-how coming from there speeds up SquirrelFish.