That original proof of concept gwtBox2d demo has slowly evolved along with the rest of the GWTns library. Though only a small portion of the original code remains, it has ended up being a pretty great stress test for new and tweaked library functionality; no matter how much performance improves, there can always be more blocks.
While the program itself doesn’t really do anything useful in a meaningful sense, or even help move Web Workers beyond a useful-for-simulated-annealing-while-animating-a-png demo reputation, I think it does have its own demonstrative value. And really, I have enough fun poking at it that I figured I might as well just gussy up the styling a bit and let other people have a turn.
- If you’ve run the worker version of the demo without reading this, and were running (pre-3.6.4) Firefox, I probably crashed your browser. And on reopening, promptly did it again. Sorry about that (hi SUMO!). This isn’t a problem inherent to workers or this library (see TraceMonkey tear it up in The Rationals) but (probably) due to branch heavy/poorly traceable physics code. I’ll write more about Firefox below.
- Stacking boxes turns out to be a good stressor, but Box2d can do much cooler things than this, often requiring more forgiving levels of computing power. For more examples, see the JBox2d demos or some of the many Flash/Alchemy demos out there.
Now that you’ve been warned, here are the links. Return (or keep reading) for more on the application, your favorite browser, and some thoughts on using web workers in GWT.
- Blocks, native web workers where available (may crash Firefox < 3.6.4)
- Blocks, worker emulation in all browsers (safe)
Please Press That
First, the basics. Browsers that can run the physics engine in a web worker will do so; those that can’t, won’t; and which is happening should be obvious. If you’d like to force worker-emulation mode no matter what support is available, there’s a link for that at the bottom right.
Also on the right, you can select four different stress levels; the first three are named for the level at which that particular browser can regularly keep the blocks stacked, a mark of a stable simulation. Note that “Firefox 3.6” is really 3.6, not the here-slower 3.6.3 (the security updates are a net positive, though). Unfortunately, my Windows-browser-centrism is on full display, but feel free to mentally insert “Safari” in the list above or below “Chrome,” whichever tickles your personal fanatical tendencies. I’m also hoping that UI purists can forgive my use of radio buttons and that redundant “reset” button. In some impromptu testing, users tended to be reluctant to click on things (maybe from all that “crash your browser” talk), so those elements function as a bit of a hack to ease users into a first click.
Finally, while we’re discussing sins: yes, those are all divs. I was originally going to use this opportunity to check out Hydro4GE’s Raphael wrapper, but testing and optimizing the transform module in something approaching a real application (i.e. transforming elements when you want to be spending execution time doing anything but transforming elements) was too great an opportunity to pass up.
A physics engine is, at least in this case, an ideal match for a web worker. Once the simulator and display models are initialized, only a very small delta needs to be passed to keep them synchronized ([position + orientation] * number of bodies). Since you want to update the display as often as possible, but total message-passing cost must be kept to a minimum for performance, frequent small payloads work well.
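To make that “[position + orientation] * number of bodies” payload concrete, here is a minimal sketch in plain JavaScript. The function names and the flat-array layout are hypothetical, for illustration only; the actual GWTns/Box2d message format differs.

```javascript
// Pack the per-step delta as a flat array: [x, y, angle] per body.
// This is the whole sync payload; geometry, mass, etc. never travel.
function packDelta(bodies) {
  const out = new Array(bodies.length * 3);
  bodies.forEach((b, i) => {
    out[3 * i] = b.x;
    out[3 * i + 1] = b.y;
    out[3 * i + 2] = b.angle;
  });
  return out;
}

// On the display side, copy the delta back onto the view models.
function applyDelta(bodies, delta) {
  bodies.forEach((b, i) => {
    b.x = delta[3 * i];
    b.y = delta[3 * i + 1];
    b.angle = delta[3 * i + 2];
  });
}
```

Three numbers per body per step keeps the message-passing cost roughly constant no matter how expensive the simulation itself gets.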
Using Box2d in this particular, “one-code-fits-all” approach also works nicely because there is a natural “chunking” point between simulation steps. Though it may be possible to wring out a little more responsiveness in the non-worker case by further dividing these steps, the natural split is convenient and even the slowest browser released in the last 1.05 years can handle this code without throwing a slow script warning.
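In the non-worker case, that natural chunking point looks something like the sketch below: run one simulation step per timer tick, so control returns to the browser between steps. The names are illustrative, not the library’s API.

```javascript
// Non-worker fallback: yield to the browser between simulation steps
// so no single timer callback runs long enough to lock up the page.
function runChunked(stepFn, steps, done) {
  let i = 0;
  (function tick() {
    stepFn(i++);             // one whole simulation step per tick
    if (i < steps) {
      setTimeout(tick, 0);   // yield, then continue at the next chunk
    } else {
      done();
    }
  })();
}
```

Splitting at the step boundary is free; splitting *within* a step would mean checkpointing solver state mid-iteration, which is why the convenient natural split wins here.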
Much like many of the worker demos that came out about a year ago, this one aims to clearly demonstrate the benefits of taking a computation-heavy script out of the main execution thread. I think this one is a little more fair, though, because I actually try to make it run well in browsers without web worker support.
Let’s look at the emulated-worker mode first.
The Emulation Test
This is an “Emulation Test” and not, say, a “Regular Test,” because every component interacts with the simulator as if it were really running within a web worker. While this does introduce some overhead into the normal timer/callback setup, the worker specification is so simple that it ends up being pretty minimal, especially in the face of the benefits of a unified code base.
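The shape of that emulation can be sketched in a few lines. This is not the GWTns module itself — the constructor name and the `reply` callback standing in for the worker script are assumptions — but it shows the contract: the same `postMessage`/`onmessage` surface as a real `Worker`, with the handler running on the current thread and delivery kept asynchronous.

```javascript
// Minimal worker-emulation shim (illustrative only). `handler` plays
// the role of the worker script's onmessage; its replies come back
// through this object's onmessage. Both directions stay asynchronous,
// matching real Worker semantics, even though everything runs on the
// main thread.
function EmulatedWorker(handler) {
  const self = this;
  this.onmessage = null;
  this.postMessage = function (data) {
    setTimeout(function () {
      handler({ data: data }, function reply(out) {
        setTimeout(function () {
          if (self.onmessage) self.onmessage({ data: out });
        }, 0);
      });
    }, 0);
  };
}
```

Because callers only ever see `postMessage` and `onmessage`, swapping in a real `Worker` where the browser supports one requires no changes on the calling side.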
As a result of the aforementioned “not trying to be a jerk to older browsers” approach, most UAs acquit themselves fairly well in worker emulation mode. At the lower stress levels, there will likely be little difference between the worker and non-worker versions.
As the stress level is increased, though, it should be clear that the “Physics Rate” (the number of simulation steps per second) and the “Animation Rate” (the number of display updates per second) become tightly coupled. Even though the physics engine is updating as often as it can and the interface is only requesting 30 refreshes per second, once both rates fall below 30Hz, there is effectively no difference between what they want (John Resig explains this really well in his timer tutorial).
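A crude back-of-the-envelope model of that coupling (purely illustrative — the demo measures real rates rather than computing them): when a step and a repaint share one thread, both rates collapse to whatever the combined work allows.

```javascript
// Single-threaded model: each frame must fit one physics step plus one
// repaint, so both observed rates are capped by the combined cost.
function effectiveRates(stepMs, frameMs, requestedFps) {
  const budgetFps = 1000 / (stepMs + frameMs);
  const fps = Math.min(requestedFps, budgetFps);
  return { physicsFps: fps, animationFps: fps };
}
```

With a 40ms step and a 10ms repaint, a 30fps request yields only 20fps for *both* physics and animation; the two numbers cannot diverge on one thread.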
The end result is obvious to anyone who has used a sluggish website (i.e. everyone). Since the two are tightly coupled, if the physics step takes longer, the interface will be proportionally less responsive. Moreover, since most current browsers refresh styling on the main thread, nothing is safe.
That’s what the fourth and highest stress level is for. Try it again, in even the fastest browser (unless you’re from the future). Move your mouse over anything blue, which will trigger a color change via a :hover rule, and notice how long the change takes to appear.
The Native Test
The native-worker mode will use web workers for the physics engine if available; if not, the emulated-worker mode will automatically be used as a fallback. The benefit is simple, as long as more processing power is available. With the physics simulator off the main thread, no matter how long its time steps take, it won’t be able to dominate interface refreshes.
Once again, at lower stress levels the improvement might be slight, but try the test again (may crash Firefox) and set it to the highest stress level. Then test the UI. Though all the current browsers are still unable to keep up with the physics step, this fact has significantly less impact on the interaction responsiveness. The slingshot seems connected to your mouse drag. The blocks immediately change color when you hover over them. The standard form elements respond to your cursor. It’s nice when your website isn’t acting like Sonic 2 when you just lost all your rings.
The point isn’t a new one, but it doesn’t have much to do with a physics engine, either. If you move your heavy scripts to a worker, you’ll worry a lot less about keeping your interface snappy and responsive, which is in many ways our first responsibility. Let the newer browsers do the heavy lifting for you, and let libraries pick up the older ones’ slack. Workers really are easier than you think.
Chrome. So very fast. The only minor complaints I have are that :hover does not update without a mouse move (even if the hovered element itself moves), and that, in terms of apparent sane behavior, -moz-user-select beats the pants off of -webkit-user-select.
Firefox. Again, the worker version will likely crash the browser. This particular type of code is just not conducive to tracing and turns out to be a bit of a pathological case for TraceMonkey’s current tracing heuristics. Note that this might not actually be the cause of the crashes, but it is similar enough to TraceMonkey bugs fixed in 3.5 that I’m going to assume it is. However, TM’s issues with my code have become more apparent now that worker scripts are JITed; Firefox 3.6.x (with x>1) can no longer meet the performance level I set for it. Disable JIT compilation (or just turn on the Firebug Script or Console panels) and performance will double or triple. The entire situation is amusing enough that I think I’m going to write a post dedicated to it. Update: I can’t get recent 3.7 nightlies to crash. Hooray! Still slow.
Internet Explorer 8. Obviously IE is going to be at the shallow end of the performance pool, but I’ve actually been surprised at how well it performs. Matrix filter transforms are still somewhat slow even after my earlier work, but they seem more fill-bound than geometry-bound. Small rectangles and no overdraw help, too. Finally, Page Speed warns about possible problems with :hover on non-anchors with a strict doctype, but I haven’t detected this in practice.
IE9 Platform Preview. Unfortunately there appears to be some bug that is preventing the JSON payload of worker messages from initializing properly in the Preview, which then causes an error when they are deserialized. However, setting breakpoints where the messages are created causes the Preview to crash, so I’m not pursuing the problem at this time.
Safari. On Windows, Safari is pretty much like Chrome with a few more ways to crash it thrown in (mostly via the inspector). On the Mac, Safari really is a class act. A shiny new 3.06GHz MacBook Pro runs the highest test level in something approaching interactive (though not fast enough for a stable timestep, yet). Perhaps more impressive: an old 800MHz dual G4 is able to handle the IE8 level easily, and is able to update the interface faster than 28fps even while the simulation is at the highest setting. I don’t advocate always using all available processing power for rendering a web page, but if the alternative is an unresponsive interface and an impatient user, use what you have.
On Workers and GWT
These thoughts apply to using workers in general, but I’ll focus on GWT in particular. Once again, I’m not sure that this example fully supports my call for wider worker use, but in using them, I’m increasingly convinced that they will soon be considered an essential part of building a faster web.
The GWT compilation process is not currently a perfect fit for generating web workers, but web workers are in many ways a perfect fit for the current trends in GWT development. As the cult of Ray Ryan has spread, the MVP pattern and event buses have become the poster children for structuring large GWT applications. This is the world that workers were made for.
Worker requirements, again:
- No shared state except where created and maintained by message passing.
- No DOM access.
I don’t think the connection could be much clearer. Hook up a worker object to your event bus, finish pulling the view out of model and presenter code, done.
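As a sketch of that hookup — with `Bus`, the event names, and the message shapes all hypothetical, not the GWTns or any GWT event-bus API — the bridge between a bus and anything with a worker’s `postMessage`/`onmessage` contract is only a few lines:

```javascript
// A toy event bus: subscribe with on(), broadcast with fire().
function Bus() {
  const handlers = {};
  this.on = (type, fn) => (handlers[type] = handlers[type] || []).push(fn);
  this.fire = (type, data) => (handlers[type] || []).forEach((fn) => fn(data));
}

// Bridge bus events to a worker-like object. `workerLike` only needs
// the Worker contract (postMessage out, onmessage back), so a real
// Worker or an emulated one plugs in identically.
function bridge(bus, workerLike, requestType, responseType) {
  bus.on(requestType, (data) => workerLike.postMessage(data));
  workerLike.onmessage = (e) => bus.fire(responseType, e.data);
}
```

Presenters keep firing and listening on the bus exactly as before; whether the model behind `requestType` runs on the UI thread or in a worker becomes an implementation detail.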
Pragmatically, there is of course more to consider. Code with truly separated logic and presentation will need little change; code otherwise structured will likely need non-trivial work. The point isn’t that the addition of workers will come free, but that—if you’re already using these patterns—the barrier to entry is lowered enough that even modest potential performance gains can justify a day or two spent benchmarking.
The purpose often given for web workers is that they allow certain kinds of scripts to be written more naturally, without worrying about program flow disrupting the user experience. Blocking I/O and long-running calculations are usually then given as examples. While these are now possible in platform-specific applications—like a (modern) browser extension—the reality is that IE6, 7, and 8 are going to be with us for a long time. Moreover, as good GWT developers, we are already disciples of asynchrony (in addition to that whole Ray Ryan thing); let’s not regress.
The most obvious worker candidates are anywhere an IncrementalCommand is found, as long as it’s needed for more than updating the DOM (and even then, there is always innerHTML string creation). But in fact, any long-running background process that reduces interface responsiveness should be strongly considered for extraction to a worker. Try the emulated worker test again. If the physics rate is 5fps, that means a step takes only 200ms, nowhere near long enough to trigger a slow script warning. But notice again the effect that has on the interface. Just because something can be done without triggering an error doesn’t mean that your application doesn’t suffer for it, especially if you have to do it more than once.
Finally, I don’t know what the guerrilla-GWT-developer demographic is like, but workers are also good for tasks that might normally be performed with a server roundtrip but become limited by bandwidth, throughput, or latency. Local image editing has become a prominent example of this, but really any transformation of a large data set (geometric or otherwise) would qualify, as long as security issues were carefully addressed.
Like any multi-threaded approach, there will be trade-offs. There will always be overhead. Gains will be sublinear. With no way to discover the number of processes/threads that can be executed in parallel (as limited by the particular hardware and browser), blind guesses will have to be made. This is mitigated by the fact that extra workers just devolve to same-core native context switching, but before you start writing that worker-based MapReduce implementation, stop being ridiculous and remember that single- and dual-core processors are going to be the (client) norm for a few years yet.
I don’t want to sound too dour, though. Once again, you should be using web workers. The point of the module I wrote for GWTns was to make this as easy as possible (though it certainly isn’t the only approach available). While it enables, by design, only a proper subset of worker functionality, this allows developers to use the exact same code for all browsers. I covered this pretty well in the post about The Rationals worker example, but I wanted to go over the end result again.
Older browsers will get an application where the worker object is constructed normally, the only strange behavior being that all messages passed to it are delivered asynchronously. Overhead is designed to be as minimal as possible, though there is still more work to be done there. Newer browsers will be able to load the code as a true worker and all execution will be done asynchronously. Script execution that was causing the interface to lag is now off the UI thread. Perhaps most importantly, though, this scheme allows for full dev mode debugging; no need to change your workflow. This last part was pretty much essential for working out the kinks in the blocks demo, but from a larger perspective, development mode is itself the key to not going crazy doing GWT development (Step 1: use dev mode. Step 2: use workers).
I would be remiss if I didn’t mention that, while I’ve found my own code to be very stable, it is very much in an alpha-state and feature-incomplete. There aren’t many other options right now, but you could certainly write a worker script by hand or compile one from a module and then load it manually. SpeedTracer’s web worker code might help you get started with this approach.
However you go about it, try workers out. I think you’ll be pleasantly surprised with the results.