Firefox Ethology

Barely related: I’ve written a design document (of sorts) for my implementation of web workers in GWT. Like the library itself, the document is fairly rough, but it does give a pretty good overview of the current state of things. It’s actually in a Wave, for better formatting and the hope of some feedback, so if you’d like to take a look and can’t, drop me a line and I can send you an invite. You can find the link to the wave in this thread.

On to the actual topic of the day.

That Crashing Thing

[Update added at bottom of post.]

I’m happy to report that I have been unable to crash Firefox 3.6.4 beta 1 [update: it has crashed only once] using my Blocks worker test. I originally noticed the change in one of the 3.7 nightlies, but it’s nice that it landed earlier. Unfortunately, I’ve been unable to track down the bug whose fix resolved it, even after spending way too long with Bugzilla yesterday.

Part of the problem is that the stack traces from the crashes were never definitive (to my eyes, at least). All of them were on account of an “EXCEPTION_ACCESS_VIOLATION,” but it sometimes happened in TraceMonkey’s TraceRecorder, indicating my earlier guesses were correct, and sometimes in the JS string code, indicating not so much.

One factor that I actually forgot to mention in my last post is that the worker object running the physics simulation is actually discarded and reloaded from file every time the test level is changed or reset. In a real application, you’d want to avoid this if at all possible (depending on the browser, the script file might have to be re-parsed, re-compiled, or—serious horrors—re-downloaded). In this case, however, it was a great way to put the worker emulation code through its paces: the old, “frustrated-user-quadruple-button-click” stability test. It actually helped me find a nice little NPE that only occurred in the unlikely event that a terminate() call came after an emulated worker was created but before it actually started running.
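For the curious, here is a minimal sketch of that terminate-before-start window, written in TypeScript rather than the library's own GWT Java. Every name in it (EmulatedWorker, the schedule callback, the events log) is hypothetical and made up for illustration; it is not the library's actual code, just one way the guard could look, assuming the emulated worker defers its startup.

```typescript
// Hypothetical sketch; none of these names come from the actual GWT library.
type Task = () => void;

class EmulatedWorker {
  // The worker's state only exists once start() has run.
  private scope: { running: boolean } | null = null;
  private terminated = false;
  readonly events: string[] = [];

  constructor(schedule: (task: Task) => void) {
    // Startup is deferred, so the worker exists for a while before it runs.
    schedule(() => this.start());
  }

  private start(): void {
    if (this.terminated) {
      // terminate() won the race: never build the scope at all.
      this.events.push("start skipped");
      return;
    }
    this.scope = { running: true };
    this.events.push("started");
  }

  terminate(): void {
    if (this.terminated) return; // repeated clicks are harmless
    this.terminated = true;
    // scope is still null when terminate() arrives before start().
    if (this.scope !== null) {
      this.scope.running = false;
      this.scope = null;
    }
    this.events.push("terminated");
  }
}

// Simulate the quadruple click: terminate before the deferred start has run.
const pending: Task[] = [];
const worker = new EmulatedWorker((task) => { pending.push(task); });
worker.terminate();
pending.forEach((task) => task());
console.log(worker.events.join(", ")); // terminated, start skipped
```

Without the null check in terminate() and the flag check in start(), the early terminate would either dereference a scope that does not exist yet or leave behind a worker that starts after it was supposedly killed.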

I’m going to wildly speculate—with the very real chance that I’ve missed something on Bugzilla—that something similar was happening with Firefox. The facts:

  1. an “EXCEPTION_ACCESS_VIOLATION”
  2. the page would only crash after the worker was reloaded at least once (though often, when restoring, it would crash again immediately)
  3. the crash frequently occurred with a call to JS_MakeStringImmutable(), which, MDC tells us, “applications must call…before sharing a JSString among threads”

Thinking what I’m thinking?

The weird thing is that nothing in the complete list of changes really matches the situation. The closest I can find is Bug 547399, “Don’t let worker messages run if the worker is suspended,” but the patch doesn’t touch any of the files I was combing through, nor do the developers involved think that the bug could trigger something as serious as a crash. It does, however, prevent a possible race condition dealing with passing strings between threads, so, who knows? Maybe my crash didn’t get enough hits and the fix is just a side effect of something else.

Still, it’s a welcome development. Thanks, Mozilla.

In Their Slowness

When I wrote the Blocks test, Firefox 3.6.0 really was as fast as the second test level would indicate, though there were hints that it was losing a lot of speed due to the way it got along with TraceMonkey. What I didn’t realize was that TraceMonkey wasn’t involved at all; JIT compilation was disabled in web workers until 3.6.2 and this bug. Imagine my surprise a month later: I finally have the time to post the test and Firefox can’t even manage the IE8 level. There was some frustration.

Our clues:

  1. It’s happened before. I think. Some code just doesn’t take to the way TraceMonkey traces, so the engine needs a set of heuristics to determine when to just give up and interpret. Scripts will sometimes be misclassified, and this seems to be just such a case. There was a really interesting example of this discovered and diagnosed in front of a live audience (I love you, open development) and the bug is still open. The code involved is rather different than this case, but the connection is too nice to think too much about confirmation biases.
  2. I mentioned this before, but just disable JIT compilation and try the test. This can be done in about:config, but it’s a lot easier to just turn on the console or script panels in Firebug. While it’s not as fast as it once was (Firebug is running, after all), you’ll usually see the blocks speed up rather comically.

The really humorous part of this whole situation is that I got exactly what I wanted: a performance update rolled out in a point release. I still agree with the sentiment, and this is undoubtedly an on-average win for performance. But, seriously, can I just have my interpreter back?

[Update: Almost immediately after publishing this post, I was able to crash 3.6.4; let's just say it will crash only rarely. Notably, the crash had nothing to do with JSString and occurred, once again, in TraceRecorder.

In addition, turning off JIT compilation in about:config—originally mentioned above—will not work to get the speedier Firefox back in action. Using Firebug to accomplish the same thing only works at first (and generally stops if you reset the test), so you'll have to extrapolate from that first few seconds of speediness to imagine what once gloriously was.

There is a bug filed (Bug 562455), but I am a total Bugzilla noob, so expect it to be duped or something in 3...2... =]
