Bug 37292 - http://trac.webkit.org/changeset/57215 caused perf regressions
Summary: http://trac.webkit.org/changeset/57215 caused perf regressions
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P1 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-08 14:15 PDT by Ojan Vafai
Modified: 2010-04-28 13:50 PDT (History)
8 users (show)

See Also:


Attachments
zip of reduced perf test (6.14 MB, application/zip)
2010-04-09 16:06 PDT, Ojan Vafai
no flags Details
Patch (56.05 KB, patch)
2010-04-21 14:48 PDT, Ojan Vafai
no flags Details | Formatted Diff | Diff
Patch (55.94 KB, patch)
2010-04-28 11:35 PDT, Ojan Vafai
no flags Details | Formatted Diff | Diff
Patch (56.00 KB, patch)
2010-04-28 12:34 PDT, Ojan Vafai
adele: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ojan Vafai 2010-04-08 14:15:49 PDT
http://trac.webkit.org/changeset/57215 caused a ~6% perf regression on Mac on Chromium's page cycler intl2 test and a ~3% regression on page cycler intl2.

It caused a 2-8% memory regression on Chromium's intl1, intl2 and moz page cyclers.

I'm pretty sure Safari would see the same regressions. At the very least, Chromium uses the same font code as Safari on the Mac.

Chromium will be cutting a release very soon (in a matter of days). Is there something we can do to address this regression before then? Can we put this behind a flag until the perf/memory issues are resolved?
Comment 1 mitz 2010-04-08 14:22:21 PDT
Bug 6274 comment 52 mentions a couple of possible speed improvements, but further steps probably require more information. One important question is what fraction of the regression is due to sending more cases down the complex text code path. This can be easily tested by undoing the change to canUseGlyphCache(). Depending on the answer, the idea at the end of bug 6274 comment 25 may be worth pursuing.
Comment 2 Ojan Vafai 2010-04-08 14:32:33 PDT
(In reply to comment #1)
> Bug 6274 comment 52 mentions a couple of possible speed improvements, but
> further steps probably require more information. One important question is what
> fraction of the regression is due to sending more cases down the complex text
> code path. This can be easily tested by undoing the change to
> canUseGlyphCache(). Depending on the answer, the idea at the end of bug 6274
> comment 25 may be worth pursuing.

I would guess that's all of it. Would you be open to an experiment where we commit a change putting just that behind a flag or comment that code out? It's hard to get definitive answers testing this locally. Need to actually see it run on the bot to have confidence in the results.
Comment 3 mitz 2010-04-08 14:42:54 PDT
(In reply to comment #2)
> (In reply to comment #1)
> Would you be open to an experiment where we
> commit a change putting just that behind a flag or comment that code out? It's
> hard to get definitive answers testing this locally. Need to actually see it
> run on the bot to have confidence in the results.

Given the magnitude of the regression on the specific test, I expect that you should be able to get a good qualitative result from testing locally.
Comment 4 Ojan Vafai 2010-04-09 10:35:03 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > Would you be open to an experiment where we
> > commit a change putting just that behind a flag or comment that code out? It's
> > hard to get definitive answers testing this locally. Need to actually see it
> > run on the bot to have confidence in the results.
> 
> Given the magnitude of the regression on the specific test, I expect that you
> should be able to get a good qualitative result from testing locally.

I'm looking into this today. I'm also trying to find the specific pages in those page cyclers that are slowed down. Overall though, running these tests takes a long time and I have to run them many times to get numbers I have confidence in. It will take at least most of today.

I am not comfortable leaving a regression like this in the tree for long. Other perf regressions or improvements happen and it quickly becomes difficult to judge whether the fixes for the regression actually fully fixed it. My understanding is that WebKit has a strict no-regressions policy for PLT (am I wrong?). It seems like that should extend to cases like this.

Is there something we can do in the interim while we figure out how to fix the regression? The simplest solutions are to put this behind a flag or to rollback. I'd obviously prefer an option that didn't require one of those two, but nothing comes to mind that doesn't require getting more data and working on fixes, which seems like it will take days at least.

Anyways, I'll try to have some useful data ASAP.
Comment 5 Ojan Vafai 2010-04-09 16:05:13 PDT
OK. I've created a reduced perf test that I'll attach shortly. The page cycler runs through 30 pages and does 10 cycles. I took the tests where median_after_patch / median_before_patch > 1.1. In other words, the pages that regressed. In the test case, I load each one in an iframe 100 times.

I ran each case 5 times in Safari. Numbers are in milliseconds.

Before r57215: 106454, 105744, 102916, 101746, 103130
After r57215: 116417, 115711, 116228, 122100, 115701
After r57215 with canUseGlyphCache changes commented out: 105953, 106326, 105762, 105035, 105254

So, clearly the canUseGlyphCache changes account for all of the perf regression.
Comment 6 Ojan Vafai 2010-04-09 16:06:36 PDT
Created attachment 53010 [details]
zip of reduced perf test

To run, unzip, and run test.html. It alerts the results when it finishes.
Comment 7 mitz 2010-04-09 16:15:49 PDT
Ojan, thanks for doing this investigation! I think this is good news actually.
Comment 8 Ojan Vafai 2010-04-09 17:36:07 PDT
(In reply to comment #7)
> Ojan, thanks for doing this investigation! I think this is good news actually.

I agree. The ideas in bug 6274 seem likely to address the perf issue. Sorry to be a broken record, but, is there anything we can do in the meantime about the perf and/or memory regressions?

For the perf regression, can we just delete the canUseGlyphCache changes? Then we'll get to fixing the bug in a way that doesn't cause a perf regression without leaving a perf regression in the tree in the interim.

I don't know what to do about the memory regression. Enrica mentioned the idea of tweaking the size of the glyph cache. That seems reasonable to me, but I don't have a sense of the tradeoffs involved there.
Comment 9 Dave Hyatt 2010-04-12 10:51:59 PDT
Yeah I really should not have put r+ on the original bug.  I'll take some of the blame for that.  It was obvious that the bug was going to regress the performance of the complex text code path, and I shouldn't have let that slide.

We should probably do a surgical patch that disables this tracking without removing all the code from the tree, so that we can get the performance back.
Comment 10 Dave Hyatt 2010-04-12 10:54:01 PDT
I should have said "It was obvious performance would regress because we'd use the complex text code path more."
Comment 11 mitz 2010-04-12 10:55:22 PDT
(In reply to comment #9)
> Yeah I really should not have put r+ on the original bug.  I'll take some of
> the blame for that.  It was obvious that the bug was going to regress the
> performance of the complex text code path, and I shouldn't have let that slide.

According to Ojan’s testing, that is not correct. The complex path has not become measurably slower.

> We should probably do a surgical patch that disables this tracking without
> removing all the code from the tree, so that we can get the performance back.

I think the thinks to consider now are:
1) Add a third code path (fast w/glyph overflow) for the “stacked diacritics” range
or if that turns out to be too hard
2) #ifdef-out the change to canUseGlyphCache()
Comment 12 Enrica Casucci 2010-04-12 11:01:44 PDT
(In reply to comment #11)
> (In reply to comment #9)
> > Yeah I really should not have put r+ on the original bug.  I'll take some of
> > the blame for that.  It was obvious that the bug was going to regress the
> > performance of the complex text code path, and I shouldn't have let that slide.
> 
> According to Ojan’s testing, that is not correct. The complex path has not
> become measurably slower.
> 
> > We should probably do a surgical patch that disables this tracking without
> > removing all the code from the tree, so that we can get the performance back.
> 
> I think the thinks to consider now are:
> 1) Add a third code path (fast w/glyph overflow) for the “stacked diacritics”
> range
> or if that turns out to be too hard
> 2) #ifdef-out the change to canUseGlyphCache()

I believe there is some confusion here.
Unless Ojan tests provide a different picture of the problem, here is where I think we stand:
1. we no performance impact for the simple text path
2. we have very small impact on the complex text path (but I don't have the exact numbers)
3. where things have gotten worse is where we started using the complex path for cases where we used to use the simple one.

I will address number 3 shortly.
Comment 13 Ojan Vafai 2010-04-21 14:48:12 PDT
Created attachment 53994 [details]
Patch
Comment 14 Ojan Vafai 2010-04-21 14:54:40 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #9)
> > > Yeah I really should not have put r+ on the original bug.  I'll take some of
> > > the blame for that.  It was obvious that the bug was going to regress the
> > > performance of the complex text code path, and I shouldn't have let that slide.
> > 
> > According to Ojan’s testing, that is not correct. The complex path has not
> > become measurably slower.
> > 
> Unless Ojan tests provide a different picture of the problem, here is where I
> think we stand:
> 1. we no performance impact for the simple text path
> 2. we have very small impact on the complex text path (but I don't have the
> exact numbers)
> 3. where things have gotten worse is where we started using the complex path
> for cases where we used to use the simple one.

This is my understanding as well, but I don't know this code or this patch well enough to say. One thing to note, both my testcase and the Chromium page load tests do *not* restart the browser between loads of pages. So any perf hit from loading extra font data that is then cached would not appear on either test since each page is loaded many times.

What does appear very clearly however is the memory regression. Is there anything we can do here? Can we store the bounding box only for a certain range of glyphs instead of all complex text?
Comment 15 Stephanie Lewis 2010-04-21 16:13:27 PDT
A patch for fixing the Membuster memory regression caused by 52715 is here:  http://bugs.webkit.org/show_bug.cgi?id=37936
Comment 16 Adele Peterson 2010-04-21 16:18:12 PDT
Sorry, I didn't see this bug before working on my patch.  I guess it should be a dupe of this bug.  Anyway, my patch breaks up GlyphMetricsMap into GlyphWidthMap and GlyphBoundsMap.  I'm still working on it, but maybe someone could help me test.
Comment 17 Ojan Vafai 2010-04-21 16:32:32 PDT
Seems fine to treat the perf and memory regressions as separate bugs.
Comment 18 Ojan Vafai 2010-04-25 09:01:01 PDT
Ping for the review.
Comment 19 mitz 2010-04-28 10:54:34 PDT
Comment on attachment 53994 [details]
Patch

> #define ROMAN_AND_GREEK_DIACRITICS_CAN_USE_GLYPH_CACHE 0

Isn’t this name backwards? When this is false, characters in U+1E00-U+2000 can (and will) use the glyph cache.
Comment 20 Ojan Vafai 2010-04-28 11:35:37 PDT
Created attachment 54597 [details]
Patch
Comment 21 Ojan Vafai 2010-04-28 11:36:12 PDT
(In reply to comment #19)
> (From update of attachment 53994 [details])
> > #define ROMAN_AND_GREEK_DIACRITICS_CAN_USE_GLYPH_CACHE 0
> 
> Isn’t this name backwards? When this is false, characters in U+1E00-U+2000 can
> (and will) use the glyph cache.

Heh. Whoops. Fixed.
Comment 22 Adele Peterson 2010-04-28 11:58:08 PDT
Comment on attachment 54597 [details]
Patch

I'd really like someone to take a crack at fixing the underlying problem.  Dan has agreed to look into this, but others are welcome to try as well.  Would it be possible to leave the #define set to 0 until someone tries and fails to fix this?  If it really seems unlikely we'll be able to fix the underlying problem, then we can change the default value of the #define.
Comment 23 Ojan Vafai 2010-04-28 12:05:24 PDT
(In reply to comment #22)
> (From update of attachment 54597 [details])
> I'd really like someone to take a crack at fixing the underlying problem.  Dan
> has agreed to look into this, but others are welcome to try as well.  Would it
> be possible to leave the #define set to 0 until someone tries and fails to fix
> this?  If it really seems unlikely we'll be able to fix the underlying problem,
> then we can change the default value of the #define.

It's certainly possible, but I don't think we should leave performance regressions in trunk by default. We wouldn't leave failing tests in the tree, would we? Performance and memory regressions are even worse since there is only one scale to measure them on. That means that other changes that affect performance will mask this one and it becomes nearly impossible to tell if the fix actually fixes the entirety of the regression.
Comment 24 Ojan Vafai 2010-04-28 12:12:03 PDT
Stated differently, what is the advantage to leaving it enabled?
Comment 25 Adele Peterson 2010-04-28 12:17:37 PDT
I guess I just want to make sure we still feel some pressure to do the right fix.  But that's not too important.  Various branches can twiddle the #ifdef as needed.
Comment 26 Ojan Vafai 2010-04-28 12:34:28 PDT
Created attachment 54611 [details]
Patch
Comment 27 Ojan Vafai 2010-04-28 12:37:25 PDT
(In reply to comment #25)
> I guess I just want to make sure we still feel some pressure to do the right
> fix.  But that's not too important. 

This regresses the bug fixed by r57215, so the pressure to do the right fix is to fix the original bug. Isn't this the same as leaving a failing test in the tree so that there is pressure to fix it? In that case, we'd expect the test to be fixed in a reasonable timeframe (hours, not weeks) or for the change to be reverted.

> Various branches can twiddle the #ifdef as needed.

Uploaded a new patch that's properly ifndef'ed so branches can actually twiddle. :)
Comment 28 Adele Peterson 2010-04-28 13:25:58 PDT
Comment on attachment 54611 [details]
Patch

OK sounds good.  Please update https://bugs.webkit.org/show_bug.cgi?id=6274 and we'll keep working on the performance issue.
Comment 29 Ojan Vafai 2010-04-28 13:50:32 PDT
Committed r58426: <http://trac.webkit.org/changeset/58426>