Bug 132650 - [X86] Emit BT instruction for single-bit tests.
Summary: [X86] Emit BT instruction for single-bit tests.
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Andreas Kling
URL:
Keywords:
Depends on: 132670
Blocks:
Reported: 2014-05-07 05:19 PDT by Andreas Kling
Modified: 2014-05-07 21:12 PDT

See Also:


Attachments
Patch (4.81 KB, patch)
2014-05-07 05:20 PDT, Andreas Kling
msaboff: review+

Description Andreas Kling 2014-05-07 05:19:55 PDT
[X86] Emit BT instruction for single-bit tests.
Comment 1 Andreas Kling 2014-05-07 05:20:29 PDT
Created attachment 230993
Patch
Comment 2 Michael Saboff 2014-05-07 09:32:01 PDT
Comment on attachment 230993
Patch

r=me.

The logic looks fine.  Is there documentation from Intel that says use bt over test?
Comment 3 Andreas Kling 2014-05-07 16:10:03 PDT
(In reply to comment #2)
> (From update of attachment 230993)
> r=me.
> 
> The logic looks fine.  Is there documentation from Intel that says use bt over test?

From Intel Technology Journal, Vol. 11, Issue 4:

"The bit test instruction bt was introduced in the i386™ processor. In some implementations, including the Intel NetBurst® micro-architecture, the instruction has a high latency. The Intel Core micro-architecture executes bt in a single cycle, when the bit base operand is a register. Therefore, the Intel C++/Fortran compiler uses the bt instruction to implement a common bit test idiom when optimizing for the Intel Core micro-architecture. The optimized code runs about 20% faster than the generic version on an Intel Core 2 Duo processor."
Comment 4 Andreas Kling 2014-05-07 16:11:41 PDT
Note that BT only writes a defined value to CF (ZF is left unchanged; OF, SF, AF and PF are left undefined), while TEST writes CF, OF, SF, ZF and PF. :)
Comment 5 Michael Saboff 2014-05-07 16:14:51 PDT
(In reply to comment #3)
Thanks for the reference.  I looked in the optimization guide and couldn't find anything.
Comment 6 Andreas Kling 2014-05-07 16:24:46 PDT
Committed r168451: <http://trac.webkit.org/changeset/168451>
Comment 7 Filip Pizlo 2014-05-07 19:36:20 PDT
(In reply to comment #5)
Did you run any benchmarks?  I don't trust anything that any Intel documentation says.  It has been wrong in the past.
Comment 8 Filip Pizlo 2014-05-07 19:49:51 PDT
So, neither of the big-time compilers pattern-matches to the BT instruction, even if you tell them to target Core.  GCC appears to use AND directly and relies on the bits it leaves behind, while LLVM uses TEST.  I looked at what LLVM does, and it appears to resort to BT only if the immediate is not encodable with TEST.

So, it would appear that either we've discovered something that professional compiler writers have overlooked (or, in the case of LLVM, which knows about BT and can use it, something that professional compiler writers have misunderstood), or there is something that the Intel manual isn't revealing.  Note that those compilers are engineered to know about *every single instruction* that every possible processor might have.  They have been tuned very carefully for a long time, and sometimes that tuning is based on cost models of those instructions.  If you want to figure out which instructions to select, it's usually a good bet to just look at what they do.

In this case, it appears that BT is basically a bad idea.

In general, I don't think that "the Intel manual said so" should ever be used as a justification for a patch.
Comment 9 Filip Pizlo 2014-05-07 19:52:02 PDT
Comment on attachment 230993
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=230993&action=review

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1202
> +    int singleBitIndex(unsigned mask)
> +    {
> +        switch (mask) {
> +        case 0x00000001: return 0;
> +        case 0x00000002: return 1;
> +        case 0x00000004: return 2;
> +        case 0x00000008: return 3;
> +        case 0x00000010: return 4;
> +        case 0x00000020: return 5;
> +        case 0x00000040: return 6;
> +        case 0x00000080: return 7;
> +        case 0x00000100: return 8;
> +        case 0x00000200: return 9;
> +        case 0x00000400: return 10;
> +        case 0x00000800: return 11;
> +        case 0x00001000: return 12;
> +        case 0x00002000: return 13;
> +        case 0x00004000: return 14;
> +        case 0x00008000: return 15;
> +        case 0x00010000: return 16;
> +        case 0x00020000: return 17;
> +        case 0x00040000: return 18;
> +        case 0x00080000: return 19;
> +        case 0x00100000: return 20;
> +        case 0x00200000: return 21;
> +        case 0x00400000: return 22;
> +        case 0x00800000: return 23;
> +        case 0x01000000: return 24;
> +        case 0x02000000: return 25;
> +        case 0x04000000: return 26;
> +        case 0x08000000: return 27;
> +        case 0x10000000: return 28;
> +        case 0x20000000: return 29;
> +        case 0x40000000: return 30;
> +        case 0x80000000: return 31;
> +        default: return -1;
> +        }

We have a function to count the number of set bits in an int and to compute the log2 of an int.  Why not use those instead?
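
One way to act on this suggestion, as a minimal sketch: check that the mask is a power of two, then compute its log2. The name singleBitIndexCompact is hypothetical, and the sketch uses plain bit tricks rather than WebKit's own helpers, which are not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical replacement for the switch-based singleBitIndex() in the patch.
// Returns the index of the only set bit in mask, or -1 if mask does not have
// exactly one bit set.
inline int singleBitIndexCompact(uint32_t mask)
{
    // A power of two has exactly one set bit, so clearing the lowest set bit
    // (mask & (mask - 1)) must leave zero. Zero itself is rejected up front.
    if (!mask || (mask & (mask - 1)))
        return -1;

    // Portable log2 for a power of two: count the shifts needed to reach bit 0.
    int index = 0;
    while (mask >>= 1)
        ++index;
    return index;
}
```

This keeps the same contract as the switch (index 0 to 31, or -1) without enumerating all 32 cases.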

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1212
> +        int bitIndex = singleBitIndex(mask.m_value);
> +        if ((cond == Zero || cond == NonZero) && bitIndex != -1) {
> +            m_assembler.bt_i8r(bitIndex, reg);
> +            return Jump(m_assembler.jCC(cond == Zero ? X86Assembler::ConditionNC : X86Assembler::ConditionC));
> +        }
> +

The LLVM tuning appears to disagree with you - it will pick BT only if the immediate is both a power of two and not representable in the immediate of a TEST.  I think this will only happen if you want to quickly test one of the high 32 bits of a 64-bit integer.  Obviously, this won't happen here.

> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1224
> +        int bitIndex = singleBitIndex(mask.m_value);
> +        if ((cond == Zero || cond == NonZero) && bitIndex != -1) {
> +            m_assembler.bt_i8m(bitIndex, address.offset, address.base);
> +            return Jump(m_assembler.jCC(cond == Zero ? X86Assembler::ConditionNC : X86Assembler::ConditionC));
> +        }
> +

Ditto.
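
For readers following the condition mapping in the patch (Zero becomes ConditionNC, NonZero becomes ConditionC), here is a simplified model of why the two sequences branch the same way for a single-bit mask. The Flags struct and the model functions are illustrative assumptions, not JavaScriptCore code, and only ZF and CF are modelled.

```cpp
#include <cstdint>

// Simplified model of the two flags that matter here (not a full EFLAGS model).
struct Flags {
    bool zero;   // ZF
    bool carry;  // CF
};

// TEST reg, imm: ZF is set when (value & mask) == 0; CF is always cleared.
inline Flags modelTest(uint32_t value, uint32_t mask)
{
    return Flags { (value & mask) == 0, false };
}

// BT reg, imm8: CF receives the selected bit; ZF is left unchanged.
// For a 32-bit register operand the bit offset is taken modulo 32.
inline Flags modelBt(uint32_t value, unsigned bitIndex, Flags before)
{
    return Flags { before.zero, ((value >> (bitIndex & 31)) & 1u) != 0 };
}

// "Branch if the bit is zero": JZ after TEST, JNC after BT.
inline bool takenAfterTest(uint32_t value, uint32_t mask)
{
    return modelTest(value, mask).zero;
}

inline bool takenAfterBt(uint32_t value, unsigned bitIndex)
{
    return !modelBt(value, bitIndex, Flags { false, false }).carry;
}
```

Under this model, takenAfterTest(v, 1u << i) and takenAfterBt(v, i) agree for every value v and every index i from 0 to 31, so the open question in this review is cost and encoding size, not correctness.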
Comment 10 Filip Pizlo 2014-05-07 20:09:58 PDT
Another thing: it appears that BT is a bigger instruction than TEST - it requires one extra prefix byte in the opcode.

Seriously, if you have a choice between an instruction with a one-byte opcode and one with a two-byte opcode, and there is no evidence that the two-byte one is better, then you should use the one-byte one.

The Intel manual does not constitute evidence.
Comment 11 Andreas Kling 2014-05-07 20:22:13 PDT
Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..

BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.

In addition to LLVM, I was looking at GCC and ICC code generation on gcc.godbolt.org, but I did make one fatal mistake. I was testing with this snippet:

void foo (void);

int test (int x, int n)
{
  if (x & (1 << n))
    foo ();

  return 0;
}

Which generates code using BT for -O2 -march=core2 on all three compilers. I got the snippet from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36473 (where GCC added BT to their codegen.)

Unfortunately I forgot to test with an immediate instead of a variable argument, and I see now that for compile-time constant 1-bit masks in the 0-31 range, all three compilers generate TEST instead. Boo me.
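
To make the difference concrete, here is a sketch of the two variants side by side. The function names and the choice of bit 5 are hypothetical, and foo() is given a counting body only so the example runs; declare it without a body, as in the snippet above, when inspecting compiler output.

```cpp
// Stand-in for the external foo() in the snippet above; it counts calls so the
// example can be run rather than only inspected as assembly.
static int fooCalls = 0;
static void foo() { ++fooCalls; }

// Variable bit index: the form tested above, which this thread reports as
// compiling to BT with -O2 -march=core2.
int testVariableBit(int x, int n)
{
    if (x & (1 << n))
        foo();
    return 0;
}

// Compile-time constant single-bit mask: the form this thread reports as
// compiling to TEST on all three compilers.
int testConstantBit(int x)
{
    if (x & (1 << 5))
        foo();
    return 0;
}
```

The JIT case in the patch corresponds to the second function, since the mask is an immediate known when the code is emitted.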

I'm gonna let the benchmarks finish and paste the results here.
Comment 12 Filip Pizlo 2014-05-07 20:24:52 PDT
(In reply to comment #11)
> Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..
> 
> BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.

There is an 8-bit immediate form of TEST.
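
The size argument can be checked with a little arithmetic. The sketch below assumes register-direct operands, 32-bit operand size, and no REX or other prefixes; the constant names are illustrative. The 8-bit TEST form only applies when the tested bit lies in a byte the instruction can address.

```cpp
#include <cstddef>

// Encoded lengths in bytes, derived from the opcode forms:
//   TEST r/m32, imm32 : F7 /0 id    -> opcode + ModRM + 4-byte immediate
//   TEST r/m8,  imm8  : F6 /0 ib    -> opcode + ModRM + 1-byte immediate
//   BT   r/m32, imm8  : 0F BA /4 ib -> 2-byte opcode + ModRM + 1-byte immediate
constexpr std::size_t testImm32Length = 1 + 1 + 4;
constexpr std::size_t testImm8Length = 1 + 1 + 1;
constexpr std::size_t btImm8Length = 2 + 1 + 1;

// BT saves two bytes against the 32-bit immediate TEST...
static_assert(testImm32Length - btImm8Length == 2, "BT vs TEST imm32");
// ...but costs one byte more than the 8-bit immediate TEST.
static_assert(btImm8Length - testImm8Length == 1, "BT vs TEST imm8");
```

So the "2 bytes shorter" claim holds only against the imm32 form; where the imm8 form is usable, TEST is the shorter encoding.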

OK.  I would expect no performance difference.  If there is no performance difference then we should just defer to what the big compilers do.
Comment 13 Andreas Kling 2014-05-07 20:27:32 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > Hi Phil. I'm running jsc-benchmarks on a slower machine in the background. In the meantime..
> > 
> > BT will always end up 2 bytes shorter than the TEST it replaced since it uses an 8-bit immediate instead of a 32-bit one.
> 
> There is an 8-bit immediate form of TEST.

Hah. You're right. And JSC tries real hard to generate it too.

> OK.  I would expect no performance difference.  If there is no performance difference then we should just defer to what the big compilers do.

Definitely.
Comment 14 Andreas Kling 2014-05-07 21:06:02 PDT
Benchmark report for SunSpider, LongSpider, V8Spider, Octane, Kraken, JSRegress, and AsmBench on locals-iMac (iMac14,2).

VMs tested:
"WithoutBT" at /Volumes/Data/Source/Safari/Ref-OpenSource/WebKitBuild/Release/jsc
"WithBT" at /Volumes/Data/Source/Safari/OpenSource/WebKitBuild/Release/jsc

Collected 4 samples per benchmark/VM, with 4 VM invocations per benchmark. Emitted a call to gc() between sample measurements.
Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level
timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds.
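
As a reading aid for the table below, here is a sketch of how two "mean +- interval" entries can be compared. It assumes, based only on the rows in this report, that a "definitely" verdict corresponds to non-overlapping 95% confidence intervals and that the printed factor is the ratio of the means; the benchmark script's actual logic is not shown here, and the type and function names are hypothetical.

```cpp
// One table cell: a mean and the half-width of its 95% confidence interval.
struct Measurement {
    double mean;
    double halfWidth;
};

// Ratio of the candidate mean to the baseline mean; above 1 means slower,
// since the report lists execution times.
inline double slowdownRatio(const Measurement& baseline, const Measurement& candidate)
{
    return candidate.mean / baseline.mean;
}

// True when the two confidence intervals share at least one point.
inline bool intervalsOverlap(const Measurement& a, const Measurement& b)
{
    return a.mean - a.halfWidth <= b.mean + b.halfWidth
        && b.mean - b.halfWidth <= a.mean + a.halfWidth;
}
```

For example, the Octane encrypt row (0.23290+-0.00210 against 0.23882+-0.00202) has disjoint intervals and a ratio of about 1.0254, while the SunSpider 3d-cube row has heavily overlapping intervals.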

                                                        WithoutBT                   WithBT                                      
SunSpider:
   3d-cube                                            3.8618+-0.3228     ?      3.8717+-0.4712        ?
   3d-morph                                           4.7336+-0.1498            4.6899+-0.1356        
   3d-raytrace                                        4.8748+-0.3113     ?      4.9585+-0.2855        ? might be 1.0172x slower
   access-binary-trees                                1.4155+-0.2128            1.3615+-0.0389          might be 1.0397x faster
   access-fannkuch                                    4.4414+-0.0878     ?      4.5675+-0.2150        ? might be 1.0284x slower
   access-nbody                                       2.2466+-0.0433     ?      2.2950+-0.1750        ? might be 1.0215x slower
   access-nsieve                                      3.0829+-0.0559     ?      3.1282+-0.1827        ? might be 1.0147x slower
   bitops-3bit-bits-in-byte                           1.2347+-0.0768     ?      1.2832+-0.2258        ? might be 1.0393x slower
   bitops-bits-in-byte                                2.2117+-0.1158            2.2071+-0.1079        
   bitops-bitwise-and                                 1.9141+-0.0362     ?      1.9182+-0.0586        ?
   bitops-nsieve-bits                                 3.0856+-0.0882     ?      3.0890+-0.1131        ?
   controlflow-recursive                              1.4973+-0.0714            1.4958+-0.0937        
   crypto-aes                                         3.1792+-0.0513     ?      3.1852+-0.0432        ?
   crypto-md5                                         1.7664+-0.1233            1.7493+-0.1290        
   crypto-sha1                                        1.9402+-0.0816            1.8497+-0.0111          might be 1.0490x faster
   date-format-tofte                                  5.8865+-0.1886     ?      7.2452+-3.8198        ? might be 1.2308x slower
   date-format-xparb                                  5.0603+-0.1396            4.8799+-0.1556          might be 1.0370x faster
   math-cordic                                        2.4778+-0.0507            2.4227+-0.0787          might be 1.0228x faster
   math-partial-sums                                  4.3050+-0.0610     ?      4.4823+-0.1742        ? might be 1.0412x slower
   math-spectral-norm                                 1.4714+-0.0216            1.4637+-0.0517        
   regexp-dna                                         6.2526+-0.1737            6.1306+-0.1809          might be 1.0199x faster
   string-base64                                      3.5334+-0.2921            3.4504+-0.1135          might be 1.0240x faster
   string-fasta                                       5.6995+-0.1165     ?      5.7885+-0.0988        ? might be 1.0156x slower
   string-tagcloud                                    8.3417+-0.1268            8.2930+-0.1537        
   string-unpack-code                                18.4899+-0.7973     ?     18.6165+-0.9547        ?
   string-validate-input                              4.0388+-0.1404            4.0292+-0.1176        

   <arithmetic> *                                     4.1170+-0.0473     ?      4.1712+-0.1430        ? might be 1.0132x slower
   <geometric>                                        3.3258+-0.0302     ?      3.3432+-0.0482        ? might be 1.0052x slower
   <harmonic>                                         2.8016+-0.0476            2.8001+-0.0427          might be 1.0005x faster

                                                        WithoutBT                   WithBT                                      
LongSpider:
   3d-cube                                         1233.4095+-11.3659    ?   1253.3698+-40.1243       ? might be 1.0162x slower
   3d-morph                                         741.4250+-27.4430         727.5229+-2.3772          might be 1.0191x faster
   3d-raytrace                                      765.6695+-6.0680          759.1818+-6.9697        
   access-binary-trees                              916.7892+-12.5503         906.1600+-5.9240          might be 1.0117x faster
   access-fannkuch                                  340.1515+-14.4538    ?    347.3749+-11.1170       ? might be 1.0212x slower
   access-nbody                                     738.1696+-17.3423    ?    760.7197+-47.0308       ? might be 1.0305x slower
   access-nsieve                                    931.6458+-12.5809    ?   1016.2460+-258.9499      ? might be 1.0908x slower
   bitops-3bit-bits-in-byte                          85.7283+-4.8629     ?     85.7570+-4.4779        ?
   bitops-bits-in-byte                              135.4017+-2.2564     ?    136.0309+-12.2223       ?
   bitops-nsieve-bits                               611.3268+-15.0354         608.5967+-10.1367       
   controlflow-recursive                            384.7675+-8.6854     ?    388.6284+-13.3114       ? might be 1.0100x slower
   crypto-aes                                       886.7181+-7.7085     ?    886.8901+-5.0394        ?
   crypto-md5                                       669.3856+-26.4656         664.0869+-6.8129        
   crypto-sha1                                      852.5552+-13.2281    ?    864.0634+-25.9422       ? might be 1.0135x slower
   date-format-tofte                                575.9711+-7.3666     ?    585.4905+-19.7535       ? might be 1.0165x slower
   date-format-xparb                                869.0757+-14.4675         859.2290+-11.2992         might be 1.0115x faster
   math-cordic                                      904.5382+-8.9082          903.4843+-13.9417       
   math-partial-sums                                545.4088+-9.0132     ?    546.7880+-1.3511        ?
   math-spectral-norm                               870.8215+-9.0842          858.9225+-8.2623          might be 1.0139x faster
   string-base64                                    334.2463+-0.8221     ?    334.5509+-2.8044        ?
   string-fasta                                     576.7892+-12.4559         557.0745+-10.5421         might be 1.0354x faster
   string-tagcloud                                  214.0127+-2.0530     ?    216.3233+-10.8192       ? might be 1.0108x slower

   <arithmetic>                                     644.7276+-1.7497     ?    648.4769+-12.9180       ? might be 1.0058x slower
   <geometric> *                                    548.6369+-2.1365     ?    550.8949+-4.9976        ? might be 1.0041x slower
   <harmonic>                                       412.0620+-5.1406     ?    413.3966+-3.7046        ? might be 1.0032x slower

                                                        WithoutBT                   WithBT                                      
V8Spider:
   crypto                                            40.2859+-0.7060     ?     40.6218+-1.1536        ?
   deltablue                                         53.4686+-2.1510     ?     55.6558+-6.8348        ? might be 1.0409x slower
   earley-boyer                                      35.8690+-0.2621     ?     36.2894+-1.3433        ? might be 1.0117x slower
   raytrace                                          21.0439+-1.0350     ?     22.0540+-2.7833        ? might be 1.0480x slower
   regexp                                            51.2761+-0.4551           50.9445+-0.5523        
   richards                                          58.1104+-0.7669     ?     59.4163+-2.3087        ? might be 1.0225x slower
   splay                                             28.8427+-1.1775           28.8246+-1.4657        

   <arithmetic>                                      41.2709+-0.4303     ?     41.9723+-1.0863        ? might be 1.0170x slower
   <geometric> *                                     39.0966+-0.5156     ?     39.7623+-1.0809        ? might be 1.0170x slower
   <harmonic>                                        36.7740+-0.6788     ?     37.4516+-1.2412        ? might be 1.0184x slower

                                                        WithoutBT                   WithBT                                      
Octane:
   encrypt                                           0.23290+-0.00210    !     0.23882+-0.00202       ! definitely 1.0254x slower
   decrypt                                           4.47660+-0.11228    ?     4.48817+-0.13490       ?
   deltablue                                x2       0.28874+-0.00434    ?     0.29091+-0.00125       ?
   earley                                            0.46939+-0.00614    ?     0.47122+-0.00557       ?
   boyer                                             5.84523+-0.16830    ?     5.98271+-0.14127       ? might be 1.0235x slower
   navier-stokes                            x2       6.50766+-0.22653          6.33958+-0.01118         might be 1.0265x faster
   raytrace                                 x2       1.88956+-0.03615    ?     1.93935+-0.13718       ? might be 1.0264x slower
   richards                                 x2       0.15715+-0.00077    !     0.16161+-0.00215       ! definitely 1.0284x slower
   splay                                    x2       0.40737+-0.00817          0.39903+-0.01484         might be 1.0209x faster
   regexp                                   x2      39.24923+-0.55669    ?    39.49937+-0.79358       ?
   pdfjs                                    x2      49.89777+-0.62815    ?    57.38340+-23.58802      ? might be 1.1500x slower
   mandreel                                 x2      76.35007+-1.36543    ?    77.09823+-0.78193       ?
   gbemu                                    x2      36.48157+-1.44080    ?    36.73093+-0.65766       ?
   closure                                           0.48830+-0.00459    ?     0.48972+-0.00176       ?
   jquery                                            5.82282+-0.07320    ?     5.90402+-0.16880       ? might be 1.0139x slower
   box2d                                    x2      12.35646+-0.34995         12.33273+-0.21968       
   zlib                                     x2     512.80497+-15.97694   ?   516.73765+-18.50990      ?
   typescript                               x2     574.16754+-9.84969    ?   579.11633+-24.49226      ?

   <arithmetic>                                     87.94838+-1.29661    ?    89.12110+-2.48256       ? might be 1.0133x slower
   <geometric> *                                     7.48049+-0.04648    ?     7.58400+-0.18156       ? might be 1.0138x slower
   <harmonic>                                        0.84961+-0.00400    !     0.86056+-0.00286       ! definitely 1.0129x slower

                                                        WithoutBT                   WithBT                                      
Kraken:
   ai-astar                                          247.752+-7.939      ?     248.547+-2.499         ?
   audio-beat-detection                              112.494+-0.187      ?     112.762+-1.634         ?
   audio-dft                                         144.443+-11.872           143.542+-1.914         
   audio-fft                                          67.710+-0.130             67.406+-0.221         
   audio-oscillator                                  141.561+-7.109            140.845+-1.927         
   imaging-darkroom                                  146.387+-0.528      ?     148.050+-1.902         ? might be 1.0114x slower
   imaging-desaturate                                 80.594+-1.448      ?      80.683+-1.630         ?
   imaging-gaussian-blur                             163.158+-4.679      ?     163.566+-11.871        ?
   json-parse-financial                               38.503+-1.343             37.966+-0.089           might be 1.0142x faster
   json-stringify-tinderbox                           53.159+-1.567      ?      54.250+-2.986         ? might be 1.0205x slower
   stanford-crypto-aes                                45.046+-1.236      ?      46.167+-5.939         ? might be 1.0249x slower
   stanford-crypto-ccm                                44.463+-6.681      ?      47.611+-7.063         ? might be 1.0708x slower
   stanford-crypto-pbkdf2                            129.166+-3.487      ?     130.468+-3.760         ? might be 1.0101x slower
   stanford-crypto-sha256-iterative                   47.244+-0.394      ?      48.188+-2.022         ? might be 1.0200x slower

   <arithmetic> *                                    104.406+-0.568      ?     105.004+-1.334         ? might be 1.0057x slower
   <geometric>                                        88.640+-0.561      ?      89.480+-1.698         ? might be 1.0095x slower
   <harmonic>                                         75.429+-0.917      ?      76.435+-2.080         ? might be 1.0133x slower

                                                        WithoutBT                   WithBT                                      
JSRegress:
   adapt-to-double-divide                            16.8749+-0.3478     ?     17.0203+-0.7817        ?
   aliased-arguments-getbyval                         0.6537+-0.0267            0.6158+-0.0504          might be 1.0617x faster
   allocate-big-object                                1.8135+-0.0744     ?      1.8187+-0.0972        ?
   arity-mismatch-inlining                            0.6396+-0.0393            0.6182+-0.0588          might be 1.0346x faster
   array-access-polymorphic-structure                 5.7628+-0.3277     ?      5.7760+-0.0780        ?
   array-nonarray-polymorhpic-access                 23.9935+-0.7104           23.7007+-0.4905          might be 1.0124x faster
   array-prototype-every                             60.9655+-1.0479     ?     62.1391+-1.7703        ? might be 1.0192x slower
   array-prototype-forEach                           60.9336+-0.2432     ?     62.1194+-2.2588        ? might be 1.0195x slower
   array-prototype-map                               75.3367+-1.8058           75.1887+-0.2945        
   array-prototype-some                              60.2330+-0.1637     ?     61.2446+-0.9690        ? might be 1.0168x slower
   array-with-double-add                              3.0668+-0.0147     ?      3.0843+-0.1094        ?
   array-with-double-increment                        2.3682+-0.1052     ?      2.4139+-0.2878        ? might be 1.0193x slower
   array-with-double-mul-add                          3.5223+-0.2210            3.4009+-0.0387          might be 1.0357x faster
   array-with-double-sum                              2.9102+-0.0923     ?      2.9844+-0.2202        ? might be 1.0255x slower
   array-with-int32-add-sub                           5.4126+-0.1816            5.3853+-0.0529        
   array-with-int32-or-double-sum                     2.9734+-0.0501            2.9376+-0.0703          might be 1.0122x faster
   ArrayBuffer-DataView-alloc-large-long-lived       59.5416+-1.3383           59.4380+-1.0794        
   ArrayBuffer-DataView-alloc-long-lived             17.5625+-0.3735           17.5107+-0.5725        
   ArrayBuffer-Int32Array-byteOffset                  3.2175+-0.4383            3.1110+-0.1813          might be 1.0342x faster
   ArrayBuffer-Int8Array-alloc-large-long-lived      58.9164+-1.9242     ?     59.5763+-1.1377        ? might be 1.0112x slower
   ArrayBuffer-Int8Array-alloc-long-lived-buffer     27.4478+-0.3950     ?     27.7172+-0.4536        ?
   ArrayBuffer-Int8Array-alloc-long-lived            16.5187+-0.3092     ?     16.8021+-0.3588        ? might be 1.0172x slower
   ArrayBuffer-Int8Array-alloc                       15.1373+-0.7507     ?     15.2064+-0.6375        ?
   asmjs_bool_bug                                     5.0901+-0.1368            5.0353+-0.1861          might be 1.0109x faster
   assign-custom-setter-polymorphic                   2.3589+-0.0777     ?      2.3894+-0.0526        ? might be 1.0129x slower
   assign-custom-setter                               3.1022+-0.1024     ?      3.1182+-0.1303        ?
   basic-set                                          9.1317+-1.2772            8.7687+-0.2894          might be 1.0414x faster
   big-int-mul                                        2.9904+-0.1940            2.9700+-0.1380        
   boolean-test                                       2.6247+-0.2063            2.5182+-0.1577          might be 1.0423x faster
   branch-fold                                        3.2577+-0.0550     ?      3.3245+-0.0925        ? might be 1.0205x slower
   by-val-generic                                     7.3796+-0.1543     ?      7.4485+-0.0709        ?
   call-spread-apply                                 12.1317+-0.3349     ?     12.1882+-0.1723        ?
   call-spread-call                                   4.8375+-0.0777            4.8325+-0.0795        
   captured-assignments                               0.3293+-0.0250            0.3065+-0.0194          might be 1.0745x faster
   cast-int-to-double                                 8.2139+-0.1624     ?      8.2618+-0.2152        ?
   cell-argument                                     10.3142+-0.2305           10.2570+-0.0835        
   cfg-simplify                                       2.5598+-0.2526            2.4996+-0.2191          might be 1.0241x faster
   chain-getter-access                               19.5402+-0.7694     ?     21.0107+-4.2560        ? might be 1.0753x slower
   cmpeq-obj-to-obj-other                             7.3638+-0.3202            7.3226+-0.1575        
   constant-test                                      4.1171+-0.0626     ?      4.1469+-0.1105        ?
   DataView-custom-properties                        63.4835+-1.2854           63.2479+-1.3707        
   delay-tear-off-arguments-strictmode                2.1216+-0.0992            2.0983+-0.0798          might be 1.0111x faster
   destructuring-arguments                            5.0398+-2.2027            4.5248+-0.5397          might be 1.1138x faster
   destructuring-swap                                 4.1785+-0.1576     ?      4.2252+-0.1212        ? might be 1.0112x slower
   direct-arguments-getbyval                          0.5582+-0.0244     ?      0.5785+-0.0355        ? might be 1.0364x slower
   double-get-by-val-out-of-bounds                    3.3326+-0.0797     ?      3.3958+-0.2144        ? might be 1.0190x slower
   double-pollution-getbyval                          7.9317+-0.1571     ?      8.0587+-0.6736        ? might be 1.0160x slower
   double-pollution-putbyoffset                       3.3505+-0.0502            3.3494+-0.1109        
   double-to-int32-typed-array-no-inline              1.6681+-0.0728     ?      1.7090+-0.0957        ? might be 1.0245x slower
   double-to-int32-typed-array                        1.3768+-0.0319     ?      1.3882+-0.0640        ?
   double-to-uint32-typed-array-no-inline             1.7084+-0.0609     ?      1.7646+-0.0533        ? might be 1.0329x slower
   double-to-uint32-typed-array                       1.4252+-0.0356     ?      1.4645+-0.0627        ? might be 1.0276x slower
   empty-string-plus-int                              5.8543+-0.2353     ?      5.8599+-0.1637        ?
   emscripten-cube2hash                              24.1876+-0.6467     ?     24.5358+-0.8620        ? might be 1.0144x slower
   external-arguments-getbyval                        1.1234+-0.0364     ?      1.1275+-0.0489        ?
   external-arguments-putbyval                        1.5861+-0.0376            1.5779+-0.1227        
   fixed-typed-array-storage-var-index                1.0112+-0.0393     ?      1.0138+-0.0228        ?
   fixed-typed-array-storage                          0.6431+-0.0435     ?      0.6665+-0.0386        ? might be 1.0364x slower
   Float32Array-matrix-mult                           3.8936+-0.0388     ?      3.9861+-0.4784        ? might be 1.0238x slower
   Float32Array-to-Float64Array-set                  47.2681+-4.1753           46.2629+-0.9537          might be 1.0217x faster
   Float64Array-alloc-long-lived                     62.4034+-1.5670           62.1289+-0.7957        
   Float64Array-to-Int16Array-set                    54.9105+-1.3133     ?     55.0078+-0.2631        ?
   fold-double-to-int                                11.4935+-0.0516     ?     11.8701+-1.0125        ? might be 1.0328x slower
   for-of-iterate-array-entries                       5.5264+-0.1621     ?      5.6089+-0.5281        ? might be 1.0149x slower
   for-of-iterate-array-keys                          2.1867+-0.1355     ?      2.3857+-0.4778        ? might be 1.0910x slower
   for-of-iterate-array-values                        1.9847+-0.0625     ?      2.0040+-0.0976        ?
   fround                                            22.5043+-0.3450           22.3303+-0.3655        
   function-dot-apply                                 1.1051+-0.2472            1.0642+-0.0609          might be 1.0384x faster
   function-test                                      2.6726+-0.0653            2.6379+-0.0750          might be 1.0131x faster
   function-with-eval                                16.8524+-0.4956           16.6487+-0.3545          might be 1.0122x faster
   get-by-id-chain-from-try-block                     5.5303+-0.0536     ?      5.6227+-0.1750        ? might be 1.0167x slower
   get-by-id-proto-or-self                           11.2628+-0.6030           11.1470+-0.6664          might be 1.0104x faster
   get-by-id-self-or-proto                           11.3000+-0.2177     ?     11.3342+-0.3082        ?
   get-by-val-out-of-bounds                           3.1969+-0.0392            3.1866+-0.0619        
   get_callee_monomorphic                             2.7919+-0.0911     ?      2.7993+-0.0988        ?
   get_callee_polymorphic                             2.7815+-0.1871            2.6780+-0.0679          might be 1.0387x faster
   getter                                            10.5140+-0.1731     ?     10.6976+-0.2849        ? might be 1.0175x slower
   global-var-const-infer-fire-from-opt               0.6813+-0.1792     ?      0.7118+-0.0319        ? might be 1.0447x slower
   global-var-const-infer                             0.5468+-0.0490     ?      0.5789+-0.1194        ? might be 1.0588x slower
   HashMap-put-get-iterate-keys                      21.2709+-0.4996     ?     21.3666+-0.2132        ?
   HashMap-put-get-iterate                           21.2148+-1.3146     ?     21.2971+-0.8254        ?
   HashMap-string-put-get-iterate                    25.7957+-0.6637           25.6835+-0.4569        
   imul-double-only                                   9.3182+-0.0438            9.2085+-0.1093          might be 1.0119x faster
   imul-int-only                                      8.8632+-0.2467     ?      8.9577+-0.2918        ? might be 1.0107x slower
   imul-mixed                                        11.8339+-0.1459           11.7974+-0.3412        
   in-four-cases                                     12.1102+-0.0464     ?     12.2891+-0.6639        ? might be 1.0148x slower
   in-one-case-false                                  6.3717+-0.1432     ?      6.3901+-0.1092        ?
   in-one-case-true                                   6.3008+-0.1341     ?      6.4802+-0.2394        ? might be 1.0285x slower
   in-two-cases                                       6.7437+-0.3053            6.5772+-0.1301          might be 1.0253x faster
   indexed-properties-in-objects                      2.4080+-0.0527     ?      2.4578+-0.0999        ? might be 1.0207x slower
   infer-closure-const-then-mov-no-inline             2.7401+-0.0363            2.7217+-0.0521        
   infer-closure-const-then-mov                      18.2510+-1.3620           17.4486+-0.4636          might be 1.0460x faster
   infer-closure-const-then-put-to-scope-no-inline   10.4136+-0.5281           10.2925+-0.1459          might be 1.0118x faster
   infer-closure-const-then-put-to-scope             21.2559+-0.1137     ?     21.9273+-1.2916        ? might be 1.0316x slower
   infer-closure-const-then-reenter-no-inline        48.6475+-0.4516     ?     48.8307+-0.8195        ?
   infer-closure-const-then-reenter                  21.8364+-0.6542           21.3676+-0.2367          might be 1.0219x faster
   infer-one-time-closure-ten-vars                   16.9810+-0.2948     ?     17.6420+-0.8652        ? might be 1.0389x slower
   infer-one-time-closure-two-vars                   17.1103+-0.5771     ?     17.1935+-0.0337        ?
   infer-one-time-closure                            16.8564+-0.2188     !     17.4272+-0.2913        ! definitely 1.0339x slower
   infer-one-time-deep-closure                       34.2043+-1.6270           34.0645+-0.8032        
   inline-arguments-access                            0.9550+-0.0741     ?      0.9711+-0.0556        ? might be 1.0169x slower
   inline-arguments-aliased-access                    1.0683+-0.0491     ?      1.0773+-0.0635        ?
   inline-arguments-local-escape                     11.1242+-0.1217           10.8904+-0.3473          might be 1.0215x faster
   inline-get-scoped-var                              3.9630+-0.1105     ?      4.0118+-0.0897        ? might be 1.0123x slower
   inlined-put-by-id-transition                       8.7558+-0.1617     ?      8.8181+-0.4925        ?
   int-or-other-abs-then-get-by-val                   5.1650+-0.1496            5.0964+-0.0715          might be 1.0134x faster
   int-or-other-abs-zero-then-get-by-val             17.4343+-0.3743           17.1487+-0.3421          might be 1.0167x faster
   int-or-other-add-then-get-by-val                   6.6006+-0.3991     ?      6.6057+-0.1506        ?
   int-or-other-add                                   6.1335+-0.0362     !      6.2252+-0.0407        ! definitely 1.0149x slower
   int-or-other-div-then-get-by-val                   4.1525+-0.1009     ?      4.5359+-1.2117        ? might be 1.0923x slower
   int-or-other-max-then-get-by-val                   4.5270+-0.0592     ?      4.6199+-0.1652        ? might be 1.0205x slower
   int-or-other-min-then-get-by-val                   4.7253+-0.0658     ?      4.8945+-0.4785        ? might be 1.0358x slower
   int-or-other-mod-then-get-by-val                   3.8647+-0.0481     ?      3.8676+-0.0776        ?
   int-or-other-mul-then-get-by-val                   4.2692+-0.2850            4.1349+-0.0897          might be 1.0325x faster
   int-or-other-neg-then-get-by-val                   4.8512+-0.0716     ?      4.8620+-0.0756        ?
   int-or-other-neg-zero-then-get-by-val             17.0445+-0.6193     ?     17.1204+-0.8477        ?
   int-or-other-sub-then-get-by-val                   6.8100+-0.1849            6.7993+-0.2825        
   int-or-other-sub                                   5.3719+-0.1693     ?      5.4512+-0.0418        ? might be 1.0148x slower
   int-overflow-local                                 3.6978+-0.0868     ?      3.9538+-0.5597        ? might be 1.0692x slower
   Int16Array-alloc-long-lived                       45.3895+-1.8998           44.7217+-0.1982          might be 1.0149x faster
   Int16Array-bubble-sort-with-byteLength            17.9753+-1.9966     ?     19.4047+-0.7376        ? might be 1.0795x slower
   Int16Array-bubble-sort                            16.3075+-0.1436     ?     16.3254+-0.3998        ?
   Int16Array-load-int-mul                            1.1965+-0.0420     ?      1.2754+-0.1717        ? might be 1.0659x slower
   Int16Array-to-Int32Array-set                      43.2715+-1.2176           42.6219+-0.8450          might be 1.0152x faster
   Int32Array-alloc-large                            12.7415+-1.3867           12.6401+-0.8268        
   Int32Array-alloc-long-lived                       49.8883+-0.3697     ?     50.0630+-0.2130        ?
   Int32Array-alloc                                   2.4424+-0.0594     ?      2.4855+-0.0960        ? might be 1.0176x slower
   Int32Array-Int8Array-view-alloc                    8.4851+-0.5946            8.3391+-0.2479          might be 1.0175x faster
   int52-spill                                        6.5042+-0.1587     ?      6.6044+-0.2007        ? might be 1.0154x slower
   Int8Array-alloc-long-lived                        40.8506+-0.8158           40.7183+-0.6195        
   Int8Array-load-with-byteLength                     3.1348+-0.0431            3.1196+-0.0197        
   Int8Array-load                                     3.1297+-0.0665            3.1229+-0.0654        
   integer-divide                                    10.0457+-0.3361     ?     10.2057+-0.4906        ? might be 1.0159x slower
   integer-modulo                                     1.3184+-0.1003            1.3168+-0.0482        
   large-int-captured                                 5.4174+-0.2135     ?      5.4364+-0.1193        ?
   large-int-neg                                     14.0964+-0.1772     ?     14.6466+-1.6208        ? might be 1.0390x slower
   large-int                                         13.1874+-0.3276           13.1515+-0.3709        
   logical-not                                        4.0453+-0.5646            3.8853+-0.1481          might be 1.0412x faster
   lots-of-fields                                     6.6533+-0.1082     ?      6.8633+-0.6718        ? might be 1.0316x slower
   make-indexed-storage                               2.2798+-0.0716            2.0951+-0.3863          might be 1.0882x faster
   make-rope-cse                                      3.7122+-0.1525            3.6635+-0.1045          might be 1.0133x faster
   marsaglia-larger-ints                             65.3561+-1.9722     ?     65.7378+-1.0200        ?
   marsaglia-osr-entry                               28.7213+-0.9645           28.5740+-0.5672        
   method-on-number                                  17.7003+-0.6576           17.4322+-0.2939          might be 1.0154x faster
   misc-strict-eq                                    37.1450+-1.3845     ?     37.9603+-0.3439        ? might be 1.0219x slower
   negative-zero-divide                               0.2499+-0.0143     ?      0.2610+-0.0328        ? might be 1.0444x slower
   negative-zero-modulo                               0.2535+-0.0205            0.2505+-0.0099          might be 1.0119x faster
   negative-zero-negate                               0.2707+-0.0592            0.2560+-0.0397          might be 1.0572x faster
   nested-function-parsing                           22.6801+-0.9370           21.7504+-0.7935          might be 1.0427x faster
   new-array-buffer-dead                              2.6555+-0.3041            2.6259+-0.1047          might be 1.0113x faster
   new-array-buffer-push                              6.2847+-0.1264            6.2371+-0.1371        
   new-array-dead                                    19.3544+-0.3096           19.2313+-0.7646        
   new-array-push                                     4.1797+-0.1613     ?      4.2916+-0.0853        ? might be 1.0268x slower
   number-test                                        2.4422+-0.0393     ?      2.5016+-0.1472        ? might be 1.0243x slower
   object-closure-call                                4.5679+-0.3652            4.5142+-0.1235          might be 1.0119x faster
   object-test                                        2.5947+-0.0619            2.5651+-0.0743          might be 1.0116x faster
   poly-stricteq                                     45.2128+-1.6479           44.3875+-1.8210          might be 1.0186x faster
   polymorphic-array-call                             1.3683+-0.0965            1.3183+-0.0297          might be 1.0379x faster
   polymorphic-get-by-id                              2.5158+-0.1022            2.4898+-0.0680          might be 1.0104x faster
   polymorphic-put-by-id                             41.5616+-49.3263    ?     58.5696+-58.4548       ? might be 1.4092x slower
   polymorphic-structure                             13.1450+-0.3808     ?     13.2355+-0.2921        ?
   polyvariant-monomorphic-get-by-id                  4.9765+-0.1883            4.8690+-0.0827          might be 1.0221x faster
   proto-getter-access                               19.4456+-0.4488     ?     19.4763+-0.2443        ?
   put-by-id                                         11.9398+-0.1511     ?     12.0521+-0.5963        ?
   put-by-val-large-index-blank-indexing-type         5.8743+-0.0572     ?      6.0046+-0.2782        ? might be 1.0222x slower
   put-by-val-machine-int                             1.9415+-0.0820     ?      1.9819+-0.0261        ? might be 1.0208x slower
   rare-osr-exit-on-local                            12.8328+-0.4133           12.6913+-0.3078          might be 1.0112x faster
   register-pressure-from-osr                        15.8768+-0.2107     ?     15.9279+-0.2688        ?
   setter                                            10.6903+-0.2440     ?     10.7468+-0.3192        ?
   simple-activation-demo                            23.5605+-0.5218           23.4045+-0.4602        
   simple-getter-access                              31.9802+-2.4350           31.5043+-0.7733          might be 1.0151x faster
   slow-array-profile-convergence                     2.2819+-0.2244     ?      2.2859+-0.0866        ?
   slow-convergence                                   2.4819+-0.0562     ?      2.5894+-0.1911        ? might be 1.0433x slower
   sparse-conditional                                 0.9025+-0.1286     ?      0.9067+-0.0637        ?
   splice-to-remove                                  34.7090+-1.0254     ?     35.1881+-1.5913        ? might be 1.0138x slower
   string-char-code-at                               12.7178+-1.0729     ?     13.5532+-0.5287        ? might be 1.0657x slower
   string-concat-object                               1.8017+-0.0376     ?      1.9086+-0.3089        ? might be 1.0594x slower
   string-concat-pair-object                          1.7284+-0.0443     ?      1.7340+-0.0454        ?
   string-concat-pair-simple                         10.0825+-0.0947     ?     10.5920+-1.1789        ? might be 1.0505x slower
   string-concat-simple                              10.7007+-0.9321           10.3337+-0.1551          might be 1.0355x faster
   string-cons-repeat                                 6.6406+-0.0839     ?      6.6747+-0.1166        ?
   string-cons-tower                                  6.8616+-0.1788            6.8306+-0.1878        
   string-equality                                   24.5867+-0.3643           24.4832+-1.6380        
   string-get-by-val-big-char                         7.1893+-0.1049            7.0750+-0.1196          might be 1.0162x faster
   string-get-by-val-out-of-bounds-insane             3.1706+-0.0713     ?      3.2250+-0.0501        ? might be 1.0172x slower
   string-get-by-val-out-of-bounds                    2.6890+-0.0451     ?      2.7777+-0.0818        ? might be 1.0330x slower
   string-get-by-val                                  2.4312+-0.0622     ?      2.4323+-0.0076        ?
   string-hash                                        1.5512+-0.0271     ?      1.5786+-0.0337        ? might be 1.0176x slower
   string-long-ident-equality                        21.9065+-0.8889           21.6010+-0.4800          might be 1.0141x faster
   string-repeat-arith                               25.8989+-1.3851           25.8905+-1.1841        
   string-sub                                        50.5704+-1.2057     ?     50.8372+-0.4374        ?
   string-test                                        2.3288+-0.0712            2.2717+-0.0281          might be 1.0251x faster
   string-var-equality                               36.8400+-7.3311           34.3920+-0.6728          might be 1.0712x faster
   structure-hoist-over-transitions                   1.9705+-0.0168     ?      1.9705+-0.0354        ?
   switch-char-constant                               2.0526+-0.0831     ?      2.0605+-0.0338        ?
   switch-char                                        4.5370+-0.0409     ?      4.5388+-0.0767        ?
   switch-constant                                    6.4991+-0.1493     ?      6.5764+-0.1101        ? might be 1.0119x slower
   switch-string-basic-big-var                       12.2833+-0.7003           12.2460+-0.2190        
   switch-string-basic-big                           12.8124+-0.3156           12.6300+-0.1919          might be 1.0144x faster
   switch-string-basic-var                           11.9585+-0.1198     ?     12.0510+-0.2383        ?
   switch-string-basic                               12.5106+-0.8789           11.8444+-0.0804          might be 1.0563x faster
   switch-string-big-length-tower-var                18.0536+-1.1105           17.3878+-0.4741          might be 1.0383x faster
   switch-string-length-tower-var                    12.4534+-0.1587     ?     12.6730+-1.1704        ? might be 1.0176x slower
   switch-string-length-tower                        11.1302+-0.0723     ?     11.2511+-0.1786        ? might be 1.0109x slower
   switch-string-short                               11.0715+-0.0798     ?     11.2012+-0.2591        ? might be 1.0117x slower
   switch                                             9.6192+-0.1459     ?      9.6927+-0.0503        ?
   tear-off-arguments-simple                          1.5692+-0.2965            1.4616+-0.0286          might be 1.0736x faster
   tear-off-arguments                                 2.3118+-0.1417            2.2860+-0.0390          might be 1.0113x faster
   temporal-structure                                11.5048+-0.3431     ?     11.6188+-0.2192        ?
   to-int32-boolean                                  11.4911+-0.1208     ?     11.6115+-0.2057        ? might be 1.0105x slower
   undefined-test                                     2.4951+-0.0878            2.4440+-0.0289          might be 1.0209x faster
   unprofiled-licm                                   32.7851+-0.9030     ?     32.9747+-0.2451        ?
   weird-inlining-const-prop                          1.3372+-0.0895     ?      1.3865+-0.1259        ? might be 1.0369x slower

   <arithmetic>                                      12.8363+-0.1977     ?     12.9325+-0.2726        ? might be 1.0075x slower
   <geometric> *                                      6.5808+-0.0440     ?      6.6054+-0.0271        ? might be 1.0037x slower
   <harmonic>                                         3.0357+-0.0627     ?      3.0359+-0.0389        ? might be 1.0001x slower

                                                        WithoutBT                   WithBT                                      
AsmBench:
   bigfib.cpp                                       904.8443+-16.4908    ?    920.8026+-19.2952       ? might be 1.0176x slower
   cray.c                                           518.1676+-9.0886     ?    521.0995+-5.9082        ?
   dry.c                                            821.8760+-22.7555         815.6345+-14.5357       
   FloatMM.c                                       1414.0047+-23.0696    ?   1434.0497+-18.9260       ? might be 1.0142x slower
   gcc-loops.cpp                                   8355.7632+-79.4811    ?   8450.5498+-127.5787      ? might be 1.0113x slower
   n-body.c                                        1454.5997+-27.5138    ?   1457.6918+-20.9632       ?
   Quicksort.c                                      729.0630+-11.6187    ?    730.0959+-13.2260       ?
   stepanov_container.cpp                          4540.4850+-79.5394    ?   4553.9360+-34.2360       ?
   Towers.c                                         392.7635+-27.2796         384.2907+-5.8973          might be 1.0220x faster

   <arithmetic>                                    2125.7297+-14.6593    ?   2140.9056+-8.1701        ? might be 1.0071x slower
   <geometric> *                                   1270.1096+-6.6314     ?   1273.8272+-4.0344        ? might be 1.0029x slower
   <harmonic>                                       908.3719+-12.2412         906.8104+-3.6583          might be 1.0017x faster

                                                        WithoutBT                   WithBT                                      
All benchmarks:
   <arithmetic>                                     126.3109+-0.2650     ?    127.2125+-1.1030        ? might be 1.0071x slower
   <geometric>                                       11.5171+-0.0530     ?     11.5785+-0.0528        ? might be 1.0053x slower
   <harmonic>                                         2.7865+-0.0337     ?      2.7976+-0.0230        ? might be 1.0040x slower

                                                        WithoutBT                   WithBT                                      
Geomean of preferred means:
   <scaled-result>                                   47.8749+-0.0571     !     48.2860+-0.1744        ! definitely 1.0086x slower
Comment 15 WebKit Commit Bot 2014-05-07 21:08:04 PDT
Re-opened since this is blocked by bug 132670