Bug 152696 - Fragmentation-free allocator for eternal and/or coupled allocations.
Summary: Fragmentation-free allocator for eternal and/or coupled allocations.
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Template Framework (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Andreas Kling
URL:
Keywords: Performance
Depends on: 153079
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-04 10:07 PST by Andreas Kling
Modified: 2016-01-19 15:42 PST (History)
8 users (show)

See Also:


Attachments
WIP (43.32 KB, patch)
2016-01-04 10:08 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
WIP2 (43.95 KB, patch)
2016-01-05 06:58 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
WIP3 (45.54 KB, patch)
2016-01-05 08:10 PST, Andreas Kling
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews104 for mac-yosemite-wk2 (1.10 MB, application/zip)
2016-01-05 08:57 PST, Build Bot
no flags Details
WIP4 (45.33 KB, patch)
2016-01-05 09:01 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (54.52 KB, patch)
2016-01-11 12:32 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (54.60 KB, patch)
2016-01-11 13:14 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (52.00 KB, patch)
2016-01-12 07:30 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (54.59 KB, patch)
2016-01-12 08:00 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (54.76 KB, patch)
2016-01-12 11:25 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (54.72 KB, patch)
2016-01-12 11:27 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (56.56 KB, patch)
2016-01-12 12:20 PST, Andreas Kling
koivisto: review+
Details | Formatted Diff | Diff
Patch for landing (56.57 KB, patch)
2016-01-13 06:04 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (58.62 KB, patch)
2016-01-14 06:53 PST, Andreas Kling
no flags Details | Formatted Diff | Diff
Patch (59.19 KB, patch)
2016-01-14 08:27 PST, Andreas Kling
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Kling 2016-01-04 10:07:01 PST
I'm writing an allocator for the style object tree that avoids heap fragmentation.
Comment 1 Andreas Kling 2016-01-04 10:08:25 PST
Created attachment 268202 [details]
WIP
Comment 2 WebKit Commit Bot 2016-01-04 10:10:45 PST
Attachment 268202 [details] did not pass style-queue:


ERROR: Source/WebCore/css/StyleArena.cpp:61:  One line control clauses should not use braces.  [whitespace/braces] [4]
Total errors found: 1 in 16 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Andreas Kling 2016-01-05 06:58:15 PST
Created attachment 268275 [details]
WIP2
Comment 4 Andreas Kling 2016-01-05 08:10:42 PST
Created attachment 268281 [details]
WIP3
Comment 5 Build Bot 2016-01-05 08:57:21 PST
Comment on attachment 268281 [details]
WIP3

Attachment 268281 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/653494

Number of test failures exceeded the failure limit.
Comment 6 Build Bot 2016-01-05 08:57:24 PST
Created attachment 268282 [details]
Archive of layout-test-results from ews104 for mac-yosemite-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews104  Port: mac-yosemite-wk2  Platform: Mac OS X 10.10.5
Comment 7 Andreas Kling 2016-01-05 09:01:16 PST
Created attachment 268285 [details]
WIP4

Now supports arbitrarily-sized allocations.
Comment 8 Andreas Kling 2016-01-05 15:27:29 PST
There are a few more obvious use cases for this so retitling.
Comment 9 Andreas Kling 2016-01-05 16:19:02 PST
(In reply to comment #8)
> There are a few more obvious use cases for this so retitling.

I want to use this for objects that live in NeverDestroyed<> containers, e.g NeverDestroyed<String> would allocate the String's StringImpl from an everlasting bump arena.
Comment 10 Filip Pizlo 2016-01-05 17:44:55 PST
Comment on attachment 268285 [details]
WIP4

View in context: https://bugs.webkit.org/attachment.cgi?id=268285&action=review

> Source/WTF/wtf/BumpArena.cpp:142
> +    if (UNLIKELY(size >= (Block::blockSize - sizeof(Block)))) {
> +        Ref<Block> oversizeBlock = Block::create(*this, size);
> +        return oversizeBlock->allocate(size);
> +    }
> +
> +    if (!m_currentBlock || !m_currentBlock->canAllocate(size))
> +        m_currentBlock = Block::create(*this);
> +
> +    return m_currentBlock->allocate(size);

If you want blazing speed, you can arrange it so that the common case has only one branch and no dependent loads.  Usually you do this by:

- Having the notion of an "Allocator" which is a move-semantics data structure that contains the allocation state of a Block.  When BumpArena has m_currentBlock != null, it will also have a m_allocator that contains state stolen from Block::m_allocator.  Allocator will contain the moral equivalent of what you now call Block::m_next and Block::m_end.  Whenever a BlockArena "takes" a block, it steals the block's allocator.  It gives the allocator back to the BlockArena whenever it takes hold of a different block.
    Result: This saves an extra indirection.

- Ensure that a null Allocator always fails to allocate, and does so gracefully (returns nullptr or whatever).  This way, the allocator's check for whether it canAllocate() (or rather, the moral equivalent of that method, see further down) doubles as a check for whether m_currentBlock is not null.
    Result: You save one branch - the null check for m_currentBlock.

- Ensure that the Allocator's notion of m_next/m_end allows for a wrap-around-safe check for whether the requested size is bigger than anything the BumpArena could have allocated.  We came up with this in JSC:

    // This code is written in a gratuitously low-level manner, in order to
    // serve as a kind of template for what the JIT would do. Note that the
    // way it's written it ought to only require one register, which doubles
    // as the result, provided that the compiler does a minimal amount of
    // control flow simplification and the bytes argument is a constant.
    
    size_t currentRemaining = m_currentRemaining;
    if (bytes > currentRemaining)
        return slowPath(size);
    currentRemaining -= bytes;
    m_currentRemaining = currentRemaining;
    return m_currentPayloadEnd - currentRemaining - bytes;

This replaces m_next/m_end with m_currentRemaining and m_currentPayloadEnd.  m_currentPayloadEnd is exactly the same as your m_end.  m_currentRemaining is the number of bytes not yet allocated in the block, so it starts out being equal to the block's available capacity.  This allocator allocates forwards from the left of the block, by placing objects at m_currentPayloadEnd - m_currentRemaining - bytes.
    Result: You don't need an extra check to see if the size is greater than the largest possible size that the block can handle.  m_currentRemaining is always just the upper bound on the size that you can allocate right now.

All of this put together can give you an allocator that has only one path to a slow path, only one branch, two loads, one store, and no dependent loads.
Comment 11 Andreas Kling 2016-01-11 11:12:47 PST
Thanks for the suggestions, Phil. I'll try and incorporate some of it.

Since BumpArena doesn't try to use a Block anymore after it's been filled once, there's no need for the "Allocator" to move back to the Block at any time. Instead we can just always keep that data in the arena object.
Comment 12 Andreas Kling 2016-01-11 12:32:36 PST
Created attachment 268705 [details]
Patch
Comment 13 Andreas Kling 2016-01-11 13:14:29 PST
Created attachment 268707 [details]
Patch
Comment 14 Andreas Kling 2016-01-12 07:30:01 PST
Created attachment 268766 [details]
Patch
Comment 15 Andreas Kling 2016-01-12 08:00:37 PST
Created attachment 268770 [details]
Patch
Comment 16 Andreas Kling 2016-01-12 11:25:50 PST
Created attachment 268787 [details]
Patch
Comment 17 Andreas Kling 2016-01-12 11:27:45 PST
Created attachment 268788 [details]
Patch
Comment 18 Andreas Kling 2016-01-12 12:20:59 PST
Created attachment 268798 [details]
Patch

Okay, this looks pretty nice now. Add a separate arena for SelectorQueryCache too.
@Antti: pls give thoughts on this whole thing?
Comment 19 Filip Pizlo 2016-01-12 14:53:11 PST
Comment on attachment 268798 [details]
Patch

LGTM.  I'll let someone who knows more about WebCore do the r+.
Comment 20 Antti Koivisto 2016-01-13 02:30:08 PST
Comment on attachment 268798 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=268798&action=review

> Source/WTF/ChangeLog:48
> +        As noted above, calling allocate() with a null BumpArena will allocate out
> +        of a global shared arena. Ideally you're always allocating out of a specific,
> +        controlled arena, but there are situations where you may not have one.

Isn't this bit risky though? Allocating from global arena is essentially a memory leak right?

> Source/WTF/wtf/BumpArena.cpp:79
> +#pragma warning(disable: 4200) // Disable "zero-sized array in struct/union" warning
> +#endif
> +    char* m_data;
> +#if COMPILER(MSVC)
> +#pragma warning(pop)

That is not a zero-sized array. It would be clearer if was though.

> Source/WTF/wtf/BumpArena.cpp:134
> +    return adoptRef(*new (NotNull, fastAlignedMalloc(blockSize, sizeof(Block) + capacity)) Block(arena));

sizeof(Block) includes the fake m_data ptr. Does this create a slightly too large arena?

> Source/WebCore/ChangeLog:21
> +        One-off CSS parses that don't work within a StyleSheetContents context will have
> +        their StyleRules & co allocated out of the global BumpArena.

Why doesn't this cause memory to leak? Shouldn't the global arena be used strictly for things that are never freed? Could we have a document-associated arena for these?
Comment 21 Andreas Kling 2016-01-13 05:37:19 PST
(In reply to comment #20)
> Comment on attachment 268798 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=268798&action=review
> 
> > Source/WTF/ChangeLog:48
> > +        As noted above, calling allocate() with a null BumpArena will allocate out
> > +        of a global shared arena. Ideally you're always allocating out of a specific,
> > +        controlled arena, but there are situations where you may not have one.
> 
> Isn't this bit risky though? Allocating from global arena is essentially a
> memory leak right?

Blocks in the global arena are deallocated when their ref-count drops to zero, so these are not leaks unless someone forgets to deallocate something out of that same block.

> > Source/WTF/wtf/BumpArena.cpp:79
> > +#pragma warning(disable: 4200) // Disable "zero-sized array in struct/union" warning
> > +#endif
> > +    char* m_data;
> > +#if COMPILER(MSVC)
> > +#pragma warning(pop)
> 
> That is not a zero-sized array. It would be clearer if was though.

Good catch, it should be!

> > Source/WTF/wtf/BumpArena.cpp:134
> > +    return adoptRef(*new (NotNull, fastAlignedMalloc(blockSize, sizeof(Block) + capacity)) Block(arena));
> 
> sizeof(Block) includes the fake m_data ptr. Does this create a slightly too
> large arena?

Glad you caught this, it doesn't create a slightly larger block, but it does make us miss out on sizeof(void*) bytes of payload region! Fixed by actually making it a zero-length array as mentioned above.

> > Source/WebCore/ChangeLog:21
> > +        One-off CSS parses that don't work within a StyleSheetContents context will have
> > +        their StyleRules & co allocated out of the global BumpArena.
> 
> Why doesn't this cause memory to leak? Shouldn't the global arena be used
> strictly for things that are never freed? Could we have a
> document-associated arena for these?

I suppose we could have a Document-associated arena (or something like that) for localizing the damage a missing deallocate() can cause.
Comment 22 Andreas Kling 2016-01-13 06:04:19 PST
Created attachment 268864 [details]
Patch for landing
Comment 23 Antti Koivisto 2016-01-13 06:43:48 PST
> Blocks in the global arena are deallocated when their ref-count drops to
> zero, so these are not leaks unless someone forgets to deallocate something
> out of that same block.

Any static allocations would pin these pages. This is worse than the malloc since the memory can't be reused. It would be safer if we could manage without a global arena.
Comment 24 WebKit Commit Bot 2016-01-13 06:52:31 PST
Comment on attachment 268864 [details]
Patch for landing

Clearing flags on attachment: 268864

Committed r194963: <http://trac.webkit.org/changeset/194963>
Comment 25 WebKit Commit Bot 2016-01-13 06:52:37 PST
All reviewed patches have been landed.  Closing bug.
Comment 26 Andreas Kling 2016-01-13 07:03:23 PST
(In reply to comment #23)
> > Blocks in the global arena are deallocated when their ref-count drops to
> > zero, so these are not leaks unless someone forgets to deallocate something
> > out of that same block.
> 
> Any static allocations would pin these pages. This is worse than the malloc
> since the memory can't be reused. It would be safer if we could manage
> without a global arena.

Yep ok, let's solve this in a better way!
Comment 27 WebKit Commit Bot 2016-01-13 13:21:05 PST
Re-opened since this is blocked by bug 153079
Comment 28 Chris Dumez 2016-01-13 13:59:38 PST
(In reply to comment #27)
> Re-opened since this is blocked by bug 153079

Also completely broke PLT on iOS.
Comment 29 Andreas Kling 2016-01-13 14:09:58 PST
(In reply to comment #28)
> (In reply to comment #27)
> > Re-opened since this is blocked by bug 153079
> 
> Also completely broke PLT on iOS.

Probably the same issue, not guaranteeing that returned pointers are 8-byte allocated.
Comment 30 Andreas Kling 2016-01-13 14:10:10 PST
(In reply to comment #29)
> (In reply to comment #28)
> > (In reply to comment #27)
> > > Re-opened since this is blocked by bug 153079
> > 
> > Also completely broke PLT on iOS.
> 
> Probably the same issue, not guaranteeing that returned pointers are 8-byte
> allocated.

ehm, 8-byte aligned :)
Comment 31 Chris Dumez 2016-01-13 14:19:10 PST
(In reply to comment #30)
> (In reply to comment #29)
> > (In reply to comment #28)
> > > (In reply to comment #27)
> > > > Re-opened since this is blocked by bug 153079
> > > 
> > > Also completely broke PLT on iOS.
> > 
> > Probably the same issue, not guaranteeing that returned pointers are 8-byte
> > allocated.
> 
> ehm, 8-byte aligned :)

Seems it was crashing on 32bit devices only with a EXC_BAD_ACCESS / EXC_ARM_DA_ALIGN.
Comment 32 Andreas Kling 2016-01-14 06:53:43 PST
Created attachment 268959 [details]
Patch

- Now uses a custom VM allocator in order to support fallback to FastMalloc.
- Now rounds up allocation requests to 8-byte alignment boundary, for ASan and ARM (whoops..)
Comment 33 WebKit Commit Bot 2016-01-14 06:54:54 PST
Attachment 268959 [details] did not pass style-queue:


ERROR: Source/WTF/wtf/BumpArena.cpp:31:  Alphabetical sorting problem.  [build/include_order] [4]
Total errors found: 1 in 25 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 34 Andreas Kling 2016-01-14 08:27:34 PST
Created attachment 268971 [details]
Patch

Stub it out of Windows.
Comment 35 Antti Koivisto 2016-01-15 06:27:03 PST
Comment on attachment 268971 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=268971&action=review

> Source/WTF/wtf/BumpArena.cpp:56
> +static const size_t blockSize = 4096;

I think this would optimally be the page size. We should ask the system.

> Source/WTF/wtf/BumpArena.cpp:130
> +    static const size_t vmSize = 128 * MB;

Science?
Comment 36 WebKit Commit Bot 2016-01-15 12:01:56 PST
Comment on attachment 268971 [details]
Patch

Clearing flags on attachment: 268971

Committed r195141: <http://trac.webkit.org/changeset/195141>
Comment 37 WebKit Commit Bot 2016-01-15 12:02:02 PST
All reviewed patches have been landed.  Closing bug.
Comment 38 Geoffrey Garen 2016-01-15 14:56:20 PST
Comment on attachment 268971 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=268971&action=review

> Source/WTF/wtf/BumpArena.cpp:151
> +BlockAllocator::BlockAllocator()
> +{
> +    vmBase = reinterpret_cast<char*>(mmap(0, vmSize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, BUMPARENA_VM_TAG, 0));
> +    RELEASE_ASSERT(vmBase != MAP_FAILED);
> +    vmEnd = vmBase + vmSize;
> +    nextBlock = reinterpret_cast<char*>(roundUpToMultipleOf<blockSize>(reinterpret_cast<uintptr_t>(vmBase)));
> +}

This is way too big. We'll run out of VM on iOS much more often, and jetsam as a result.
Comment 39 Geoffrey Garen 2016-01-15 15:01:54 PST
Given my theory that this patch was a memory use / jetsam regression, I have to ask: How will we verify whether this patch is a memory use / jetsam improvement?
Comment 40 Andreas Kling 2016-01-15 19:05:35 PST
(In reply to comment #39)
> Given my theory that this patch was a memory use / jetsam regression, I have
> to ask: How will we verify whether this patch is a memory use / jetsam
> improvement?

A memory use regression? Because of the page table cost of reserving that VM?

If you look at a dirty-per-VM tag breakdown on membuster, you can see that this patch caused the following (according to 3 of the mac bots):

BumpArena:     +3.91MB (same on all bots)
WebKit malloc: -7MB (average, more on some of the bots)

The difference is roughly the same pre- and post-notification.

128 MB is just a number from the emerald dream, I'll gladly make it smaller.
Comment 41 Chris Dumez 2016-01-19 12:16:14 PST
Reverted r195141 for reason:

Seems to cause crashes on iOS9 64bit

Committed r195304: <http://trac.webkit.org/changeset/195304>
Comment 42 Geoffrey Garen 2016-01-19 15:42:25 PST
> A memory use regression? Because of the page table cost of reserving that VM?

A jetsam regression, caused by increased use of virtual memory.

iOS sets a per-process virtual memory limit around 700MB (depending on device), and WebKit hits that limit often, so a new 128MB VM reservation would seem to increase the jetsam risk by roughly 9% (700 / 128 / 2, if we attribute half of jetsams to VM exhaustion and the other half to physical memory exhaustion).

> If you look at a dirty-per-VM tag breakdown on membuster, you can see that
> this patch caused the following (according to 3 of the mac bots):
> 
> BumpArena:     +3.91MB (same on all bots)
> WebKit malloc: -7MB (average, more on some of the bots)
> 
> The difference is roughly the same pre- and post-notification.
> 
> 128 MB is just a number from the emerald dream, I'll gladly make it smaller.

How small can it get? Can we go to 1MB?

As an aside, for non-global non-permanent allocations (like StyleSheetContents), can we just reuse bmalloc::Cache instead? I would be interested to see a comparison between BumpArena and direct use of bmalloc::Cache.

There are many downsides to introducing a whole new allocator -- including the bugs that caused this patch to be rolled out the first time, and the need to retrofit the allocator to work with system memory analysis tools. It would be nice not to have to do all that work twice.