More allocator data — tcmalloc edition

I hooked tcmalloc up today and ran some numbers. As I mentioned before, it is pretty fast, but looking at this image certainly doesn’t place it ahead of jemalloc. It looks to be a little less fragmented than nedmalloc, but not as fast. (I’ll note that this image isn’t quite as accurate as some of the previous ones. The sizes I’m using don’t take the allocator’s overhead into account, which would most likely result in gray blocks becoming a little darker, but wouldn’t really affect the number of blocks spread out.)


7 thoughts on “More allocator data — tcmalloc edition”

  1. pd

    thanks for your continued diligence in your work

    it would be nice if your blog wasn’t stuck in the crazy wordpress vertically narrow theme because then we might be able to compare diagrams side by side. Oh well.

  2. Peter

    Could you give a few details on how you hook the allocators into the build, so that they are used for the various different ways to allocate memory that seem to be used in the codebase?

  3. pavlov Post author

    Peter: We built a tool to log and replay allocations. We’re then able to easily plug in new allocators and see the results. We’re unable to get good multi-threaded allocation performance results, but we can easily get good fragmentation results. It also lets us test each allocator against the same set of allocations so that we can get comparable results.

  4. DrkShadow

    Perhaps this is just a show of my naivety, but… how would a bitmap allocator work?

    My thought is: store memory as a bitmap. First, it would have zero size. Then, when you allocate, it’ll need to grab some memory, and start a linked list — one item, pointing to the beginning of the allocated memory. When you deallocate that, the item will stay, and the next allocation, if smaller, will use the start of the space and a second item will be added to the linked list pointing to the byte after the end of the first. Free space is broken up as needed, more space is appended to the end, as needed.

    Surely, the linked list would grow, with varying-size fragments and fragments that need to be combined. As things are freed, just throw them on the end of the LL — later, when the system’s not doing anything, sort the list. When grabbing a piece of space, do a binary search to find the closest match to the exact size you need. I envision it working like a filesystem, but with appendable data at the end.

    … I’m curious, do you think this is too slow? Surely it would result in minimal fragmentation, though still some. As I said, I’ve never done anything this low-level, so I’m curious about your thoughts 🙂


  5. some

    Some more naive questions, I hope you don’t mind.
    AIUI, one cannot easily know what code allocated a piece of memory or is responsible for deallocating it. Thus it is difficult to alter allocations. But isn’t there a list of allocations used to prevent double allocation? Can’t that be used to copy memory from one location to another, then change what is pointed to in the list of allocations? Doesn’t the responsible code have a pointer to an item in the list of allocations rather than the object directly, such that this would be transparent? If not, what stops it happening this way?

    If memory can be moved about, can’t unallocatable regions be greatly reduced?

  6. Choco

    Keep up the good research work, pavlov — you have no idea how much I’m waiting for a memory update. Even with the newest version I’m struggling. While other browsers aren’t coping any better, I would love to see the day I can keep my mouse-rage in check. 😀

