Windows Low Fragmentation Heap

Be sure that you’ve read my previous blog post about memory fragmentation before reading this.

I previously stated that I had tried to use the Windows Low-fragmentation Heap and hadn’t seen much difference. I realized that I wasn’t setting it on all the heaps.

I’ve added the following code to the top of main():

  HANDLE heaps[1025];
  DWORD nheaps = GetProcessHeaps(1024, heaps);

  for (DWORD i = 0; i < nheaps; i++) {
    ULONG  HeapFragValue = 2;
    HeapSetInformation(heaps[i],
                       HeapCompatibilityInformation,
                       &HeapFragValue,
                       sizeof(HeapFragValue));
  }

What do I see? Well, not quite what I expected.

After startup, loading about:blank:

This represents:
total: 19,247,554 bytes
used: 13,381,970 bytes
free: 5,865,584 bytes

This looks like about 1mb in used blocks more than without the LFH turned on. Notice though that we have 4mb more free blocks than with LFH off.

I repeated the same steps I followed in my last blog post. After loading Tripadvisor, clicking check rates, waiting for pages to load, closing them all and then loading about:blank. After clearing caches, I see:

This represents:
total: 54,253,757 bytes
used: 26,337,381 bytes
free: 27,916,376 bytes

Whoa! What happened here? We’re now using 14mb when we loaded the browser and have 27mb of free blocks on our heap! While we are less fragmented than without the low fragmentation heap, we’re using quite a bit more memory. There seems to be a lot of overhead to using the low fragmentation heap. Being far less fragmented means that future allocations will be a lot easier and we won’t have to keep putting them at the end.

“What does this mean after running for a while?”

A fine question, really. It certainly means that early on you’re going to be using a lot more memory. However, over time you may end up using less memory as you will be less fragmented and thus new allocations as you load new pages should be able to fill in to the free space more easily. From my quick testing, loading lots of pages, gmail, etc we don’t seem to grow much than we are at this state.

“If it doesn’t grow much more that sounds awesome, how do I try it?”

Watch this space. I’ll upload a build that people can test with and report back their results. If we really don’t grow much more than the initial growth from loading lots of pages at once, this might be worth turning on.

“I hate Windows, what about Mac and Linux”

If this does turn out to being a good thing on Windows, it certainly doesn’t solve all our problems. There is lots of work that we need to do to continue to reduce fragmentation. As you can see, there is still some fragmentation going on and we need to work to reduce it. I’m a bit scared of the amount of extra space we’re using with this enabled and would like to better understand what is going on. We’re looking at using cross-platform allocators, pools and arenas, etc. More on those results soon…

19 thoughts on “Windows Low Fragmentation Heap

  1. jmdesp

    I believe removing fragmenting is hard, and on the long term, only acting on the root cause can really help.

    Acting on the root cause means to stop allocating memory blocks that have a long lifetime.
    Now that is hard because everytime you keep a memory of what has happened since the browser started, you allocate such a block.

    But if it’s just too hard to do, there’s another easier approach. If you go the GC way, it becomes foreseeable to go even further and do unfragmenting through memory reallocation ? Even if this is impossible to do for every type of allocation, once the more damaging one are identified, it could be possible to make them run through a memory reallocation friendly allocator ?

    Reply
  2. Aaron

    The documentation for the Windows Low-fragmentation Heap mentions that the feature was added for Windows XP and 2003. Does this imply that it is not supported by Windows 2000? If not, would a separate build be required for Win2000 or could it just be a conditional feature within the same build?

    Reply
  3. Stuart Parmenter

    Aaron: We would just have to check OS version before calling it. Depending on if the function existed at all in W2k we might have to dynamically load the symbol. Both of these things are super easy and wouldn’t require a separate build. You just wouldn’t get the benefit (if there is one) on Win2k.

    Reply
  4. H

    This is probably a really stupid question, but one which I’ve been wondering about the past day.

    If we have so many free pages, why doesn’t the OS happily reuse them for other apps?

    Why is the “total memory usage” of an application seemingly determined by where the *first* used page is, until the *last* free pages, not accounting for the free blocks in between that the OS may reuse however it wishes?

    Sincerely,
    H

    Reply
  5. Mirza

    Hello, I had the same problem: I needed to load millions of street names (as std::string) to memory at once. Albeit size of combined strings is some 100 MB, program easily used all of my 1GB of RAM and crashed. Therefore, I created simple class “StringContainer” (300 lines of C++ code) that stores all strings as one huge string plus additional binary tree for fast access. You can StrHandle h = container.put(std:string) and later use this handle to get string back std::string s = container.get(h). Now all strings use as little memory as combined size of all strings (also, multiple identical strings are stored only once!). Get method is using special binary tree to retrieve string quickly, but tree nodes are stored in std::vector, so tree itself also occupy single chunk of memory! Removing single string from this container is not possible, but it is not problem in way I am using it. For example you can setup one container for each HTML page or each Java Script so container lives as long as its domain is alive. If you want this class, send me your mail to mirza@seznam.cz.

    Reply
  6. ThomasK

    Good work, Stuart.
    This could help a lot of other projects (longer running processes in non-GC environments), so please keep your analysis abstract enough that things != FF could profit too.

    Reply
  7. Lerc

    Is there any facility to tag memory allocations with info from the call stack so you see where they came from?

    combining something like that with these pretty heapmap images so you could hover and see what caused the allocation would help shed light on the issue.

    There is the possibilty of heap defragmentation ofr any objects that allocate memory for their own use and don’t pass pointers to others. Only one reference would have to be maintained. You’d need to have those objects select an alternate allocation system. or just add a ReallocatePrivateMemory method to base objects.

    The lifetime hint comment by max in the previous blogpost is another good idea.

    both of those will involve delving into the actual allocating code, but I suspect the quick fix of an alternate heap manager won’t be the best solution.

    Reply
  8. Peter

    Did you try to see if _heapmin() had any effect? I made some experiments on OS/2 last night, adding that to the cycle collector and it had at least a bit of an effect.

    I also experimented with OS/2 system APIs (DosAllocMem/DosFreeMem) instead of malloc for allocations of large buffers (e.g. the pixel buffers in cairo). This basically allocates memory outside the C heaps and when freed gets released to the system immediately. Made a huge difference! Not sure if there is something like that on the Win or the other platforms.

    Reply
  9. Pingback: malloc replacements? « pavlov.net

  10. Pingback: “Vlad and analysis of dtrace was used” « pavlov.net

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s