As the web and web browsers have matured, people have started expecting different things from them. When we first released Firefox, few people were browsing with tabs or add-ons. As I’ve written before, web usage patterns have changed, and so too have our strategies for making effective use of system resources such as memory.
Although Firefox 2 used less memory than its predecessor, Firefox 1.5, we intentionally limited the number of changes to the Gecko platform it was built on (Gecko 1.8.1 was only slightly different from Gecko 1.8). However, while most people were working on Firefox 2 / Gecko 1.8.1, others of us were already ripping into the platform that Firefox 3 was to be built on: Gecko 1.9.
We’ve made more significant changes to the platform than I can count, including many to reduce our memory footprint. The result has been dramatic, and you can see for yourself by getting a copy of the recently released Firefox 3 Beta 4.
Here’s What We’ve Done:
Reduced memory fragmentation
As I’ve written about before, long-running applications such as ours can wind up wasting a lot of space due to memory fragmentation. Fragmentation can result from mixing lots of allocations of different sizes, which can leave many small holes in memory that are hard to reuse.
One of the things we did to help was to reduce the total number of allocations we make, to avoid needlessly churning memory. We’ve managed to reduce allocations in almost every area of our code base. The graph below shows the number of allocations we make during startup: we were able to get rid of over a third of them! Olli Pettay, Jonas Sicking, Johnny Stenback, and Dan Witte all made a big difference here.
I carefully studied the fragmentation effects of various allocators and concluded that jemalloc left us with the least fragmentation after running for a long period of time. I’ve worked closely with the jemalloc author, Jason Evans, to port and tune jemalloc for our platforms. It was a huge effort (Jason doubled the number of lines of code in jemalloc over a two-month period), but the results paid off. As of Beta 4 we use jemalloc on both Windows and Linux. Our automated tests on Windows Vista showed a 22% drop in memory usage when we turned jemalloc on.
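To make the “small holes” problem concrete, here is a toy first-fit heap. The `FirstFitHeap` class is hypothetical and is not how any real allocator (jemalloc included) works; it only shows the effect. After freeing every other block, half the heap is free, yet a request twice the size of one block still fails because no single hole is big enough.

```python
class FirstFitHeap:
    """Toy heap of `size` units managed with first-fit (illustration only)."""

    def __init__(self, size):
        self.size = size
        self.blocks = {}  # start offset -> length of each live allocation

    def _holes(self):
        # Walk live blocks in address order and collect the gaps between them.
        holes, cursor = [], 0
        for start in sorted(self.blocks):
            if start > cursor:
                holes.append((cursor, start - cursor))
            cursor = start + self.blocks[start]
        if cursor < self.size:
            holes.append((cursor, self.size - cursor))
        return holes

    def malloc(self, n):
        # First fit: take the first hole large enough for the request.
        for start, length in self._holes():
            if length >= n:
                self.blocks[start] = n
                return start
        return None  # enough free space may exist, just not in one piece

    def free(self, start):
        del self.blocks[start]

    def free_units(self):
        return sum(length for _, length in self._holes())

    def largest_hole(self):
        return max((length for _, length in self._holes()), default=0)
```

For example, allocating ten 10-unit blocks in a 100-unit heap and freeing every other one leaves 50 free units, but the largest hole is only 10 units, so `malloc(20)` returns `None`.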
Fixed cycles with the cycle collector
Some leaks are harder to fix than others. One of the most difficult kinds is where two objects hold references to each other, keeping each other alive. This is called a cycle, and cycles are bad. In previous versions, we used very complex and annoying code to manually break cycles at the right times, but getting that code right and maintaining it always proved difficult. For Gecko 1.9, we’ve implemented an automated cycle collector that can recognize cycles in the in-memory object graph and break them automatically. This is great for our code, because we can get rid of lots of complexity. It is especially significant for extensions, which have access to all of Firefox’s internals and can therefore easily introduce cycles without knowing it. It isn’t reasonable to expect all those authors to write code to manually break the cycles themselves.
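The general idea behind recognizing a garbage cycle in a reference-counted graph can be sketched with a toy form of trial deletion: if every reference to a node is accounted for by other suspected nodes, nothing outside is holding it, and it is alive only because of the cycle. This is a hypothetical, simplified model (the `Node` class and `find_garbage_cycles` function are made up for illustration); Gecko’s actual collector works on real XPCOM and JavaScript objects and is considerably more involved.

```python
from collections import defaultdict


class Node:
    """A toy refcounted object (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0    # every strong reference to this node, from anywhere
        self.children = []   # strong references this node holds

    def add_ref_to(self, other):
        self.children.append(other)
        other.refcount += 1


def _reachable(starts):
    seen, stack = set(), list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(node.children)
    return seen


def find_garbage_cycles(candidates):
    """Return the nodes kept alive only by references from within the graph."""
    # 1. Gather everything reachable from the suspected cycle members.
    graph = _reachable(candidates)
    # 2. Count how many references to each node come from inside that graph.
    internal = defaultdict(int)
    for node in graph:
        for child in node.children:
            internal[child] += 1
    # 3. A node with more references than internal ones is held from outside,
    #    so it, and everything it points to, must stay alive.
    externally_held = [n for n in graph if n.refcount > internal[n]]
    live = _reachable(externally_held)
    # 4. What remains is only being kept alive by the cycle itself.
    return graph - live
```

In a usage example, two objects `a` and `b` that point only at each other are reported as garbage, while a similar pair `c` and `d` survives because some outside object (say, a window) still references `c`.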
Basically, the cycle collector means there are whole classes of leak that we can easily avoid in both our code and in extensions, and that’s good for everyone. You can thank Graydon Hoare, Peter Van der Beken and David Baron for their amazing hard work on this.
Tuned our caches
Firefox uses various in-memory caches to keep performance up, including a memory cache for images, a back/forward cache to speed up back and forward navigation, a font cache to improve text rendering speed, and others. We’ve taken a look at how much they cache and how long they cache it for. In many cases we’ve added expiration policies to our caches that keep the performance benefits in the most important cases without eating up memory forever.
We now expire documents in the back/forward cache after 30 minutes, since you likely won’t be going back to them anytime soon. We also have timer-based font caches, as well as very short-lived caches of computed text metrics.
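A time-based expiration policy like the 30-minute back/forward cache rule can be sketched as a cache whose entries are dropped once they are older than a fixed TTL. This is a hypothetical illustration, not Firefox’s cache code: the `ExpiringCache` name is made up, entries age from the moment they are inserted, and `expire()` is assumed to be called from a periodic timer. The clock is injectable so the policy can be tested without waiting.

```python
import time


class ExpiringCache:
    """Cache whose entries are dropped once older than `ttl_seconds` (illustration)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, time it was inserted)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry[0]

    def expire(self):
        """Drop every entry at least TTL seconds old; return how many were dropped."""
        now = self.clock()
        stale = [k for k, (_, added) in self._entries.items() if now - added >= self.ttl]
        for key in stale:
            del self._entries[key]
        return len(stale)

    def __len__(self):
        return len(self._entries)


# A back/forward-style cache that forgets documents after 30 minutes.
BFCACHE_TTL_SECONDS = 30 * 60
```

The design trade-off is the one described above: entries still help when you go back soon, but a page you left long ago stops costing memory.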
We also throw away our uncompressed image data as I describe below…
Adjusted how we store image data
Another big change we’ve made in Firefox 3 is improving how we keep image data around.
Images on the web come in a compressed format (GIF, JPEG, PNG, etc.). When we load an image, we uncompress it and keep the full uncompressed image in memory. In Firefox 2 we would keep these around even if the image was just sitting in a tab that you hadn’t looked at in hours. In Firefox 3, thanks to some work by Federico Mena-Quintero (of GNOME fame), we now throw away the uncompressed data after it hasn’t been used for a short while. This affects not only images on pages in background tabs but also images in the memory cache that might not be attached to a document. The result is a pretty dramatic memory reduction for images that aren’t on the page you’re actively looking at. If you have a 100KB JPEG image that uncompresses to several megabytes, you won’t be charged for the uncompressed size when you’re not viewing it.
Another fantastic change, from Alfred Kayser, reworked the way we store animated GIFs so that they take up a lot less memory. We now store the animation frames as 8-bit data along with a palette rather than at 32 bits per pixel. The savings can be huge for large animations. One extreme example from the bug dropped from 368MB to 108MB, a savings of 260MB!
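The discard policy can be sketched as an object that always keeps the compressed bytes but keeps the decoded pixels only while they are in use, re-decoding on demand. Everything here is a hypothetical illustration: `DiscardableImage` and `fake_decode` are made-up names, the decoder is a placeholder, decoded pixels are assumed to be 32-bit RGBA, and `idle_seconds` stands in for the unspecified “short while”.

```python
class DiscardableImage:
    """Keeps compressed bytes always, decoded pixels only while in use (illustration)."""

    BYTES_PER_PIXEL = 4  # assumes decoded pixels are 32-bit RGBA

    def __init__(self, compressed, width, height, decode):
        self.compressed = compressed
        self.width = width
        self.height = height
        self._decode = decode    # stand-in for a real image decoder
        self._pixels = None
        self._last_used = None

    def pixels(self, now):
        # Re-decode on demand if the pixels were thrown away earlier.
        if self._pixels is None:
            self._pixels = self._decode(self.compressed, self.width, self.height)
        self._last_used = now
        return self._pixels

    def discard_if_idle(self, now, idle_seconds):
        # Drop the decoded copy once it hasn't been used for a while.
        if self._pixels is not None and now - self._last_used >= idle_seconds:
            self._pixels = None
            return True
        return False

    def charged_bytes(self):
        decoded = len(self._pixels) if self._pixels is not None else 0
        return len(self.compressed) + decoded


def fake_decode(compressed, width, height):
    # Placeholder decoder: just allocates a buffer of the decoded size.
    return bytearray(width * height * DiscardableImage.BYTES_PER_PIXEL)
```

For a 1600×1200 image, the decoded buffer is 1600 × 1200 × 4 = 7,680,000 bytes (about 7.3 MiB), roughly 77 times the size of a 100KB compressed file, which is why dropping it for images you aren’t viewing matters.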
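The arithmetic behind the animated GIF change is easy to sketch. These helper functions are hypothetical and assume one 256-entry palette per frame with 4-byte colors; they ignore any other per-frame bookkeeping, so real-world savings will be somewhat smaller than the near-4x ratio they show.

```python
PALETTE_ENTRIES = 256   # GIF color tables hold at most 256 colors
BYTES_PER_COLOR = 4     # assume each palette entry is stored as 32-bit RGBA


def frame_bytes_32bpp(width, height):
    """Each pixel stored directly as a 32-bit color."""
    return width * height * 4


def frame_bytes_paletted(width, height):
    """Each pixel stored as an 8-bit palette index, plus the palette itself."""
    return width * height + PALETTE_ENTRIES * BYTES_PER_COLOR


def animation_bytes(width, height, frames, paletted):
    per_frame = (frame_bytes_paletted(width, height) if paletted
                 else frame_bytes_32bpp(width, height))
    return per_frame * frames
```

For a 500×500 animation with 100 frames, this gives 100,000,000 bytes at 32 bits per pixel versus 25,102,400 bytes with 8-bit indices and a palette.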
Hunted down leaks
Most leaks are a pain in the ass to find and fix in any complex piece of software. There are small leaks, big leaks, and in-between leaks. If you leak a small piece of text once an hour, you probably won’t notice. If you leak a large image every time you move the cursor, you’ve got a big problem. Both are important to fix, because even the little ones add up. Some leaks are only leaks until you leave a page, so they don’t show up with conventional leak-finding tools, but they make a difference if you keep a page like GMail open all day long.
Ben Turner has gotten pretty good at Leak Hunt.
We’ve fixed many leaks, ranging from small DOM objects that were leaked on GMail until you left the site, to entire windows that were leaked, along with everything inside them, after you closed them.
Overall, we’ve closed over 400 leak bugs so far; most of them are very uncommon, but they can still occur. We’ve also greatly improved our tools for detecting leaks. Carsten Book, in particular, has done an amazing job of finding and reporting leaks.
Measuring Memory Use
As I’ve learned the hard way, accurately measuring memory usage is hard.
This part gets a bit technical, so feel free to skip it. The short summary: Windows Vista (Commit Size) and Linux (RSS) provide pretty accurate memory measurements, while Windows XP and Mac OS X do not.
If you’re running Windows Vista and look at Commit Size in Task Manager, you should get pretty accurate memory numbers. If you’re looking at Memory Usage under Windows XP, your numbers aren’t going to be so great. The reason: Microsoft changed the meaning of “private bytes” between XP and Vista (for the better). On XP the number is the amount of virtual memory your application has reserved for use. For performance reasons you often want to reserve more memory than you actually use. The application can tell the operating system that it isn’t going to use parts of the reserved space, and that the OS does not need to back that virtual space with physical memory. On Vista, Private Bytes is the commit size, which counts only the memory the application has said it is actively using. Since virtual memory size has to be greater than or equal to commit size, XP memory numbers will always appear bigger than Vista ones, even though the application is using the same amount of memory.
On Mac, if you look at Activity Monitor, it will look like we’re using more memory than we actually are. Mac OS X has a similar, but different, problem to Windows XP. After extensive testing and confirmation from Apple employees, we realized that there was no way for an allocator to give unused pages of memory back while keeping the address range reserved. (You can unmap them and remap them, but that causes some race conditions and isn’t as performant.) There are APIs that claim to do it (both madvise() and msync()), but they don’t actually do anything. It does appear that mapped pages that haven’t been written to won’t be counted in memory stats, but once you’ve written to them they will show as taking up space until you unmap them. Since allocators reuse space, you generally won’t have many mapped pages that haven’t been written to. Firefox can and will reuse the free pages, so you should see it hit a peak number and generally not grow much higher than that.
Linux seems to do a pretty good job of reporting memory usage. It supports madvise(), allowing us to tell Linux about pages we don’t need, and so its resident set size numbers are fairly accurate. You can use ps or top to measure RSS.
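The relationship between the two numbers can be sketched with a toy bookkeeping model of an allocator that reserves a large region up front, commits what it uses, and later decommits pages it no longer needs. `AddressSpaceModel` is hypothetical and is not the Win32 API; it only mirrors the idea behind VirtualAlloc with MEM_RESERVE or MEM_COMMIT and VirtualFree with MEM_DECOMMIT, following the description above.

```python
class AddressSpaceModel:
    """Toy bookkeeping for reserved vs committed memory (illustration only)."""

    def __init__(self):
        self.reserved = 0    # address space set aside (the number the post says XP shows)
        self.committed = 0   # memory the process has said it is using (Vista's commit size)

    def reserve(self, nbytes):
        self.reserved += nbytes

    def commit(self, nbytes):
        if self.committed + nbytes > self.reserved:
            raise MemoryError("cannot commit more than has been reserved")
        self.committed += nbytes

    def decommit(self, nbytes):
        # Give the backing memory back but keep the address range reserved.
        self.committed -= min(nbytes, self.committed)


MB = 1024 * 1024
```

After reserving 64MB, committing all of it, and then decommitting 48MB, the reserved figure is still 64MB while the committed figure is 16MB: the same process looks four times bigger by the reserved measure.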
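On Linux, the RSS figure that ps and top report is also available as the VmRSS line in /proc/self/status. The sketch below (Linux only, Python 3.8 or later for `mmap.madvise`; the function names are made up) reads that value, touches 64MB of private anonymous memory, and then calls madvise(MADV_DONTNEED) to tell the kernel those pages are no longer needed, so you can watch RSS rise and fall. It illustrates the kernel behavior only; it is not how Firefox’s allocator is written.

```python
import mmap


def rss_kib():
    """Resident set size of this process in KiB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def madvise_demo(nbytes=64 * 1024 * 1024):
    """Return RSS (KiB) before touching, after touching, and after MADV_DONTNEED."""
    region = mmap.mmap(-1, nbytes, flags=mmap.MAP_PRIVATE)  # anonymous, private
    before = rss_kib()
    for offset in range(0, nbytes, mmap.PAGESIZE):
        region[offset] = 1  # write one byte per page so every page becomes resident
    touched = rss_kib()
    region.madvise(mmap.MADV_DONTNEED)  # tell the kernel we don't need these pages
    after = rss_kib()
    region.close()
    return before, touched, after
```

On a typical Linux system, the middle value should be roughly 64MB (65,536 KiB) above the first, and the last should drop back close to the first.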
Ways to test
There are many ways to measure memory usage in a browser. Open 10 tabs with your favorite websites in them and see how much memory the browser is using. Close all but the last tab and load about:blank or Google. Measure again. Another simple test is to load Zimbra, Google Reader and Zoho, each in its own tab, and log in to each. We’ve learned that users do so many things with the browser that it is nearly impossible to construct a single test to measure memory usage.
We wanted more of a stress test: one that was more reproducible than loading random sites from the web. We took our Standalone Talos framework, and Mike Schroepfer modified it to cycle pages through a set of windows while opening and closing them, to approximate people running the browser for a long period of time. Talos makes it pretty straightforward to get this up and running, and it is great for measuring things like memory usage and layout speed. It works great for Firefox and allows measuring performance and other metrics, but the page cycling code doesn’t work with other browsers.
Since we wanted to compare browsers, we modified the tests to run in other browsers too, and we wired up some of our Talos code that uses the Windows Performance Counters to measure Private Bytes (commit size on Vista).
For the results below we loaded 29 different web pages through 30 windows over 11 cycles (319 total page loads), always opening a new window for each page load (closing the oldest open window once we hit 30 windows). At the end we close all the windows but one and let the browser sit for a few minutes to see if it will reclaim memory, clear short-term caches, etc. There is a 3-second delay between page loads to try to get all the browsers to take the same amount of time. We used the proxy server that is part of Standalone Talos to make sure we were serving up the same content. We had to disable popup blocking to allow the test window to open the 30 windows for running the test. You can get the simple webpage test here and the Python script to monitor memory usage here. These are built on top of the Standalone Talos framework, so you’ll need to drop the Python script in with Talos to get good results. Mad props to Mike Schroepfer for getting this all working.
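The window-cycling schedule can be sketched as a small simulation. This is a hypothetical illustration only: it does not drive a browser, skips the 3-second delay, and takes no memory measurements. It just checks the bookkeeping: 29 pages over 11 cycles gives 319 loads, never more than 30 windows are open, and one window is left at the end.

```python
from collections import deque


def simulate_window_cycling(num_pages=29, cycles=11, max_windows=30):
    """Return (page loads, windows closed, peak open windows) for the schedule."""
    open_windows = deque()
    loads = closes = peak = 0
    for _ in range(cycles):
        for page in range(num_pages):
            if len(open_windows) == max_windows:
                open_windows.popleft()   # close the oldest window first
                closes += 1
            open_windows.append(page)    # every page load gets a new window
            loads += 1
            peak = max(peak, len(open_windows))
    # At the end, close everything except one window and let the browser idle.
    while len(open_windows) > 1:
        open_windows.popleft()
        closes += 1
    return loads, closes, peak
```

With the default parameters this yields 319 loads and a peak of 30 windows; 289 windows are closed during the run and 29 more at the end, for 318 closes in total.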
Looking at the graph:
- All browsers increase in memory use slightly over time, but the Firefox 3 slope is closer to 0.
- The _peak_ of Firefox 3 is lower than the terminal size of Firefox 2!
- The terminal state of Firefox 3 is nearly 140MB smaller than Firefox 2. 60% less memory!
- IE7 doesn’t appear to give any memory back, even after all the windows are closed!
- Firefox 3 is about 400MB smaller than IE7 at the end of the test!
This is just one test that I feel shows the great progress that has been made. We’ll continue working on adding additional tests that can measure more of the ways that users use their browser.
Our work has paid off.
We’re significantly smaller than previous versions of Firefox and other browsers.
You can keep the browser open for much longer using much less memory.
Extensions are much less likely to cause leaks.
We’ve got automated tools in place to detect leaks that might result from new code. We’re always monitoring and testing to make sure we’re moving in the right direction.
All of this has been done while dramatically improving performance.
Many people have worked on this but I’d like to specifically thank: David Baron, Carsten Book, Peter Van der Beken, Igor Bukanov, Brendan Eich, Jason Evans, Alfred Kayser, Federico Mena-Quintero, Robert O’Callahan, Olli Pettay, Mike Schroepfer, Mike Shaver, Jonas Sicking, Johnny Stenback, Ben Turner, Vladimir Vukicevic, Dan Witte, Boris Zbarsky, and everyone else I’m forgetting who has worked on this. Everyone really pulled together to make this happen.