Investigating Memory – Strings

A while back I had to investigate why our application seemed to be using a large amount of memory in certain usage patterns. What we found was that several calls to a web service would cause an OutOfMemoryException on the client.

Note: The underlying problem was the good old “references not being released” issue: references to an object were still being held, so garbage collection could not reclaim it. More interesting at the time, though, was the information I found from the steps below.

To investigate the memory usage I loaded up WinDbg and attached (F6) to the instance of the application I wanted to investigate. I then carried out the following steps.

  • I typed .load sosex into the WinDbg command line (see http://stevestechspot.com)
  • I typed !dumpheap -stat into the WinDbg command line to view the memory (!dumpheap is an SOS command, so the SOS extension needs to be loaded as well)

At the end of the heap dump I noticed that the type consuming the largest amount of memory (by quite a long way) was the humble System.String. This piqued my interest and so I dug deeper using WinDbg.

  • I typed !dumpheap -strings into WinDbg

This took a while to complete (due to the size of the application), but once the dump had finished I noticed a large number of strings that each had a very large number of duplicate instances stored in memory. A hefty amount of the memory was being used to store the same string thousands of times.

Note: The problem with !dumpheap -strings is that it does not output object references, because it outputs a summary of each string’s usage. From this summary we can still see which strings take the most memory and/or have the most instances, and then use the sosex command !strings to inspect each instance in more detail. Type !help !strings to view help for the command. One useful option is !strings /m:

What was happening was that large amounts of data were being retrieved from our web services and in that data was a massive number of duplicate strings. Unfortunately I could do nothing with the web service itself, but I could attempt to address the strings in the client application.
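
To illustrate why duplicate values cost memory, here is a minimal sketch of strings built at runtime, much as a deserializer would build them. DuplicateStringDemo, SimulateResponse and Count are hypothetical names made up for this example, and "London" is placeholder data rather than anything from our web services.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

public static class DuplicateStringDemo
{
    // Builds every string at runtime, so each element is a separate
    // object on the heap even though the text is identical.
    public static List<string> SimulateResponse(int rows)
    {
        var names = new List<string>(rows);
        for (int i = 0; i < rows; i++)
        {
            names.Add(new string("London".ToCharArray()));
        }
        return names;
    }

    // Returns the number of distinct string objects (by reference)
    // and the number of distinct values (by ordinal comparison).
    public static (int Objects, int Values) Count(IEnumerable<string> strings)
    {
        var list = strings.ToList();
        int objects = list.Distinct(ReferenceComparer.Instance).Count();
        int values = list.Distinct(StringComparer.Ordinal).Count();
        return (objects, values);
    }

    // Compares strings by object identity instead of by content.
    private sealed class ReferenceComparer : IEqualityComparer<string>
    {
        public static readonly ReferenceComparer Instance = new ReferenceComparer();
        public bool Equals(string x, string y) => ReferenceEquals(x, y);
        public int GetHashCode(string s) => RuntimeHelpers.GetHashCode(s);
    }
}
```

Calling DuplicateStringDemo.Count(DuplicateStringDemo.SimulateResponse(1000)) should report 1000 objects but only 1 value, which is the same shape of result the heap dump showed on a much larger scale.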

As you may know, there’s a static method on the String class called Intern. In essence this acts as an application-wide string cache (or more precisely an AppDomain cache). Calling it like this

o.Name = String.Intern(o.Name);

will in essence replace the o.Name string with a reference to the one stored in the application cache, so the previous o.Name string can be garbage collected once it’s no longer referenced.
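
As a rough sketch of the effect, the example below interns the Name of two objects and checks that they end up sharing one string instance. Customer and InternDemo are hypothetical types standing in for whatever o is in the real application.

```csharp
using System;

// Hypothetical type standing in for "o" in the line above.
public sealed class Customer
{
    public string Name { get; set; } = "";
}

public static class InternDemo
{
    public static bool ShareOneInstance()
    {
        // Two equal values built at runtime: two separate objects.
        var a = new Customer { Name = new string("London".ToCharArray()) };
        var b = new Customer { Name = new string("London".ToCharArray()) };
        bool sharedBefore = ReferenceEquals(a.Name, b.Name);

        // After interning, both properties point at the pooled instance.
        a.Name = String.Intern(a.Name);
        b.Name = String.Intern(b.Name);
        bool sharedAfter = ReferenceEquals(a.Name, b.Name);

        return !sharedBefore && sharedAfter;
    }
}
```

InternDemo.ShareOneInstance() should return true: the two names are distinct objects before the calls and the same object afterwards, leaving the original copies eligible for collection.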

This reduced the number of duplicates, and the memory used, immediately. However, there’s an obvious problem with String.Intern: its cache lives for at least as long as the AppDomain. This means that as we read strings from our web services they are added to the cache and never get garbage collected, slowly adding more and more unique strings to the cache. This is fine for const strings, which are automatically interned, but not so good for an application meant to run pretty much 24/7.
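
The sketch below tries to make that lifetime issue visible using String.IsInterned, which returns null when a value is not in the pool. InternPoolGrowthDemo and the "payload-" prefix are made up for the illustration, and it assumes the randomly generated value is not already interned.

```csharp
using System;

public static class InternPoolGrowthDemo
{
    public static bool StaysInterned()
    {
        // A value that is very unlikely to be in the pool already.
        string id = Guid.NewGuid().ToString("N");
        bool internedBefore = String.IsInterned("payload-" + id) != null;

        InternOnce(id);

        // No reference to the interned string is held here any more,
        // yet a collection does not remove it from the pool.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        bool internedAfter = String.IsInterned("payload-" + id) != null;

        return !internedBefore && internedAfter;
    }

    // Interns the value and immediately drops the returned reference.
    private static void InternOnce(string id)
    {
        String.Intern("payload-" + id);
    }
}
```

InternPoolGrowthDemo.StaysInterned() should return true, which is the behaviour that makes an ever-growing stream of unique strings a concern for a long-running process.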

I initially looked to implement my own Dictionary based cache but almost everything has been done by somebody else on the web, so after a short search I came across the StringReference class which did exactly what I wanted. It allows us to use the good bits from Intern but with the ability to clean up unused strings over time.
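
For comparison, here is a minimal sketch of the general idea using weak references. It is not the StringReference class mentioned above, whose API is not shown here. WeakStringCache, GetOrAdd and Trim are hypothetical names, and the design (buckets keyed by hash code so that the cache never holds a strong reference to a string) is just one assumption about how such a cache could work.

```csharp
using System;
using System.Collections.Generic;

public sealed class WeakStringCache
{
    // Keyed by hash code rather than by the string itself, so the
    // dictionary never keeps a cached string alive.
    private readonly Dictionary<int, List<WeakReference<string>>> _buckets =
        new Dictionary<int, List<WeakReference<string>>>();
    private readonly object _gate = new object();

    // Returns the cached instance equal to value, or caches value itself.
    public string GetOrAdd(string value)
    {
        if (value == null) return value;
        int hash = StringComparer.Ordinal.GetHashCode(value);
        lock (_gate)
        {
            if (!_buckets.TryGetValue(hash, out var bucket))
            {
                bucket = new List<WeakReference<string>>();
                _buckets[hash] = bucket;
            }
            foreach (var weak in bucket)
            {
                if (weak.TryGetTarget(out var existing) &&
                    string.Equals(existing, value, StringComparison.Ordinal))
                {
                    return existing;
                }
            }
            bucket.Add(new WeakReference<string>(value));
            return value;
        }
    }

    // Drops entries whose strings have been collected; returns how many.
    public int Trim()
    {
        int removed = 0;
        lock (_gate)
        {
            var emptyKeys = new List<int>();
            foreach (var pair in _buckets)
            {
                removed += pair.Value.RemoveAll(w => !w.TryGetTarget(out _));
                if (pair.Value.Count == 0) emptyKeys.Add(pair.Key);
            }
            foreach (int key in emptyKeys) _buckets.Remove(key);
        }
        return removed;
    }
}
```

Usage would mirror the Intern call, for example o.Name = cache.GetOrAdd(o.Name); with Trim() called periodically (say, from a timer) to clear out dead entries. Unlike String.Intern, a string that nothing else references can be collected, because the cache only holds it weakly.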