Stealing Regions
The vectored exception handler used in Listing One is a perfect fit for dealing with out-of-memory exceptions generated by any component in your process. With this kind of exception handling, even components you didn't develop yourself, such as third-party modules loaded by your program, benefit from your watchdog's protection. The DllMain() routine in Listing One sets up the module's vectored exception handler first, before intercepting the target system calls. Because this handler invokes routines that depend on virtual memory tracking, as arranged by intercepting these system calls, the interception is not performed unless the handler is successfully put in place first. Vectored exception handling is available on Windows XP and later versions, including Vista. If your operating system doesn't support vectored exception handling, then you need to provide some means of handling out-of-memory exceptions on each thread that is started. Like the vectored handler in Listing One, your handler can invoke a routine that uses tracked data about your program's virtual memory regions to keep the program alive, as in Figure 4, which introduces the concept of stealing regions.
In Figures 4 and 5, a reserved virtual memory region is considered stolen when it's committed for use by a component that didn't reserve it. Similarly, an unused committed page is considered stolen when it's decommitted and recommitted for use by a component other than the one that originally committed it. How do you know the original component won't come along and try to use its reserved or committed space? You don't, but your watchdog can try to head off this possibility by stealing the memory that has been unused for the longest amount of time, out of all the memory it's tracking. The real protection involves, well, protection. On Windows, the VirtualProtect() API function can be applied to each stolen page or region to generate an exception when the region is accessed. The use of an exception handler to deal with all accesses to a region means, of course, that your program could run slowly when components start stealing regions. On the other hand, poor performance is generally more tolerable than a crash.
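The "longest unused" selection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the TrackedRegion record, its field names, and the pickVictim() and stealRegion() helpers are hypothetical and do not come from the listings, and the guarded flag merely stands in for the point where a real watchdog would call VirtualProtect().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tracking record; the field names are assumptions for illustration.
enum class RegionState { Reserved, CommittedUnused, InUse, Stolen };

struct TrackedRegion {
    std::uintptr_t base;         // base address of the region
    std::size_t    size;         // size in bytes
    RegionState    state;        // what the watchdog currently believes
    std::uint64_t  lastUsed;     // tick of the last observed use (smaller = older)
    std::uintptr_t ownerModule;  // module that originally reserved or committed it
    std::uintptr_t thiefModule;  // module the region was stolen for (0 if none)
    bool           guarded;      // true once access protection should be in force
};

// Pick the least recently used stealable region of at least `needed` bytes.
// Returns nullptr when no tracked region qualifies.
inline TrackedRegion* pickVictim(std::vector<TrackedRegion>& regions,
                                 std::size_t needed) {
    TrackedRegion* victim = nullptr;
    for (TrackedRegion& r : regions) {
        const bool stealable = r.state == RegionState::Reserved ||
                               r.state == RegionState::CommittedUnused;
        if (!stealable || r.size < needed) continue;
        if (victim == nullptr || r.lastUsed < victim->lastUsed) victim = &r;
    }
    return victim;
}

// Mark the chosen region as stolen on behalf of `thiefModule`.
inline TrackedRegion* stealRegion(std::vector<TrackedRegion>& regions,
                                  std::size_t needed,
                                  std::uintptr_t thiefModule) {
    assert(needed > 0);
    TrackedRegion* victim = pickVictim(regions, needed);
    if (victim == nullptr) return nullptr;
    victim->state = RegionState::Stolen;
    victim->thiefModule = thiefModule;
    // A real watchdog would change the page protection here (for example
    // with VirtualProtect()) so that every later access raises an exception.
    victim->guarded = true;
    return victim;
}
```

Keeping the original owner in the record alongside the thief lets the exception handler later decide which of the two is touching the region.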
If your program is going to stay alive for long after stealing a region that was originally reserved or committed for some other purpose, then you need a way to tell one component from another. A simple way to do this is based on call chains. Each component itself occupies a region, or a set of regions, where it's loaded in virtual memory. Your watchdog may track those regions along with all the others. If so, then finding the base address of the component that contains a given address on a call chain is a matter of a lookup in your region list. Better yet, you can call an API function, such as GetModuleHandleEx() on Windows with the GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS flag, to find the base address of each module that appears along the call chain. Two call chains that lead back to the same module base address can be attributed to the same component.
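If your watchdog already tracks the regions where modules are loaded, the lookup from a call-chain address to a module base address might look like the following sketch. The ModuleRange type and both function names are hypothetical, and the sorted vector is just one possible form for the region list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for a loaded module's address range.
struct ModuleRange {
    std::uintptr_t base;  // load address of the module
    std::size_t    size;  // size of the mapped image in bytes
};

// Return the base address of the module containing `addr`, or 0 if the
// address lies in no tracked module. `modules` must be sorted by base
// address and must not overlap.
inline std::uintptr_t moduleBaseFor(const std::vector<ModuleRange>& modules,
                                    std::uintptr_t addr) {
    assert(std::is_sorted(modules.begin(), modules.end(),
        [](const ModuleRange& a, const ModuleRange& b) { return a.base < b.base; }));
    // Find the first module that starts beyond addr, then step back one.
    auto it = std::upper_bound(modules.begin(), modules.end(), addr,
        [](std::uintptr_t a, const ModuleRange& m) { return a < m.base; });
    if (it == modules.begin()) return 0;
    --it;
    return (addr - it->base < it->size) ? it->base : 0;
}

// Translate every return address on a call chain into a module base address.
inline std::vector<std::uintptr_t> modulesAlongChain(
        const std::vector<ModuleRange>& modules,
        const std::vector<std::uintptr_t>& chain) {
    std::vector<std::uintptr_t> bases;
    bases.reserve(chain.size());
    for (std::uintptr_t frame : chain) bases.push_back(moduleBaseFor(modules, frame));
    return bases;
}
```

The binary search keeps each lookup cheap, which matters if the lookup runs inside an exception handler on every access to a protected region.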
Your watchdog needs to be able to recognize calls into modules that contain API code or allocator code, so that it can leave those modules out of the comparison. Comparing them would confuse the watchdog, because call chains coming from different components would look the same. Many components share a common allocator, so the call chains associated with region creation typically end up in the same few functions. That's why you can't tell components apart by the last few entries of a call chain associated with region creation. But if you spend some time debugging your tracking code, particularly in the routine that implements Figure 1, then you will get to know where common region creation code lies in virtual memory. By ruling out these ranges during call-chain comparison and instead looking to the next caller beyond one of these common routines for any given chain, you can get an idea of whether the component responsible for stealing a region is the same component that is now accessing it. If so, then your watchdog can safely unprotect the region and go ahead with the access. Otherwise, the time has evidently arrived to steal another region.
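The comparison just described can be sketched as follows. Everything here is an assumption made for illustration: call chains are taken to be vectors of return addresses ordered from the innermost frame outward, the common allocator and API code is described by a list of address ranges you have identified yourself, and the names AddressRange, componentOf(), and decideAccess() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open address range [begin, end).
struct AddressRange {
    std::uintptr_t begin;
    std::uintptr_t end;
};

inline bool inAnyRange(const std::vector<AddressRange>& ranges, std::uintptr_t addr) {
    for (const AddressRange& r : ranges) {
        assert(r.begin <= r.end);
        if (addr >= r.begin && addr < r.end) return true;
    }
    return false;
}

// Identify the component behind a call chain: skip frames that fall in
// common allocator or API code, then report which module holds the next
// caller. Returns an index into `modules`, or -1 if it cannot be determined.
inline int componentOf(const std::vector<std::uintptr_t>& chain,
                       const std::vector<AddressRange>& common,
                       const std::vector<AddressRange>& modules) {
    for (std::uintptr_t frame : chain) {
        if (inAnyRange(common, frame)) continue;  // shared code tells us nothing
        for (std::size_t i = 0; i < modules.size(); ++i) {
            if (frame >= modules[i].begin && frame < modules[i].end)
                return static_cast<int>(i);
        }
        return -1;  // first meaningful frame lies outside every known module
    }
    return -1;      // the chain consisted only of common code
}

enum class AccessDecision { UnprotectAndContinue, StealAnotherRegion };

// Decide what to do when a protected, stolen region is accessed.
inline AccessDecision decideAccess(const std::vector<std::uintptr_t>& thiefChain,
                                   const std::vector<std::uintptr_t>& accessorChain,
                                   const std::vector<AddressRange>& common,
                                   const std::vector<AddressRange>& modules) {
    const int thief = componentOf(thiefChain, common, modules);
    const int accessor = componentOf(accessorChain, common, modules);
    if (thief >= 0 && thief == accessor) return AccessDecision::UnprotectAndContinue;
    return AccessDecision::StealAnotherRegion;
}
```

Treating an unidentifiable chain as a mismatch is a deliberately conservative choice in this sketch: when in doubt, the watchdog steals another region and leaves the protected one alone.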
An obvious benefit of this region tracking/stealing scheme is the ability to construct a detailed report when an out-of-memory condition arises. Often, when a program runs out of memory, it can't even do so much as complain before the inevitable crash occurs. The technique of making available any unused virtual memory for use in constructing and displaying diagnostic output can be very useful in and of itself. The added ability to provide information about the virtual memory regions that are being used, including information about which components created them and when they were created, can provide the clues you need to prevent similar out-of-memory conditions down the road. You don't need to implement a component-recognition scheme to realize this benefit, if you're happy enough to get your diagnostic output and to let the program crash. But if you want to keep your program alive longer, you will probably want to clean up any virtual memory committed for diagnostic purposes as soon as you can after making your diagnostic information available.
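As a rough illustration of the kind of report the tracking data makes possible, the sketch below formats one line per tracked region plus totals. The RegionRecord type, its fields, and buildReport() are hypothetical, and the sketch uses ordinary standard-library strings for brevity. A real watchdog running in an out-of-memory condition would first have to free up or steal memory for this output, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-region tracking data used only for reporting.
struct RegionRecord {
    std::uintptr_t base;         // base address of the region
    std::size_t    size;         // size in bytes
    std::string    creator;      // name of the module that created the region
    std::uint64_t  createdTick;  // when the region was created
    bool           committed;    // false means reserved only
};

// Build a plain-text report: one line per region, then overall totals.
inline std::string buildReport(const std::vector<RegionRecord>& regions) {
    std::ostringstream out;
    std::size_t reservedBytes = 0;
    std::size_t committedBytes = 0;
    for (const RegionRecord& r : regions) {
        assert(r.size > 0);  // a zero-size record would indicate a tracking bug
        out << "0x" << std::hex << r.base << std::dec << ' '
            << r.size << " bytes "
            << (r.committed ? "committed" : "reserved")
            << " by " << r.creator
            << " at tick " << r.createdTick << '\n';
        if (r.committed) committedBytes += r.size;
        else             reservedBytes += r.size;
    }
    out << "total reserved: " << reservedBytes
        << " bytes, total committed: " << committedBytes << " bytes\n";
    return out.str();
}
```

Sorting the records by creator or by creation time before formatting them would make it easier to spot which component is holding the most address space.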
Of course, the watchdog component itself adds to your program's memory footprint, but only modestly. The entire watchdog module can contain perhaps 3-5 times the amount of code in Listings One through Three. The tracking data for virtual memory regions is minimal, because there are typically not more than several hundred regions to track, even for a large and complex program. Plenty of components, even commercial ones such as some JVMs, leave large amounts of virtual memory reserved. If that reserved memory can be reused to keep your program alive in the face of otherwise overwhelming memory pressure, then your watchdog has earned its keep.