An introduction to memory damage problems


Welcome to the world of memory damage. Or, as we experienced programmers like to call it, A World Of Hurt.

There are no bugs more insidious, more frustrating, and more dangerous than memory damage problems. They are without a doubt among the worst bugs you will ever have to track down.

They are implicated in all kinds of nasty behavior. For most apps, they merely result in the app crashing with some unpleasant access fault, which is generally an unrecoverable error. But in some cases, such as in network stacks, IE, SQL Server, IIS, and other famous (or infamous) applications, they result in the ability of malware to gain control of the application. If the application is privileged, the malware then runs with the privileges of the application, and this allows for unlimited damage to be done to the hosting computer, including the insertion of Trojans, spyware, viruses, and other seriously evil pieces of code.

But for most of us, it just means the program took a nasty access fault and died.

There is one absolutely critical, non-negotiable rule about dealing with memory damage problems. When I was setting development policies in a software company, this is one that I set and everyone immediately bought into.

WHEN A MEMORY DAMAGE BUG IS OBSERVED, IT TAKES PRIORITY OVER ALL OTHER BUG FIXES, ENHANCEMENTS, OR ANY OTHER DEVELOPMENT ACTIVITY. ALL DEVELOPMENT CEASES UNTIL IT IS FOUND.

Why is this? Because memory damage bugs are sensitive to patterns of memory allocation and deallocation. If you continue to do any sort of development, it may appear that the bug has gone away, but in fact it hasn't. It has merely confined itself to damaging memory that is not critical at the moment. But unless you find and fix it, it will reappear. If you do not find and fix it right now, in the development cycle, it will only reappear after the product is delivered. You do not want to have this happen. Look at what it's done for Microsoft's reputation.

There is a simple rule for avoiding memory damage bugs: Do not let them happen. Do not ever code anything that could possibly allow them to happen.

That sounds like good advice, but it goes deeper than that. It involves disciplines of programming that make it all-but-impossible to have memory damage bugs.

For example, I've actually lost some of my edge on this, because I've been using C++ for so many years now, and using it in highly structured ways, so that memory damage bugs are all but impossible to code. And that's where you want to be.

Part of this essay is going to be a tutorial for people who don't know what a heap is, or what we're really talking about here. If you already understand heaps, you can skip this tutorial introduction.

What's a heap?

The "heap", or "pool" (the actual terminology is somewhat cultural, and depends on where you went to school and first learned about this) is essentially a block of memory which can be dynamically subdivided into smaller pieces. For example, suppose we have a block of a megabyte of memory. We would like to use parts of it temporarily to hold smaller chunks of information. So we have a "heap allocator" , a subroutine library package, that handles the details of this for us. So if I need 1000 bytes of memory, I call the allocator and say "Please give me 1000 bytes of memory". It looks at the data structure, and initially sees that it has 1,000,000 bytes of memory available (I'll use powers of ten just to simplify the discussion), so it identifies a sequence of 1000 bytes which are unused (such as the first 1000 bytes of this million-byte array), hands me back a pointer to it, and says "Here. Use the bytes at this address". It then makes sure that it won't use those again for another purpose; for example, by simply adding 1000 to the starting address of the block of memory, leaving 999,000 bytes available for future use.

The problem is that this is always a finite amount of memory. We can't just keep asking for memory indefinitely, because we only have 1,000,000 bytes in this heap. So after 1000 allocations of 1000 bytes, we have no more memory left. We're out of memory. We're probably in trouble.

So it becomes our responsibility to return to the heap any block of memory we aren't using. We go to the heap allocator library and call a function and say "Remember that 1000 bytes of memory I was using? Well, here it is back again."
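In C, the two halves of that conversation look something like the minimal sketch below (the function and sizes are just for illustration):

#include <stdlib.h>

void Example(void)
{
    char * p = (char *)malloc(1000);   // "Please give me 1000 bytes of memory"
    if(p == NULL)
        return;                        // the allocator had no room for the request
    /* ... use the 1000 bytes at p ... */
    free(p);                           // "Remember that 1000 bytes? Here it is back again."
}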

The heap is our "bank" and we "borrow" and "repay" memory locations.

But it's worse than that. Money is "fungible", meaning it can be freely interchanged. If I get 1000 $1 bills (or pick your currency of choice that has bills), I get 1000 essentially "random" serial numbers. Any bill is worth $1. But for memory allocation, those serial numbers have to be sequential. So when I repay the money, I can't repay part of it; I have to repay all of it in a block, using the same bills I borrowed. And if you want $2000, you need 2000 sequentially-numbered bills.

So we get into all kinds of problems that the bank-borrow-repay analogy can't really handle well.

Now, somebody has to keep track of what memory has been allocated and what memory is available for allocation. When I return a block of memory, someone has to notice that it is now available again.

This is not going to be a tutorial on first-fit, best-fit, quick-fit, and other algorithms that tend to enhance allocator performance. We're not going to really look too deeply inside how an allocator works (like the making of sausage and laws, you're often better off not knowing). Let me say that I've spent a nontrivial part of my career writing storage allocators, and I've written some of the best and fastest around [IDL: The Language and its Implementation, Nestor, Newcomer, Gianinni, and Stone, Prentice-Hall, 1990], and you can devote an entire semester to studying this topic. I discuss some of the details of allocators in an accompanying essay on memory allocation.

But the way we handle it is that we embed, in the data structures, some "overhead" bytes which help us keep track of what is going on. So to get your 1000 bytes, the allocator might actually consume 1008, or 1016, bytes of the heap, where the additional bytes are used as part of the bookkeeping. A typical allocator might keep two values in each overhead block, such as a length (the nominal length of the block) and a pointer to the start of the next block, plus a bit that says whether the current block is allocated or available. In addition, if you ask for 5 bytes, you might actually get a block whose length is 8, because you get better performance if these blocks are allocated on, say, DWORD boundaries.
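Purely for illustration, the overhead for such an allocator might be described by a structure like the following sketch (the real C runtime headers are richer than this; read the runtime sources if you care):

struct BlockHeader {
    BlockHeader * next;    // physically next block in the heap; a spare low-order bit
                           // can double as the "allocated" flag, as described below
    size_t        length;  // nominal length of the user data in this block
};
// The pointer handed back to the caller points just past this header.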

Now, suppose I want to allocate a block of bytes for a string of length 5. I might get a memory layout which looks like the one below. This assumes that my block of 1,000,000 bytes starts at memory address 100000 (it doesn't matter if it's in decimal or hex, and I'm going to use decimal here in the tutorial). Note that a string of length 5 takes 6 bytes to store, because we always have that 0 byte at the end (no, I'm not going to worry about Unicode and 16-bit characters; that's just an unnecessary complication here. Pretend we're limited to 8-bit characters for this discussion).

address value state meaning
100000 100016 A pointer to next block
100004 6 length
100008 'ABCD' first 4 characters
100012 'E\0' ? ? 5th character, NUL
100016 100096 F pointer to next block
100020 76 length

This shows that the block at location 100000 is allocated (A), and the block starting at 100016 is free (F). Note that because blocks are always allocated on a power-of-2 basis, we can use the low-order bit(s) to hold these flags; we just have to remember to mask them off before using the address. So the value stored in location 100000 might actually be 100017, where the low-order bit indicates the block is allocated. Before we try to use it as a pointer, we'll mask off that low-order bit and get 100016 back as a result. These are clever little implementation tricks. The real C/C++ runtime heap structures have even richer detail, but you have to read the C runtime sources to see all the details if you care.

But these overhead bytes are "invisible". That is, if you did

char * p = new char[6];

the value you would see in p, in this example, would be 100008, not 100000. Those first eight bytes are reserved and you can't see them at all.

Heap Integrity

If we believe we have 1,000,000 bytes, then given the above structure of information we should be able to start at location 100000 and, by following the pointers, come to the end of the heap. Each time we pick up a pointer and follow it, we should get a new, valid pointer at that location.

Note that various heap implementations will handle this differently. In the one shown above, every block points to the next block which is physically adjacent. Other heap allocators maintain much more complex lists, but ultimately it should be true that every block has a valid length and a valid "next" pointer, even if the block which is logically "next" is not physically next.

Buffer Overruns

Now let's look at how a data overrun can damage the heap. If you allocate a block of 6 bytes and write 7, what happens?

char * p = (char *)malloc(6); strcpy(p, "ABCDEF");

The answer is, nothing much. You copied 7 bytes (6 8-bit characters and the terminating NUL) into a 6-byte buffer, and you got

address value state meaning
100000 100016 A pointer to next block
100004 6 length
100008 'ABCD' first 4 characters
100012 'EF\0' ? 5th, 6th characters, NUL
100016 100096 F pointer to next block
100020 76 length

Note that because the allocator actually gave you two DWORD chunks for your storage, nothing happens. You can see that if you do

strcpy(p, "ABCDEFG")

you're still safe. Note that both of these strcpy calls are actually erroneous, because they violate the declared size of the buffer, but you don't notice.

But suppose we do

strcpy(p, "ABCDEFGHIJKLP");

Now we're in seriously deep trouble. That value which was supposed to be a pointer to the next block has been overwritten:

address value state meaning
100000 100016 A pointer to next block
100004 6 length
100008 'ABCD' first 4 characters
100012 'EFGH' 5th-8th characters,
100016 'IJKL' F pointer to next block
100020 'P\0' 0 0 length

So the next time the allocator tries to pick up that pointer and do something with it, instead of a valid pointer, it has the value 0x4C4B4A49, which is (a) the actual bit pattern that represents 'IJKL' (stored low-order byte first, so the 'I', 0x49, is the low-order byte when seen as an integer), and (b) not the expected value.

This will almost certainly cause your program to take an access fault. The length, which used to be 76, is now seen as the value 0x00000050 ('P' is 0x50) or decimal 80. We are well and truly screwed at this point. You don't even want to think about what is going to happen when the length value is 'PQRS'!

The difference between the "release" heap and the "debug" heap is that the "release" libraries assume that you haven't made any such blunder; that your code is perfect, and there is no reason to waste valuable computing resources in trying to figure out that you screwed up. It trusts you, with a pure form of hero-worship unmatched in human beings. You can truly Do No Wrong. Unfortunately, the hardware is there to detect when you did. Sometimes.

In the debug heap, a lot of extra effort goes into checking those values. The most common form of failure is that you get some sort of debug assertion failure in the heap allocator. If you trace back to the source code, you'll see that it is very unhappy about some value somewhere.

Sometimes, by examining the nearby locations, you can see what an incredibly stupid mistake you made. Addresses that look like characters (ask the debugger to show them in character format!) may contain recognizable characters and you can say "Doh! I did that allocation wrong!" and go on with life.

This is the rare case. The rest of the time, you've overwritten the heap with data that is very hard to interpret, but the results are equally fatal.

In addition, the debug heap initializes heap storage. It stores 0xCD in every byte that is validly allocated. So, for example, when you initially allocate the string, you will see the memory contains a lot of "Í" characters (that's an upper-case I with an acute accent on it in the standard ISO-8859-1 font that is the most common font used for Windows in Western culture). So the memory would be

address value state meaning
100000 100016 A pointer to next block
100004 6 length
100008 'ÍÍÍÍ' allocated
100012 'ÍÍýý' allocated, no man's land
100016 100096 F pointer to next block
100020 76 length of this block

The six bytes we allocate are initialized to 0xCD, 'Í'. But what about those last two?

They are in what is called "no man's land". They are initialized to 0xFD, the "no man's land" constant, which appears in the ISO-8859-1 font as 'ý' (a lower-case y with an acute accent).

When you free storage, the debug heap checks to see that you haven't accidentally overwritten the "no man's land" values with anything else. If you do, you will get an assert failure at the time you free the storage. You will know that you have had a "harmless" data overrun. But the next time your error might not be harmless!

The debug heap also tends to put "trailer" blocks on the storage areas. For example, a redundant allocator might store values such as

address value state meaning
100000 100024 A pointer to next block
100004 6 length of this block
100008 'ÍÍÍÍ' allocated
100012 'ÍÍýý' allocated, no man's land
100016 100000 pointer to this block
100020 6 length of this block
100024 100096 F pointer to next block
100028 68 length of this block

When you go to free a block, it checks that the pointer at 100000+ceil4(length)+8 contains a reference to 100000. If it doesn't, it's been clobbered, and this is reported when the block is freed. (In this example, ceil4 is a function which takes whatever the value is and, if it is not an exact multiple of 4, adds 1, 2, or 3 to the value to obtain a value which is a multiple of 4; that is, ceil4(9) == ceil4(10) == ceil4(11) == ceil4(12) == 12.)
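In code, ceil4 is just the usual round-up-to-a-multiple-of-four trick:

inline size_t ceil4(size_t n)
{
    return (n + 3) & ~(size_t)3;   // ceil4(9) == ceil4(10) == ceil4(11) == ceil4(12) == 12
}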

Similarly, during the allocation process, as it is searching for a useful block, it checks all the blocks it passes over to make sure they are intact.

In any case, whatever algorithm your allocator uses, it detects that somebody has seriously damaged the integrity of the heap, and you get an assertion failure. It doesn't happen when the damage is done; it happens when the damage is detected. If, in the process of allocation, you don't trip over a damaged block, you won't notice it. So you could do tens, hundreds, or thousands of heap operations before the nature of your request causes a damaged block to be examined.

Freeing storage

When you free storage, in the release version, nothing much happens to it. The overhead values are changed, of course, to indicate that the storage is now free. But that's pretty much all that happens. In the debug heap, however, the contents of the block are erased, set to the magical value 0xDD (character 'Ý'). So if the storage is freed, you would see

address value state meaning
100000 100016 F pointer to next block
100004 8 length
100008 'ÝÝÝÝ' freed pattern
100012 'ÝÝÝÝ' freed pattern
100016 100096 F pointer to next block
100020 76 length

except you probably wouldn't actually see this. Allocators are free to practice what is called "adjacent free block coalescing", that is, if there are two free blocks, they are merged to form a single free block:

address value state meaning
100000 100096 F pointer to next block
100004 88 length
100008 'ÝÝÝÝ' freed pattern
100012 'ÝÝÝÝ' freed pattern
100016 'ÝÝÝÝ' freed pattern
100020 'ÝÝÝÝ' freed pattern

This is another place where you will often get assertion failures in the debug allocator: when it goes to free storage. It not only checks that you haven't overwritten the "no man's land" area; it also checks that the adjacent block is also well-formed. If you clobbered it, you'll get an error.

Some allocators do "aggressive coalescing", that is, they always try to combine blocks on every free operation (which means they have to look at the previous block as well). Others do what is called "lazy coalescing"; they don't combine blocks until they feel like it. Each of these has different performance issues.

I freed my blocks but my program didn't get smaller!

There is a myth that freeing storage makes your program smaller. This is unlikely in the extreme. It just depends on what you mean by "smaller".

There's total memory, and there's active memory.

Total memory is the amount of address space which is allocated to your process. It can be huge. See my essay on "how large is my program?" But at any given instant, practically none of it is used. In fact, for any given instruction, the average number of pages that might be touched will be some number like 1.4 or so (some instructions don't touch memory, such as register-to-register instructions). But we normally care about slightly coarser granularity.

Over a small block of execution, such as a time slice you get from the scheduler, typically some tens of pages might be touched. In rarer cases, hundreds of pages. This is normal. The set of most-recently-used (MRU) pages forms what is called the "working set". If you don't use memory, whether you free it or not, it drifts out of the working set and therefore you aren't touching a lot of pages.

But if you don't free memory, each allocation is going to consume more total memory. And if you build a linked list, it means that you get more and more likely to have the next element on the list be in a different page than its predecessor or successor. This leads to a problem of "unbounded working set". Each time you walk along the list, you are touching a different page. This means you could be getting not tens, or even hundreds, of page faults per timeslice, but thousands.

This means your program requires a lot of physical memory to run. This means it will potentially take lots of page faults as you demand more and more physical memory, and other applications also demand that memory. So when you force their pages out, they take lots of page faults, and when they run, they force your pages out, and you may not even have as much physical memory on your machine (512M) as you have data pages (2000M), so you are constantly taking page faults. This is called "page thrashing".

Just to give you an idea about performance: if you get your data in the L1 cache, it costs you 0 CPU clock cycles to get it. If it is in the L2 cache, it costs you 1 or 2 clock cycles to get it (full-speed [Xeon] or half-speed [Celeron] caches). If it isn't in either cache, and you have to go to main memory to fetch it, it costs you a delay of 20 CPU clock cycles. But if you take a page fault, it costs you 20,000,000 clock cycles! That's six orders of magnitude performance loss. Sometimes dynamic memory allocation in small chunks is more destructive to your performance than allocating a single large chunk!

Some allocators try to optimize allocation by keeping the storage compactly allocated. This can be challenging, but it can be done. The C/C++ allocators do not do this.

But suppose you have memory allocated as shown below (red is allocated, yellow is free).

[Figure: a heap containing a mix of allocated (red) and free (yellow) blocks, including two adjacent free blocks]

Note that this allocator uses lazy coalescing because there are two adjacent free blocks.

Now you free up a lot of space:

[Figure: the same heap after freeing many blocks; most of it is now free, but the free space is scattered among the remaining allocated blocks]

What, exactly, can be "returned to the operating system"?

Answer: nothing. While there may be "free pages" in the middle, you can't return them to the operating system because they contain those headers about the block sizes and pointers to the next block. If they were "returned to the operating system", any attempt of the allocator to use those blocks would take access faults! All that really happens is that those blocks, since they are not used, will fade from the working set and the working set will become smaller (small working set is a Good Thing). But you won't see anything in any of the tools that talk about program size (other than working set size) that would indicate your program got "smaller" because you freed memory!

Stale Pointers

One other major way of getting heap damage is that you keep a pointer to storage that has been freed, and erroneously use it to store something in that space or read something from that space. Such a pointer is called a "stale pointer" or a "dangling pointer". You are writing to space you no longer own.

Because the storage might have been re-allocated for some other purpose, storing through the pointer can cause some other data structure to be damaged. That data structure is an "innocent bystander". Whoever allocated it legitimately believed that they, and they alone, were in charge of the contents of that structure. Now your erroneous code comes along and just jams some random value into one of those memory locations, thus completely invalidating what was going on.

If you are very, very lucky, the field was a pointer and you've clobbered it with some value which will lead very quickly to an access fault as the value you stored there could not be interpreted as a valid pointer. The hardware will catch you.

But given how prevalent pointers are, you may well store a pointer to something that makes sense to you, but which the actual owner of the structure interprets as a pointer to something else entirely. The world begins to disintegrate as that structure is damaged, which can then propagate the damage to someplace else, and it may be a very long time before the manifestations become evident.

Premature Freeing

One cause of stale pointers is when one part of the program frees up a block of memory while other parts are still assuming their pointers are valid. The result is that pointers which are supposed to be valid become stale. Premature freeing is one of the common errors.

Uninitialized pointers

When you execute a constructor, you should initialize every pointer in your class, preferably to some useless value like NULL. In the release version, you simply inherit whatever bits were left on the heap. This means if there had formerly been a pointer there, it is still a pointer, and if you access what it points to, you can happily write all over something, or read some value that makes no sense but use it anyway. If that value is a pointer, then again we have all kinds of possibilities for collateral damage.

The debug heap and debug stack avoid this because all heap pointers are initialized to the illegal value 0xCDCDCDCD when an allocation is done (or the no-man's land value), all values in the block (except the invisible overhead bytes) are set to 0xDDDDDDDD when storage is freed, and all stack variables are initialized to 0xCCCCCCCC in the debug stack. But you shouldn't rely on this because, while it helps, it isn't good enough. If you made an error that is not caught during development because that particular sequence of code was not encountered, then when it finally is encountered in the release code, you will get heap damage. Instead, you should always initialize to NULL which has the advantage that it fails in both the debug and release versions. Even though it will cause the release version to crash with an access fault, the cause is immediate and proximate to the actual error, not the consequence of seventh-level storage damage detected billions of instructions later.

Memory Fragmentation

This is another danger of memory allocators. Let's look at that block of memory, only this time we'll assume all those rectangles represent 1000 bytes and we have 20,000 bytes total, and all allocations are in multiples of 1,000 bytes just to simplify the discussion.

[Figure: a 20,000-byte heap containing allocated blocks numbered 1 through 8, each one to three 1000-byte units long, separated by free blocks]

The numbered blocks represent blocks that have been allocated. Note that some of them are 2000 or 3000 bytes in length in this example.

So you add up: 1 + 2 + 2 + 3 + 2 + 1 + 3 + 1 = 15, and realize that you have 5K bytes available. So you try to allocate 3000 bytes, and you get back a NULL. It should be obvious why: although there are 5000 bytes available, there are not 5000 contiguous bytes available. Your memory has fragmented. You can't allocate the space you need.

Doing lots of allocations, especially with an "aggressive splitter" allocator ("first-fit", the classic C/Unix algorithm, is an aggressive splitter) means that large blocks are preferentially split into small blocks.

So the obvious solution is to "repack" the memory to maximize the hole size:

[Figure: the same heap with blocks 1 through 8 repacked to the front, leaving all the free space as one contiguous hole at the end]

which would allow a 3000 byte allocation to run. But it turns out this isn't possible. In C and C++, you had a pointer to block 2. It now points to the middle of block 2, not the start. You had a pointer to block 5. It now points to the contents of block 6. In C and C++, you can't do memory compaction because to do so you would have to find every pointer to every block and adjust them to represent the new storage layout. This isn't going to happen.

Languages like Java, C#, and Managed C++, however, do know where all the pointers are, and can fix them up.

But that copying can generate a lot of page faults, and it can take a lot of time even if it doesn't page fault. So compaction doesn't happen all the time, only when the situation is desperate. So, in a compacting allocator, the allocator knows that it has 5K, it can't find a 3K hole, so it compacts, but only at that point. We can forget storage compaction as a workaround for C/C++ unless we're willing to go through a lot of work to do it ourselves!

Freeing memory does not necessarily create "more" space. It does mean that the space that has been freed is available for reallocation.

Don't be surprised if you "run out of storage" even though you think some should be there. This also means that massive allocations, such as for a giant bitmap image, may not work. Asking for 100MB may not work because the allocator might not be able to find a 100MB hole in your heap. This is why large image processing tends to break the picture up into smaller, manageable chunks. It also adds complexity to some algorithms.

Managed languages also make it impossible to do buffer overruns, because all accesses are checked, all the time. There's no "debug" or "release" difference in how they work. Because they are garbage-collected, there's no explicit delete operation, so you can't get stale pointers. If a pointer exists, the managed system guarantees that it points to legitimate storage, and that storage will not be freed until all of the references to it have gone away.

Pointers are not permitted in managed code. In managed languages, they are replaced with a pointer-like concept, references, but a reference is a pointer on which you can't do pointer arithmetic. This saves you from the possibility of writing something through a bad pointer, because the concept does not exist. There is no such concept as an uninitialized reference (they are all initialized to NULL, or the managed language's equivalent thereof), so you can't have uninitialized pointer errors. Managed languages protect you from most of the ugliness of storage allocation.

(In Managed C++, however, you can do pointer arithmetic. This is considered very dangerous).

But we're talking unmanaged C/C++ here. The storage allocator is In Your Face, All The Time.

Storage fragmentation is deadly to programs like System Services. If you are running 24/7/365, and you reboot your server once every two years, even a tiny amount of fragmentation can add up to a massive loss of usable storage. This is one reason I don't favor C++/MFC in System Services, because you can't control the memory allocation closely enough. Managed languages would be much nicer for System Services.

So what about all those assertions in the allocator?

A common observation I see far too often is "There's something wrong with the debug heap. My program keeps crashing in the debug version, but not in the release version. It runs fine in the release version".

This shows a deep misunderstanding of the whole concept of debugging. You do not even consider creating or running a release version if there is any failure in the debug version.

This is not a negotiation. This is not even worthy of discussion. Until the debug version works, the release version is meaningless.

Some people seem to think that those assertions represent bugs in the debug runtime, because they never get assertions in the release version. Hello, people! That's what a release version does. The assertions are not compiled in a release version, so of course they can't happen. It doesn't mean the problem has gone away; what it means is the problem is not being reported!

What these people fail to realize is that their code does not "run fine" in the release version. In fact, the only statement they can make is "in the release version, the fundamental screwups in my code aren't detected, so I can pretend they aren't really there". This is known as the "ostrich approach" to programming. "If I don't see the bug, it isn't there".

This is also what is known in the trade as a "delusional dysfunction".

[Ostriches, by the way, don't bury their heads to avoid looking at danger. This is a myth, but one that has become entrenched in popular folklore].

The problem with those nasty assertions is that people will report "I don't see anything wrong with my code" because they are looking only at the call site, where in fact everything may be fine. The damage was done billions and billions of instructions ago, and you are just now discovering it. The call you are making is quite possibly an innocent bystander.

Let's look at that storage. Suppose we had our example

[Figure: the 20,000-byte heap again, with allocated blocks 1 through 8 separated by free blocks]

Now, somebody who is writing block 6 writes 1200 bytes, and we get

[Figure: the same heap after block 6 has been overrun by 200 bytes, clobbering the header of the free block that follows it]

Note here that the free block between 6 and 7 has now been partially overwritten. Your next four allocations are for 1000 bytes each, and we'll assume "first fit" here for the method of allocation.

[Figure: the heap after blocks 9, 10, 11, and 12 have been allocated into the remaining free space]

So blocks 9, 10, 11 and 12 are all successfully allocated. Next, we free block 8:

[Figure: the same heap after block 8 has been freed]

Even if we were doing aggressive coalescing, block 8 only has to look back to block 7 (which is still intact) to see that it is still allocated and a coalesce can't happen. So the allocator is still happy, thinking everything is intact.

Now we go to allocate another 1000 bytes. Using "first fit", we go traipsing across the storage, seeing that blocks 1, 9, 2, 3, 10, 4, 5, 11, 12 are all fine. Even block 6 looks good, because it has a pointer to the free block that used to follow it. Except when we get there, we find that it does not look like a valid heap header block. And that's when you get the assertion failure in the allocator! (The release version, trusting that everything is good, will simply pick up the pointer and try to use it. If you are lucky, this will cause an immediate access fault.) This detection was a long time after the damage was actually done, and we were able to do new and delete without any ill effects even after the damage had been done. You might be tempted to wonder what had gone wrong in your call to allocate a new block, but that isn't where the problem is.

A similar error would occur if you tried to free block 6, or if you freed block 7 and, under aggressive coalescing, it tried to see if it could merge with the block in front of it, only to find there wasn't a valid block header there. So you see, where the damage is detected is not related to where it is caused.

[If you are familiar with the roadrunner cartoons, imagine the following scene: the roadrunner has undercut the cliff. The cliff falls down. The coyote is left standing in midair. He doesn't start to fall until he looks down and sees that there is nothing to stand on. That new, malloc, delete, or free call is the point where the coyote looks down. The assertion failure is that little puff of dust as he hits the ground a thousand feet below...but the real damage had been done long before when the cliff fell away.]

So when you get one of these errors, it is almost, but not quite, a certainty that the call you just made on the allocator is not the call that has caused the damage. It is the call that has detected the damage.

Now, for simplicity, let's assume that we allocated block 2 as 2000 bytes, that is, 500 32-bit pointers, to objects of type A. We allocated block 3 as 2000 bytes, that is, 500 32-bit pointers, to objects of type B.

Then, due to a programming error, we wrote 3000 bytes of pointers to objects of type A into block 2.

We have just clobbered half of the pointers in block 3, which now, instead of pointing to objects of type B, point to objects of type A (well, a bit under half, because we also clobbered the control block of block 3 as well). But the code that is working on the array in block 3 assumes that these pointers point to objects of type B, and happily uses them, thus corrupting the information in the objects of type A. Then the code that uses those now-damaged objects of type A picks up information that is also erroneous, and clobbers something else. This can continue for many levels, and you don't see the failure until the ultimate damaged location is used in a way that the hardware refuses to process; for example, you stored a small integer into what was supposed to be a pointer field and somebody goes off and uses it. Boom! Access fault!

[Figure: block 2, an array of A* pointers, sitting next to block 3, an array of B* pointers]

becomes

[Figure: the same layout after the overrun; the A* pointers have spilled past the end of block 2 and overwritten part of block 3]

True story: around 1982, I was working with one group that not only had an intractable memory damage bug, but one that did not appear in the debug version of the application (it was on a VAX, which did not have the concept of separate debug and release heaps, but since I wrote the allocator I had a lot of these techniques in place). But the bug would just slither away. Unfortunately, this meant that it hit our customers but not our developers.

The debugging tools were not terribly good, either; we didn't have source debuggers for our implementation language. I spent 17 straight hours tracing assembly code, instruction-by-instruction, tracing the damage back. It was seventh-level damage. That means that the original damage caused something else to be damaged, which caused something else to be damaged, which... until we got back to the original cause of the error. The original cause was a piece of non-robust code that should never have been written the way it was.

The error, by the way, was that someone had written

array[index++] = value;
if(index == number_of_items)
    ... report error

but never bothered to reset the index, so the next time through, the index was not equal to the count, but greater than, so they went happily off, clobbering some adjacent structure. And we didn't get a hardware fault (an access fault) until the damage had propagated seven levels.

Hint: when testing bounds, always write

if(index >= number_of_items)

that way, when you do make a mistake, you get some reasonable response to it, not just random damage!
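Sketched out with hypothetical names, the robust version of that fragment looks something like this:

if(index >= number_of_items)
{
    // report the error by whatever means is appropriate
    index = 0;               // and reset, so the next pass starts from a sane state
}
else
{
    array[index++] = value;
}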

I received an official award for this; at some point I have to dig it out and have it framed. It showed a Flounder spearing a bug.

But it worked in the debug version, but not in the release version!

Yes, there are very interesting cases where your code will work apparently without error in the debug version and fail in the release version. Nearly all of these failures are the result of program errors which the debug version was incapable of detecting, but the release version is sensitive to. They rarely involve heap damage, and they are addressed in a separate essay.

So how do I avoid memory damage?

Lots of ways. The most important mechanism is to write your code correctly. The second-most-important mechanism is to write it in such a way that when you do screw up, the error is detected early.

Here are some techniques I've applied over the years.

Do not use malloc in C++ programs.

This is just silly. Unless you are really interfacing at the lowest level, implementing operator new, for example, you have no excuse for using malloc. So don't do it.

Avoid allocating arrays with new

Whenever possible, which is most of the time, you should not simply replace a malloc of an array with a new of an array. For example, there is no real advantage to recoding

LPDWORD b = (LPDWORD)malloc(n * sizeof(DWORD));

with

LPDWORD b = new DWORD[n];

if you continue to use non-robust algorithms to manage your data. Instead, use CArray or std::vector for such purposes.
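For example, a std::vector version of the same buffer might look like this sketch (the function name is arbitrary; a CArray example appears a little further below):

#include <vector>
#include <windows.h>   // for DWORD

void FillExample(size_t n)
{
    std::vector<DWORD> b(n);             // n DWORDs, zero-initialized
    for(size_t i = 0; i < b.size(); i++)
        b[i] = (DWORD)i;                 // b.at(i) would additionally throw on a bad index
}   // the storage is released automatically when b leaves scope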

Avoid fixed char arrays

Generally, you should forget the data type char exists at all! Using it is based on the naive myth that you will never require Unicode characters. Always program "Unicode-aware", even if you don't think you're going to need it. But code like

char buffer[20]; sprintf(buffer, "%d", value);

belongs in the dustbin of history. Just forget such constructs exist. When you're programming in MFC, use CString::Format. There's no reason to use these antiquated relics of the PDP-11 implementation ever again.
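For example, the MFC replacement for the sprintf call above is simply (value being whatever integer you are formatting):

CString buffer;
buffer.Format(_T("%d"), value);   // no fixed-size array to overrun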

Use CString for all string operations

At least in MFC, CString is even preferred to std::string, and all strings should be stored as CString values, not as char *, or even TCHAR *. You should never be allocating arrays of characters (char or wchar_t) unless you have compelling reasons. And you usually don't have compelling reasons. Most people who use malloc or new of arrays of characters are simply using antiquated techniques they learned in their first C course. These are not appropriate for modern programming methodologies.

Use bounds-checked arrays

For example, instead of doing

LPDWORD b = new DWORD[n];

I'm much more likely to use

CDWordArray b; b.SetSize(n);

for two reasons:

  1. It will automatically be deallocated when the variable leaves scope
  2. It will be bounds-checked when I access it

People who are comfortable with std::vector will advocate it for the same reasons, and they are absolutely right as well. There is no reason not to use bounds-checked arrays.

Note that if you need to pass these values into some API or library function, for example, one that requires an array of DWORDs, you can do it by

SomeAPI(b.GetData(), b.GetSize());

because GetData will return the address of the value array.

Always provide bounds for all operations

Never assume that your buffer is going to be "big enough". You must always pass in an explicit buffer length, and it must always be checked. Fortunately, this is now easy. See below.

I once worked on an application that assumed they knew what MAX_PATH should have been. They were sure it was 256, and throughout their code, the magic number 256 was hardwired. It wasn't even a #define, because there was no need to change it. Needless to say, one client had a path that was longer, and was actually MAX_PATH. Since nobody bothered to check the array bounds, damage propagated all over the place.

(For those who don't know, MAX_PATH is defined as 260 in the current Windows headers; the C runtime's equivalent, _MAX_PATH, is defined in stdlib.h.)

Always avoid strcpy and strcat

In fact, always program Unicode-aware, and these operations aren't; but you must equally avoid their Unicode-aware counterparts _tcscpy and _tcscat, because they impose no bounds either.

Use the safe-string functions. For example, read about strsafe.h and look at operations such as StringCchCopy and StringCchCat. These take explicit buffer lengths and therefore will not commit a buffer overrun.
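A sketch of what a call looks like (the function name and buffer size here are arbitrary):

#include <windows.h>
#include <stdlib.h>
#include <tchar.h>
#include <strsafe.h>

void CopyExample(LPCTSTR source)
{
    TCHAR buffer[32];
    HRESULT hr = StringCchCopy(buffer, _countof(buffer), source);
    if(FAILED(hr))
    {
        // The source did not fit (or was bad); the result was truncated,
        // but the buffer was never overrun.
    }
}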

Initialize all pointers

In your constructors, always set pointer values to NULL. For stack variables, always set pointers to NULL.

Sure, the debug version initializes them to bogus values, but if you don't test the path that would use an uninitialized value at debug time, then you'll get a random value in the release version which just might not access fault, but point to something that shouldn't be damaged.
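A sketch of what this looks like in a constructor (the class and member names are hypothetical):

class CMyData {
public:
    CMyData() : m_buffer(NULL), m_next(NULL) { }   // every pointer starts with a known, useless value
protected:
    BYTE    * m_buffer;
    CMyData * m_next;
};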

Declare pointers at the point of initialization

Far too many C programmers think C++ is C, and they will jam all their declarations up into the top of the block, just like we had to do in all those obsolete languages like Algol and Pascal. This is just plain silly. In C++, you can declare a variable anywhere. So don't declare it until it is needed. Instead of writing

Whatever * p = NULL;
... lots of lines here
p = new Whatever;

instead just do

... lots of lines here
Whatever * p = new Whatever;

that way, you don't have it sitting around as a candidate for error. You don't declare it until it is needed! So the declaration and the initialization are essentially concurrent, not separated by tens or even hundreds of lines, and there is no danger you will accidentally use the variable before it has been properly initialized, because syntactically this would not be possible!

When possible, NULL pointers out after a delete operation

If you have a pointer to a value in a class, and you free the object pointed to, set the pointer to NULL. That way, if you accidentally use the value you'll take an access fault, rather than read from or write to the recently-freed or possibly recently-reallocated memory.
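For example (CMyClass and m_thing are hypothetical names):

void CMyClass::DiscardThing()
{
    delete m_thing;     // release the object
    m_thing = NULL;     // any later accidental use now takes an access fault immediately,
                        // instead of scribbling on freed or reallocated memory
}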

Use smart pointers

"Smart pointers" are essentially reference-counted pointers. When used properly, they reduce (or eliminate) the possibility of stale pointers. If you have complex lifetime issues and distributed lifetime issues, these are a Really Good Idea.

I still have memory damage problems!

The Good News is that if you follow the above guidelines, you probably won't have memory damage from buffer overruns. But you can still get it from any of the other causes. So: you followed all the rules, and you still have damage whose source you can't see. You've got 100,000 lines of code, and now it's crashing. Where do you start looking?

The simplest mechanism I've used is the OnIdle test. In your CWinApp-derived class, add an OnIdle handler.

BOOL CMyApp::OnIdle(LONG count)
{
    ASSERT(_heapchk() == _HEAPOK);
    return CWinApp::OnIdle(count);
}

Because this is an ASSERT it only gets compiled into the debug version. You might feel free to add an additional surrounding #ifdef MEMORY_CHECKING or some similar preprocessor symbol of your choice. You would define this to turn on the memory checking.

What does this do? Well, the _heapchk function runs through the heap, checking its integrity. If you have made some error, the heap will be verified the next time the program goes idle waiting for a message, and the ASSERT will fail. This is really nice because it catches the damage relatively early. So, if I select the "Do this" menu item, or click the "Do that" toolbar button, or whatever, and the next thing that comes out is this assertion failure, I pretty much know that

  1. It hadn't happened before I selected that operation or the ASSERT would have failed earlier
  2. It happened as a consequence of that selection

This narrows down where the damage can be.

If you still can't stop it, start throwing those ASSERT statements liberally around the execution path that starts at the OnDoThis handler for the "Do this" menu item. As long as you don't have multiple threads out there bashing your memory, this will really narrow it down.

And what about those threads?

Put the ASSERT in your thread loop. Screw the performance. You're looking for errors! That's why we have conditional compilation!
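Something along these lines (the worker function and its loop body are hypothetical; only the ASSERT matters):

UINT WorkerThread(LPVOID pParam)
{
    for(;;)
    {
        ASSERT(_heapchk() == _HEAPOK);   // debug-only heap verification on every pass
        // ... one unit of the thread's real work goes here ...
    }
    return 0;   // not reached in this sketch
}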

I once turned on all the tracing, checking, and logging features of my allocator. The program was running about 100 times slower than normal. So I went down to the company cafeteria and had a nice pizza. When I got back three hours later (I had to talk to people, didn't I?) it was still running. Twenty minutes later, the allocator detected the error, and we had it nailed. It took time, but no effort. Never confuse the two. Search for zero-effort solutions.

What about HeapValidate?

The HeapValidate call provides for a system heap what _heapchk does for the C/C++ runtime heap. The problem is that HeapValidate isn't all that useful. For example, each major block of C/C++ runtime heap is a single "Heap" in the HeapValidate sense, and while the innards of the heap may be completely trashed, the HeapValidate API tells you everything is just fine. So largely, this API doesn't do you much good.

Stack clobbers

These are actually fairly easy to find. You die with an access fault on a return statement. You find yourself staring at a piece of assembly code that makes no sense at all (often just repeats of some single instruction, such as

int 3

which is the interpretation of the byte sequence 0xCC, or

add byte ptr [eax],al

which is the interpretation of 0x0000.

Sometimes you get an instruction fault, and frequently you get an access fault. Often the stack backtrace is completely trashed. If you see this, you can have a high confidence that you overwrote an array on the stack. Go read the rules about not putting fixed-size arrays on the stack, or anywhere else for that matter.

Remember, stacks grow downward in memory. Suppose I had a function

void SomeFunction(int first, int middle, int last)
{
    DWORD data[5];
    ...

Then our stack would look, after the call, like

stack bottom ↑ (higher addresses are this way)
last parameter
middle parameter
first parameter
return address
old frame pointer (EBP)
data[4]
data[3]
data[2]
data[1]
data[0]

Now suppose I had a loop

for(int i = 0; i < sizeof(data); i++) data[i] = rand();

Note the error here. I used sizeof(data), which is 20, as the loop bound, but the loop uses it as an index limit! I should have used sizeof(data)/sizeof(DWORD) or sizeof(data)/sizeof(data[0]). So what does it do? When it goes to write data[5], it clobbers the old EBP with random data (thus destroying my ability to see a stack backtrace); data[6] overwrites the return address with random data; data[7] clobbers the value of the first parameter, data[8] clobbers the value of the second parameter, and data[9] clobbers the value of the third parameter. After that, the values of the local variables, and perhaps the EBP, return address, and parameters of the calling function will all be destroyed, if the caller's stack frame was small.
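The corrected loop, for comparison:

for(int i = 0; i < (int)(sizeof(data)/sizeof(data[0])); i++) data[i] = rand();   // 5 iterations, not 20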

This Is Not A Pretty Sight.

Now this damage will cause the old EBP, now a random value, to be placed in the EBP register, and then it will attempt to return to the location specified by the damaged return address. Who knows? There might even be instructions there! (And if there are, you've just seen the buffer-overrun exploit take control of your machine...)

Fortunately, once you've seen a couple of these, they become easier to find. Unfortunately (there's always the Good News/Bad News problem) there's no simple trick you can do to avoid them.

BUT--and this is one of the new features of VS.NET (which in spite of the fact the IDE sucks, has a really great compiler)--in debug mode you can enable stack-frame checking. Each stack frame gets a gratuitous padding of bytes which are set to known values. Whenever a return occurs, a function is called that checks this hidden space on the stack and verifies that it is still the stack's equivalent of the No Man's Land values used in the heap. If it has changed, you are likely to get a much more useful error notification than "Instruction fault at..." with some unintelligible hex address.

Desperation Mode

The problem with some of the heap damage is that while your objects are getting clobbered, the heap control blocks themselves are quite safe. Your data is being trashed, but all the _heapchk operations give you a clean bill of health.

These are the times that try the souls of programmers.

At one point, I built up the following logic: each structure contained a checksum field. The checksum had a value which was computed by (literally) summing up all the DWORD values found in the structure, using a suitable checksum algorithm. The actual checksum you use doesn't matter, as long as it is reasonably robust. Ideally it isn't too hard to compute.

I placed the checksum value as the first DWORD in the structure.

I removed all public variables from the classes (not a bad idea, anyway) and replaced them with accessor functions.

When I called a set method, I set the value then recomputed the checksum.

When I called a get method, I recomputed the checksum and verified that it matched the stored value.

void CMyClass::setWhatever(int w)
{
    whatever = w;
    CHECKSUMMING(chk = checksum(this, sizeof(*this)));
}

int CMyClass::getWhatever() const
{
    ASSERT(chk == checksum(this, sizeof(*this)));
    return whatever;
}

where I had defined

#ifdef _CHECKING
#define CHECKSUMMING(x) x
#else
#define CHECKSUMMING(x)
#endif

so if I were checking, I'd actually compute the checksums, and if I weren't, no code was generated.

The idea is that there is a checksum which is only valid if there has been a deliberate access that sets a field of the class. If someone else reaches in and clobbers something, the checksum will no longer be valid. Since all legitimate accesses are controlled, all damage has to be from illegal accesses.
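For concreteness, a trivial checksum along these lines might look like the sketch below; it skips the first DWORD, which is the chk slot itself, so that storing the checksum does not invalidate it. (The function name and details are illustrative, not the original code; DWORD comes from the Windows headers.)

DWORD checksum(const void * obj, size_t size)
{
    const DWORD * p = (const DWORD *)obj;
    DWORD sum = 0;
    for(size_t i = 1; i < size / sizeof(DWORD); i++)   // start at 1: skip the chk slot
        sum += p[i];
    return sum;
}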

This found the problem. It was a dangling pointer. All of my classes were large enough, and the offset of the damaged field small enough, so that it never damaged any heap control block information. But it trashed the actual contents far too badly to manage.

Desperate times require desperate measures.

In one case in an old C program, I had replaced all calls on malloc with calls on my GetSpace function. The GetSpace function actually linked all my allocated blocks together. Periodically I would just race down the list of allocated blocks and check all the checksums! This was a very aggressive approach, and not easily accomplished in C++ because of the difficulty in overriding new. The error in that case (this was about 1986 or so) was both remarkably stupid (in terms of the coding error) and remarkably subtle (in terms of the damage). When I had replaced malloc/free with my GetSpace/FreeSpace and run the checksums, the bug popped out in under five minutes of runtime! On an 8088.
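The structure of that trick looks roughly like the sketch below (the names and details are mine, reconstructed for illustration; the original used per-block checksums, where this simplified version just checks a guard word):

#include <stdlib.h>
#include <assert.h>

// Each allocation gets a hidden header that chains it into a list of live blocks.
struct TrackHeader {
    TrackHeader * next;
    TrackHeader * prev;
    size_t        size;
    unsigned long guard;                  // known pattern, verified later
};

static TrackHeader * g_allocations = NULL;
static const unsigned long GUARD = 0xDEADBEEF;

void * GetSpace(size_t size)
{
    TrackHeader * h = (TrackHeader *)malloc(sizeof(TrackHeader) + size);
    if(h == NULL)
        return NULL;
    h->size  = size;
    h->guard = GUARD;
    h->prev  = NULL;
    h->next  = g_allocations;
    if(g_allocations != NULL)
        g_allocations->prev = h;
    g_allocations = h;
    return h + 1;                         // the caller sees only the bytes after the header
}

void FreeSpace(void * p)
{
    TrackHeader * h = ((TrackHeader *)p) - 1;
    assert(h->guard == GUARD);            // catch damage at free time
    if(h->prev != NULL) h->prev->next = h->next;
    else                g_allocations  = h->next;
    if(h->next != NULL) h->next->prev = h->prev;
    free(h);
}

void CheckAllSpace(void)                  // "race down the list and check" on demand
{
    for(TrackHeader * h = g_allocations; h != NULL; h = h->next)
        assert(h->guard == GUARD);
}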

The App Verifier

Microsoft has a nice little tool, the "App Verifier", which you can download from their site.

http://msdn.microsoft.com/en-us/library/ms220948.aspx

This has a number of heap tests. One of the powerful ones is the PageHeap test. In this test, blocks are not allocated out of the generic heap as I've shown above; instead, every block is allocated at the end of a page! Furthermore, the page before and the page after the page in question are "guard pages" and do not exist. Therefore, if you write off the end of a block, you typically write into a nonexistent page and get caught right away. When a block is freed, its page is marked as inaccessible, and any attempt to use a stale pointer takes an access fault. This is, needless to say, a voracious use of memory; you are urged to have at least 256MB of physical memory and a 1GB paging file to run anything using the PageHeap test.

Summary

This essay attempts to explain, particularly to beginners, the nature of memory damage, its causes, and possible cures. It also shows two of the more powerful techniques I have used to detect memory damage as early as possible, thus making it easier to identify.


The views expressed in these essays are those of the author, and in no way represent, nor are they endorsed by, Microsoft.

Send mail to newcomer@flounder.com with questions or comments about this web site.

Copyright © 2006, The Joseph M. Newcomer Co. All Rights Reserved
Last modified: May 14, 2011