3 November 2005

Crash: C++ objects moved in memory after construction

I encountered a crash a few days ago that took two days of testing to resolve. It was caused by the addition of an otherwise simple 20-line header file.

Before the crash, my development machine had minor environment changes, and the test machine had always had a dirty environment, so much of the debugging time was spent checking dependent DLLs and remote debugging. The test machine has specific hardware that is not available on my dev machine, so I was locked into testing on it. Several other nuisance issues (network connectivity, remote database configuration, etc.) added to the frustration.

After eliminating the environments, I started looking at the code. Very little had changed, and nothing had changed in the area of the crash, so a code rollback was wisely-or-unwisely relegated to after the environments were examined. Also, the crash looked exactly the same as, and manifested itself in the same MFC function as, a previous one where mixed debug DLLs were the cause. Not so here.

The symptoms:

  • A pointer member of CWnd that should be initialized to NULL ends up being set to 0xfdfdfd00 on first use. Because it was non-NULL it looked valid, and using it crashed the application.
  • When the CWnd object is constructed, all members are correctly initialized (using the scary macro AFX_ZERO_INIT_OBJECT(CCmdTarget) to memset() the class):
      #define AFX_ZERO_INIT_OBJECT(base_class) \
          memset(((base_class*)this)+1, 0, sizeof(*this) - sizeof(class base_class));
  • After stepping out of the constructor, the address of the object is three bytes further forward in memory, although the memory itself has not moved. The member that gets un-NULLed is the last member defined in the class header, so three of its bytes run into no man's land.
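
The last symptom can be modelled in a few lines. This is only a sketch, not the MFC code: Widget, kGuard, and read_last_member_shifted are hypothetical names, the 0xFD bytes merely imitate the guard fill that the debug heap puts around an allocation, and a little-endian machine is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a class whose last member should be zero (NULL)
// after construction.
struct Widget {
    std::uint32_t first;
    std::uint32_t last;   // plays the role of the pointer that should be NULL
};

const unsigned char kGuard = 0xFD;  // imitates the debug heap's guard fill

// Zero an object at the start of a guarded buffer, then read `last`
// through an address that is three bytes too far forward.
std::uint32_t read_last_member_shifted() {
    unsigned char memory[sizeof(Widget) + 4];
    std::memset(memory, kGuard, sizeof(memory));  // guard bytes everywhere
    std::memset(memory, 0, sizeof(Widget));       // the "constructor" zeroes the object

    const unsigned char* shifted = memory + 3;    // the object "moved" three bytes
    std::uint32_t value = 0;
    // memcpy avoids an unaligned read; offsetof locates the member.
    std::memcpy(&value, shifted + offsetof(Widget, last), sizeof(value));
    return value;
}
```

On a little-endian machine the function returns 0xFDFDFD00: one zeroed byte followed by three guard bytes, the same pattern as the bad pointer value.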

The "otherwise simple" header I included contained a small struct of basic data types packed on 1-byte boundaries. After getting another set of eyes on the problem, we saw that the pack-ing was not being un-pack-ed. Without getting into the specifics of the code's include hierarchy, the result was that the same class's members were packed on 1-byte boundaries in some places and not in others. Code that disagreed about the layout also disagreed about where the object's members were, which is why I was seeing the object move in memory.
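
A minimal sketch of that kind of leak and the usual fix, assuming a compiler that supports #pragma pack with push and pop (MSVC, GCC, and Clang do). All of the struct names are hypothetical, since the real header is not shown here, and the sizes quoted afterwards assume a 4-byte int with 4-byte natural alignment.

```cpp
#include <cstddef>  // offsetof

// --- What a leaking header effectively does ---
#pragma pack(1)               // packing set to 1 byte and never restored
struct SmallRecord {          // the struct that was meant to be packed
    char tag;
    int  value;
};
// (end of the hypothetical header; 1-byte packing is still active)

struct VictimAfterLeak {      // defined later, so it is packed by accident
    char flag;
    int  count;
};
#pragma pack()                // reset to the default so the demo can continue

// --- The same header with the packing scoped ---
#pragma pack(push, 1)         // save the current packing, then set 1 byte
struct SmallRecordFixed {
    char tag;
    int  value;
};
#pragma pack(pop)             // restore whatever was active before

struct VictimAfterFix {       // same members, natural alignment this time
    char flag;
    int  count;
};

// Layout facts a caller can inspect or print.
constexpr std::size_t kLeakedSize   = sizeof(VictimAfterLeak);
constexpr std::size_t kFixedSize    = sizeof(VictimAfterFix);
constexpr std::size_t kLeakedOffset = offsetof(VictimAfterLeak, count);
constexpr std::size_t kFixedOffset  = offsetof(VictimAfterFix, count);
```

Under those assumptions VictimAfterLeak is 5 bytes with count at offset 1, while VictimAfterFix is 8 bytes with count at offset 4. Two source files that see the same class definition under different packing will disagree in just this way.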

Humorously, we had expected either a file or name collision: duplicate files containing slightly different defines, or objects with duplicate names. The actual problem was a more subtle variation on that.

[ posted by sstrader on 3 November 2005 at 12:46:34 PM in Programming ]