[Scummvm-devel] Weird crash (Mac OS X only? big endian only?)

Max Horn max at quendi.de
Thu May 15 03:44:09 CEST 2003


Yo folks,

last night on #scummvm we discussed a very nasty (crashing / memory
corrupting) bug. That bug occurs in COMI on Mac OS X, and apparently
has done so for quite some time. It's the "garbage when looking at
inventory" bug which e.g. jake described and which I never could
reproduce. While we have now found out how to reproduce the bug 100%
of the time, and how to work around it, I am pretty clueless right now
as to what its immediate cause is and how we could fix it. Hence I am
writing all details known to me to the list now, in the hopes that a)
somebody has a good idea, b) maybe some other (big endian?) system can
reproduce this, and c) maybe by explaining it I have some good idea
myself 8-)

The bug, AFAIK, only occurs in COMI. I recommend post 0.4.0 CVS on a  
Mac OS X machine, but older versions work, too.

Short version: Get the save game from
<https://sourceforge.net/tracker/index.php?func=detail&aid=737139&group_id=37116&atid=418820>.
Compile ScummVM. Then, strip the binary (strip scummvm). If you now
load the save game using that stripped binary, one of various effects
should occur (it segfaults, you get garbage on the screen, or an error
message about resource problems).

Long version. This also happens if you build w/o Simon or Sky. So we
thought, why would stripping cause this? Maybe the strip tool is
defective? But that seemed unlikely since other games work fine, and
the error pattern doesn't quite fit. One thing stripping does: the
binary shrinks from 8 MB to 1.3 MB. So wjp suggested mallocing 6 MB of
RAM in main(). I did that, and lo and behold, the stripped version no
longer crashes! The same effect can be achieved by inserting a global
variable of 6 MB size (e.g. int global_dummy[6*1024*1024]). Looking at
the memory layout: on Mac OS X the heap follows directly after the code
and grows upwards, and the symbol information which strip removes sits
at the end of the binary. So, roughly speaking, the various scenarios
look like this:

Unstripped, no tricks: works fine
+------+---------+---------
| Code | Symbols | Heap -> ...
+------+---------+---------

Stripped, no tricks: crash / odd behavior
+------+---------
| Code | Heap -> ...
+------+---------

Stripped, malloced a 6 MB block: works fine
+------+-------------------
| Code | 6MB block : Heap -> ...
+------+-------------------

Stripped, 6MB static data: works fine
+-------------------+-------
| Code : 6MB static | Heap -> ...
+-------------------+-------
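
For reference, the two padding tricks from the last two diagrams can be
sketched roughly as follows. This is only a minimal sketch, not the
actual test patch: the names kPadSize, g_staticPad and allocHeapPad are
made up for illustration, and whether the padding really ends up
between code and heap depends on the platform's linker and allocator.
An unsigned char array is used so that the element count equals the
size in bytes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical constant: the 6 MB padding size used in the experiment.
static const std::size_t kPadSize = 6 * 1024 * 1024;

// Variant 1: static padding. A global array becomes part of the
// program's static data, so it is laid out together with the binary
// image rather than on the heap.
unsigned char g_staticPad[kPadSize];

// Variant 2: heap padding. Meant to be called first thing in main(),
// before any other allocation, so the block sits at the start of the
// heap. Returns NULL if the allocation fails.
unsigned char *allocHeapPad(unsigned char fill) {
    unsigned char *pad = static_cast<unsigned char *>(std::malloc(kPadSize));
    if (pad != NULL)
        std::memset(pad, fill, kPadSize); // known pattern, e.g. 0 or 0xE7
    return pad;
}
```

Filling the block with a known pattern is what makes it possible to
check later whether anything wrote into it.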


Apparently, inserting a buffer between code and heap, separating the
two, "fixes" the problem. I continued and performed various
experiments; e.g. I decreased the buffer size to see when it would
start to act up again. It seems ~2.5 MB are needed; that's the size at
which I started to notice problems. But of course it could be more,
since memory corruption may only become apparent much later.
I also wrote a checkHeap() function (hooking into our existing
CHECK_HEAP calls), which checks whether the first and last 100 KB of
the malloced data block are still zero. Nada. This triggered nothing (I
"only" check a total of 200 KB because checking the full 6 MB is 30
times slower... too slow to make testing fun). Again I experimented
with many different buffer sizes between 1 and 6 MB, but never observed
a corruption of the buffer block :-/
Maybe I need to try buffers > 6 MB.

For a given "buffer" size, the same problem occurs pretty predictably;
for example, with one size it segfaults; slightly decrease or increase
the buffer, and it might only cause screen garbage (two kinds:
sometimes real "garbage" is drawn; sometimes the screen has parts of
its content duplicated in the wrong places etc. I have added a few
screenshots to
<https://sourceforge.net/tracker/index.php?func=detail&aid=737139&group_id=37116&atid=418820>).
Sometimes it causes various kinds of errors (e.g. "BlastObject object
114 (3) image not found").

We also ran valgrind on ScummVM (on Linux/x86 of course). Nothing :-/

Any suggestions what kind of bug could cause this? The only two things  
I could think of:

a) A global array in the code (like shake_positions in gfx.cpp) is
written to with an out-of-bounds (OOB) index -> that would damage data
in the heap. And if there is a buffer, then the write only hits the
buffer, which is harmless. Why did it not trigger my heap check
routine, though? Well, maybe it writes zeroes (but I tested with both a
0-filled buffer and a 0xE7-filled buffer). Or maybe it still writes
beyond the buffer, but into other "harmless" data (but that seems
fairly unlikely, given that I tried various buffer sizes).

b) An array in the heap (read: any pointer) is accessed with a huge
negative index. W/o a buffer it overwrites code; with one, only the
buffer. Same problems as in a). In addition, the observed behavior
doesn't quite match up with overwritten code (that should lead to
straight crashes).
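
To make the two hypotheses concrete, here is a toy model of the
address space as a flat list of regions. It performs no real
out-of-bounds access; it only computes which region a stray write would
land in. Everything in it is an assumption for illustration: the Region
type, the regionAt helper, and whatever sizes and offsets are plugged
in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one contiguous region of the address space.
struct Region {
    std::string name;
    std::size_t size;
};

// Returns the name of the region containing flat offset `addr`
// (measured from the start of the first region), or "unmapped" if the
// offset falls before the first or after the last region.
std::string regionAt(const std::vector<Region> &layout, long long addr) {
    if (addr < 0)
        return "unmapped";
    std::size_t pos = static_cast<std::size_t>(addr);
    std::size_t start = 0;
    for (std::size_t i = 0; i < layout.size(); ++i) {
        if (pos < start + layout[i].size)
            return layout[i].name;
        start += layout[i].size;
    }
    return "unmapped";
}
```

With a layout of { code, heap }, an offset slightly past the end of
"code" resolves to "heap" (case a), and an offset slightly before the
start of "heap" resolves to "code" (case b). With { code, pad, heap },
both resolve to "pad" as long as the overshoot is smaller than the
padding.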


Any other ideas? Even if they are wild, I am interested in hearing them.

Anyway, I just started a run with a 6 MB buffer filled with 0xE7, where
it checks the whole buffer every time. It'll be some time before it
finishes startup, but I'll get back to you with the results.


Cheers,

Max




