[Scummvm-devel] Weird crash (Mac OS X only? big endian only?)
Max Horn
max at quendi.de
Thu May 15 03:44:09 CEST 2003
Yo folks,
last night on #scummvm we discussed a very nasty (crashing / memory
corrupting) bug. That bug occurs in COMI on Mac OS X, and apparently has
done so for quite some time. It's the "garbage when looking at
inventory" bug which e.g. jake described and which I could never
reproduce. While we have now found out how to reproduce the bug 100% of
the time, and how to work around it, I am pretty clueless right now as
to what its immediate cause is and how we could fix it. Hence I am
writing all details known to me to the list now, in the hope that
a) somebody has a good idea, b) maybe some other (big endian?) system
can reproduce this, and c) maybe by explaining it I get some good idea
myself 8-)
The bug, AFAIK, only occurs in COMI. I recommend post 0.4.0 CVS on a
Mac OS X machine, but older versions work, too.
Short version: Get the save game from
<https://sourceforge.net/tracker/index.php?func=detail&aid=737139&group_id=37116&atid=418820>.
Compile ScummVM. Then, strip the binary (strip scummvm). If you now load the
save game using that stripped binary, one of various effects should
occur (it segfaults, you get garbage on the screen, or an error message
about resource problems).
Long version. This also happens if you build w/o Simon or Sky. So we
thought, why would stripping cause this? Maybe the strip tool is
defective? But that seemed unlikely since other games work fine, and
the error pattern doesn't quite fit. One thing stripping causes: the
binary goes from 8 MB to 1.3 MB. So wjp suggested mallocing 6 MB of RAM
in main(). I did that, and see: the stripped version no longer crashes!
The same effect can be achieved by inserting a global variable of 6 MB
size (e.g. char global_dummy[6*1024*1024]). Looking at the memory
layout: on Mac OS X, the heap follows directly after the code and grows
upwards, and the symbol information which strip removes sits at the end
of the binary. So roughly speaking, the various scenarios look like
this:
Unstripped, no tricks: works fine
+------+---------+---------
| Code | Symbols | Heap -> ...
+------+---------+---------
Stripped, no tricks: crash / odd behavior
+------+---------
| Code | Heap -> ...
+------+---------
Stripped, malloced a 6 MB block: works fine
+------+-------------------
| Code | 6MB block : Heap -> ...
+------+-------------------
Stripped, 6MB static data: works fine
+-------------------+-------
| Code : 6MB static | Heap -> ...
+-------------------+-------
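For anyone who wants to try the two padding tricks on another platform, here is a minimal sketch of both. All names (kPadSize, g_staticPad, staticPad, allocHeapPad) are made up for illustration and are not taken from the ScummVM sources. Note also that whether a large malloc really ends up directly behind the program image depends on the allocator; some allocators may map a block this big somewhere else entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical padding size: 6 MB, as in the experiments described above.
static const std::size_t kPadSize = 6 * 1024 * 1024;

// Variant 1: static padding. This array is part of the program image,
// so it sits in front of whatever the loader places after the image.
static unsigned char g_staticPad[kPadSize];

// Accessor so the array is actually referenced and not discarded.
unsigned char *staticPad() {
    return g_staticPad;
}

// Variant 2: heap padding. Meant to be called once, as the very first
// thing in main(), and never freed for the rest of the run.
// Returns NULL if the allocation fails.
unsigned char *allocHeapPad(unsigned char fill) {
    unsigned char *pad = static_cast<unsigned char *>(std::malloc(kPadSize));
    if (pad != NULL)
        std::memset(pad, fill, kPadSize);   // known pattern, e.g. 0x00 or 0xE7
    return pad;
}
```

Calling allocHeapPad(0xE7) first thing in main() corresponds to the third diagram; merely linking in g_staticPad corresponds to the fourth.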
Apparently, inserting a buffer between code and heap, separating the
two, "fixes" the problem. I continued and performed various
experiments; e.g. I decreased the buffer size to see when it would
start to act up again. It seems ~2.5 MB are needed: that's the size at
which I started to notice problems. But of course it could be more,
since memory corruption may only become apparent much later.
I also wrote a checkHeap() function (hooking into our existing
CHECK_HEAP calls), which checks whether the first and last 100 KB of
the malloced data block are still zero. Nada, this triggered nothing (I
"only" check a total of 200 KB because checking the full 6 MB is 30
times slower... too slow to make testing fun). Again I experimented
with many different buffer sizes between 1 and 6 MB, but never observed
a corruption of the buffer block :-/
Maybe I need to try buffers > 6 MB.
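To make the edge check concrete, here is a rough sketch of the idea. It is not the actual checkHeap() code; the names kGuardWindow, firstMismatch, checkPadEdges and checkPadFull are invented for this example. It also shows the obvious blind spot: a stray write into the middle of the block passes the edge check and is only caught by the full scan.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical window size: 100 KB at each end of the padding block.
static const std::size_t kGuardWindow = 100 * 1024;

// Returns the offset of the first byte in [begin, end) that differs
// from 'fill', or 'end' if every byte in the range still matches.
static std::size_t firstMismatch(const unsigned char *block,
                                 std::size_t begin, std::size_t end,
                                 unsigned char fill) {
    for (std::size_t i = begin; i < end; ++i) {
        if (block[i] != fill)
            return i;
    }
    return end;
}

// Cheap check: only the first and last kGuardWindow bytes are scanned.
// Returns true if both windows are intact.
bool checkPadEdges(const unsigned char *block, std::size_t size,
                   unsigned char fill) {
    assert(block != NULL);
    const std::size_t window = (size < kGuardWindow) ? size : kGuardWindow;
    if (firstMismatch(block, 0, window, fill) != window)
        return false;
    return firstMismatch(block, size - window, size, fill) == size;
}

// Expensive check: scans the whole block. Returns true if intact.
bool checkPadFull(const unsigned char *block, std::size_t size,
                  unsigned char fill) {
    assert(block != NULL);
    return firstMismatch(block, 0, size, fill) == size;
}
```

With a 6 MB block the edge check touches 200 KB and the full check about 30 times as much, which matches the slowdown mentioned above.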
For a given "buffer" size, the same problem occurs pretty predictably;
for example, with one size it segfaults; slightly decrease or increase
the buffer, and it might only cause screen garbage (two kinds:
sometimes real "garbage" is drawn, sometimes parts of the screen
content are duplicated in the wrong places etc. I have added a few
screenshots to
<https://sourceforge.net/tracker/index.php?func=detail&aid=737139&group_id=37116&atid=418820>).
Sometimes it causes various kinds of errors (e.g. "BlastObject object
114 (3) image not found").
We also ran valgrind on ScummVM (on Linux/x86 of course). Nothing :-/
Any suggestions as to what kind of bug could cause this? The only two things
I could think of:
a) A global array in the code (like shake_positions in gfx.cpp) is
written to with an out-of-bounds (OOB) index -> that would damage data
in the heap. And if there is a buffer, then the write only hits the
buffer, which renders it "harmless". Why did it not trigger my heap
check routine, though? Well, maybe it writes zeroes (but I tested with
both a 0-filled buffer and a 0xE7-filled buffer). Or maybe it still
writes beyond the buffer, but into other "harmless" data (but that
seems fairly unlikely, given that I tried various buffer sizes).
b) An array in the heap (read: any pointer) is accessed with a
huge negative index. W/o the buffer it overwrites code; with it, only
the buffer. Same problems as in a). In addition, the observed behavior
doesn't quite match up with overwritten code (that should lead to
straight crashes).
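One possible way to hunt for either a) or b) would be to temporarily route accesses to suspect arrays through a bounds-checking wrapper that logs bad indices instead of writing through them. The sketch below is only an illustration: CheckedArray is an invented helper, not something that exists in ScummVM, and the array size used in the usage note is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical debugging wrapper around a fixed-size array.
// Out-of-range accesses are logged and redirected to a scratch slot,
// so they can no longer damage neighbouring memory.
template <typename T, std::size_t N>
class CheckedArray {
public:
    CheckedArray() : _data(), _scratch(), _violations(0) {}

    // 'index' is signed on purpose, so negative indices are caught too.
    // 'where' is a free-form tag naming the call site for the log.
    T &at(long index, const char *where) {
        if (index < 0 || static_cast<std::size_t>(index) >= N) {
            std::fprintf(stderr, "OOB index %ld (size %lu) at %s\n",
                         index, static_cast<unsigned long>(N), where);
            ++_violations;
            return _scratch;
        }
        return _data[index];
    }

    // Number of out-of-range accesses seen so far.
    std::size_t violations() const { return _violations; }

private:
    T _data[N];
    T _scratch;
    std::size_t _violations;
};
```

Usage would look like `CheckedArray<int, 8> positions; positions.at(i, "gfx") = value;`. If violations() ever becomes non-zero while loading the save game, the logged tag points at the offending call site.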
Any other ideas? Even if they are wild, I am interested in hearing them.
Anyway, I just started a run with a 6 MB buffer filled with 0xE7, where
it checks the whole buffer every time. It'll be some time before it
finishes startup, but I'll get back to you with the results.
Cheers,
Max