[Scummvm-devel] kMD5FileSizeLimit = 1024 * 1024

Stuart George yakumo9275 at gmail.com
Fri Aug 4 20:56:12 CEST 2006

On 8/4/06, Max Horn <max at quendi.de> wrote:
> >
> > why 1mb? why not all or nothing?
> Maybe that was meant rhetoric, but let me answer it anyway:
> 1) "why 1mb?" -- I was arguing *against* 1mb... hu?
> 2) "why not all?" -- because some detect files are several 100 MB
> big, and you don't want to compute the MD5 of that for performance
> reasons

you do if you want to be accurate.. which is the whole point of doing an MD5!

what if an update changes something past the first 100mb, so two versions of the
same game match on everything before that? you won't find it. that's why I said
it should be all, or nothing.

now what games have individual files > 100mb that are non media (scripts,
game logic etc)? 100mb is probably good. heck, if you used just the game logic
files you could probably have something really small.. until you hit
the game that uses some internal zip/wad system to pack everything into
one file.... partial md5s just don't sit well with me :) they defeat the point
of fingerprinting the file.
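
a minimal sketch of the concern, in python. the helper name and the two fake
"versions" are made up for illustration and this is not scummvm's detector
code; only the 1024 * 1024 limit comes from the subject line.

```python
import hashlib
from typing import Optional

LIMIT = 1024 * 1024  # the kMD5FileSizeLimit value from the subject line


def md5_prefix(data: bytes, limit: Optional[int] = None) -> str:
    """MD5 of the first `limit` bytes, or of all the data when limit is None."""
    return hashlib.md5(data if limit is None else data[:limit]).hexdigest()


# two hypothetical versions of one data file: identical first 1mb,
# different content after that point
v1 = b"\x00" * LIMIT + b"version 1.0 scripts"
v2 = b"\x00" * LIMIT + b"version 1.1 scripts"

same_partial = md5_prefix(v1, LIMIT) == md5_prefix(v2, LIMIT)
same_full = md5_prefix(v1) == md5_prefix(v2)

print("partial md5 matches:", same_partial)
print("full md5 matches:", same_full)
```

the partial sums agree even though the files differ, so a detector keyed on
the partial sum alone could not tell these two apart.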

> > why is only the first X being summed?
> For performance reasons. You don't want to compute the MD5 (or the
> CRC, for that matter) of a 250 MB file. And 1 MB is IMO too large
> once we allow a "scan directory" option, since one might end up
> scanning dozens and more files. Usually a couple kilobyte are enough
> to generate a useable "fingerprint" of a file.

right, so you have an option that says

"we are going to scan everything under this folder and update the
installed game database. This will take some time"

and it happens once.

> You don't get the CRC for free! After all, you first have to put
> those files into a ZIP archive. So it has to be computed, too, and
> more, you *have* to use a ZIP file. In particular, you'll have a hard
> time using this "feature" for files which reside on a CD...

and those users who convert music to mp3/ogg and convert movies
to dxa, what do they do? (i'm not saying scummvm should go to zips,
it was just an example of something I had looked into in the past).

>> doing an md5sum of a big file is not very time consuming.
> Depending on your processor (think mobile device with a low spec
> CPU), your storage media (slow CD drive), and your data file size
> (couple hundred MB), and the number of files you need to compute it
> for (could easily become dozens or hundreds) -- yes it is :-).

the user must understand, nothing comes for free. if they have some
20mhz MIPS device with 128kb ram and a 4gb CF microdrive using iso images
or something, hey, they need to understand those consequences.

if you are going to start scanning files, you are going to take a hit somewhere.
to me, how long it takes is a moot point. use threads, fire one off
for each directory.
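
roughly what "one thread per directory" could look like, sketched in python
with a thread pool. the function names and the worker count are assumptions
for illustration, not anything scummvm has.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def md5_file(path, chunk_size=64 * 1024):
    """Full-file MD5, read in chunks so big files never sit in memory whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(directory):
    """Hash every regular file directly inside one directory (no recursion)."""
    result = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            result[path] = md5_file(path)
    return result


def hash_directories(directories, max_workers=4):
    """Hand each directory to a worker thread and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(hash_directory, directories):
            merged.update(partial)
    return merged
```

whether this helps depends on the storage: directories that all live on one
slow drive still share that drive, so the threads may just wait on each other.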

where is your baseline? does it have to be acceptable on a 4.1mhz gameboy colour
with 4kb ram? how slow is too slow for doing MD5 hashes?

you say a slow CD drive with lots of files is too slow for MD5. At that point it's
not the MD5 that's slow, it's the drive in the dreamcast or the user with the old
1x cdrom.. it's an IO bound issue, not an algorithm issue. Switching to CRC32
over MD5 in that case would save you nothing.
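
to make the IO point concrete, here is a small python sketch (the function
name is made up) that computes both checksums in one pass. either checksum
has to read every byte off the drive, and that read is the shared cost.

```python
import hashlib
import zlib


def checksum_one_pass(path, chunk_size=64 * 1024):
    """Read the file once and feed each chunk to both MD5 and CRC32.

    Returns (md5 hex digest, crc32 as unsigned int, bytes read).
    """
    md5 = hashlib.md5()
    crc = 0
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # the drive is hit here, once per chunk
            if not chunk:
                break
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
            bytes_read += len(chunk)
    return md5.hexdigest(), crc & 0xFFFFFFFF, bytes_read
```

if the read loop dominates the run time on a given device, swapping one
checksum for the other changes little; that would need measuring per port.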

obviously there is no perfect solution :) which doesn't help us much with
such different port hardware (DC, gp32, etc)....

{{throws hands in air}}

-- Stuart George
 Homepage : http://mega-tokyo.com
