[Scummvm-devel] kMD5FileSizeLimit = 1024 * 1024

Max Horn max at quendi.de
Fri Aug 4 19:24:07 CEST 2006


On 04.08.2006 at 15:56, Stuart George wrote:

> On 8/4/06, Max Horn <max at quendi.de> wrote:
>> so, how about reducing kMD5FileSizeLimit from 1 MB to something smaller?
>> Maybe even depending on the engine...
>>
>> For SCUMM, I can work on this eventually. But also the GOB, Kyra and Lure
>> engines use the first 1 MB of a file. I don't have access to any of their
>> data files from where I am right now, so I can't check if they even have
>> detection files that are that big. But considering that new engines are
>> usually started by looking at / copying from existing engines, I think it
>> would be nice to lower these values to something "sensible", too (for
>> reference, saga and simon use 5000 bytes, the remaining engines don't use
>> MD5 at all).
>
> why 1mb? why not all or nothing?

Maybe that was meant rhetorically, but let me answer it anyway:

1) "why 1mb?" -- I was arguing *against* 1 MB... huh?
2) "why not all?" -- because some detection files are several hundred
MB in size, and you don't want to compute the MD5 of all that, for
performance reasons.
3) "why not nothing?" -- because several of our engines compute an  
MD5 fingerprint of certain files to detect and distinguish many games.


> you can use a 16kb buffer or anysize and just md5sum the whole thing.

Huh? Not sure what you mean... What would we use the 16kb buffer for?

> why is only the first X being summed?

For performance reasons. You don't want to compute the MD5 (or the
CRC, for that matter) of a 250 MB file. And 1 MB is IMO too large
once we allow a "scan directory" option, since one might end up
scanning dozens of files or more. Usually a couple of kilobytes are
enough to generate a usable "fingerprint" of a file.
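
To make the "only hash the first X bytes" idea concrete, here is a
minimal sketch in Python (not ScummVM's actual C++ code). The function
name md5_prefix and its parameters are made up for illustration; the
5000-byte default is simply the value mentioned above for saga and
simon, and the 16 KB read chunk is an arbitrary choice.

```python
import hashlib

def md5_prefix(path, limit=5000, chunk_size=16 * 1024):
    """Return the hex MD5 of at most the first `limit` bytes of `path`.

    Hypothetical helper: `limit` plays the role of kMD5FileSizeLimit.
    """
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:
        while remaining > 0:
            # Never read past the limit, even if the file is much larger.
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # file is shorter than the limit
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

With this, a 250 MB data file costs the same to fingerprint as a 5000
byte one, because everything past the limit is never read.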

> when I was working on sarien2 I was using zipfiles you get crc32's as a
> free bonus along with builtin file lists and such. (sdl has a nice zip
> library too)...

You don't get the CRC for free! After all, you first have to put
those files into a ZIP archive. So it has to be computed, too, and
what's more, you *have* to use a ZIP file. In particular, you'll have
a hard time using this "feature" for files which reside on a CD...


> but anyway...
>
> With the AGI engine the files are so small, md5summing is overkill.

Well, of course we could use CRC instead of MD5, but that's not  
really the point here. The use of MD5 is not the issue at hand :-).

>
> doing an md5sum of a big file is not very time consuming.

Depending on your processor (think mobile device with a low-spec
CPU), your storage medium (slow CD drive), your data file size (a
couple hundred MB), and the number of files you need to compute it
for (could easily become dozens or hundreds) -- yes, it is :-).
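
As a rough back-of-the-envelope illustration, the following Python
sketch compares how many bytes have to be read and hashed under
different limits. The figures of 100 files at 250 MB each are assumed
purely for the sake of the example, and bytes_hashed is a made-up
helper name.

```python
MB = 1024 * 1024

def bytes_hashed(num_files, limit, file_size):
    """Total bytes read when hashing at most `limit` bytes of each file."""
    return num_files * min(limit, file_size)

files = 100       # assumed number of candidate files in a directory scan
size = 250 * MB   # assumed size of each data file

for label, limit in [("whole file", size), ("1 MB", MB), ("5000 bytes", 5000)]:
    print(label, bytes_hashed(files, limit, size))
```

Under these assumptions, a 1 MB limit still means reading 100 MB in
total, while a 5000 byte limit means reading 500,000 bytes.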



Cheers,
Max





