[Scummvm-devel] kMD5FileSizeLimit = 1024 * 1024

Max Horn max at quendi.de
Fri Aug 4 21:37:35 CEST 2006

On 04.08.2006 at 20:56, Stuart George wrote:

> On 8/4/06, Max Horn <max at quendi.de> wrote:
>>> why 1mb? why not all or nothing?
>> Maybe that was meant rhetorically, but let me answer it anyway:
>> 1) "why 1mb?" -- I was arguing *against* 1mb... huh?
>> 2) "why not all?" -- because some detect files are several hundred MB
>> big, and you don't want to compute the MD5 of that for performance
>> reasons
> you do if you want to be accurate.. which is the whole point of  
> doing an MD5!

MD5 is simply a (cryptographic, by chance, though that is irrelevant  
for us) hash function. What one does with it is up to the user :-).  
For you it might only be interesting to compute the MD5 of whole  
files (being "accurate" as you call it), but for others (e.g. me), it  
works nicely as a tool to fingerprint a small portion of a file.
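
To make "fingerprint a small portion of a file" concrete, here is a minimal sketch in Python (not ScummVM's actual code). The function name partial_md5 and the chunk size are made up for illustration; the default limit simply mirrors the 1024 * 1024 from the subject line.

```python
import hashlib

# Assumption: mirrors the 1024 * 1024 value from the subject line.
MD5_FILE_SIZE_LIMIT = 1024 * 1024


def partial_md5(path, limit=MD5_FILE_SIZE_LIMIT, chunk_size=64 * 1024):
    """Return the hex MD5 of at most the first `limit` bytes of a file.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.md5()
    remaining = limit
    with open(path, "rb") as f:
        while remaining > 0:
            # Never read past the limit, even if the file is much larger.
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than the limit
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

For a file of several hundred MB this reads only the first `limit` bytes, so the I/O cost no longer depends on the file size.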

> what if there is an update > 100mb with two versions of the
> same game? you won't find it. that's why I said it should be all, or
> nothing.

Huh? I don't understand what you are trying to say here. What updates?

> now what games have individual files > 100mb that are non media
> (scripts, game logic etc?) 100mb is probably good. heck if you used
> just the game logic files you could probably have something really
> small.. until you hit the game that uses some internal zip/wad system
> to pack everything into one file.... partial md5 just don't sit well
> with me :) it defeats the point of fingerprinting the file.

Again, I have trouble parsing what you say (stupid language
barrier). You seem to ask for an example of a game with a 100MB+ data
file. Well, the Mac versions of several SCUMM games (e.g. DOTT) ship
as such a single big file that contains "everything".


> the user must understand, nothing comes for free. if they have some
> 20mhz MIPS device with 128kb ram and a 4gb CF microdrive using iso
> images or something, hey, they need to understand those consequences.

So we tell them: "Suckers, buy a faster device, we could speed up  
ScummVM for you but we think such morons as you deserve to suffer?" :-)

> if you are going to start scanning files, you are going to take a
> hit somewhere. to me, it's a moot point how long it takes. use
> threads, fire one off for each directory.

Threads (which don't even exist on all of our ports) will usually not
produce a speed gain in an I/O bound scenario such as the one discussed
here.

> where is your baseline? does it have to be acceptable on a 4.1mhz
> gameboy colour with 4kb ram? how slow is too slow for doing MD5
> hashes? you say slow CD drive with lots of files is too slow for MD5.
> At that point it's not the MD5 that's slow, it's the drive in the
> dreamcast or the user with the old 1x cdrom.. it's an IO bound issue
> not an algorithm issue. Switching to CRC32 over MD5 in that case
> would save you nothing.
> obviously there is no perfect solution :) which doesn't help us much
> with such different port hardware (DC, gp32, etc)....
> {{throws hands in air}}

I really don't see what you are arguing about, or what you are aiming
at?!? To boil it down, you are saying: "Because people have to wait
some time anyway, let's not bother about how long they have to wait,
even though it would be trivial for us to reduce the time." Or did I
miss something?

Really, all I was doing was asking some engine maintainers to consider
reducing the limit that we *already* impose on MD5 fingerprinting some
more. That is a terribly simple task; the hardest part is recomputing
the MD5 checksums, which is only "hard" if you do not have access to
all relevant variants of a game. Which is why I didn't just reduce
the limit myself, but asked engine maintainers to consider it :-).
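
Why the checksums need recomputing can be shown with a small Python sketch (again, not ScummVM code). The fingerprint helper and the NEW_LIMIT value of 5000 bytes are purely hypothetical; only the 1024 * 1024 figure comes from the subject line.

```python
import hashlib

OLD_LIMIT = 1024 * 1024  # value from the subject line
NEW_LIMIT = 5000         # hypothetical smaller limit, for illustration only


def fingerprint(data: bytes, limit: int) -> str:
    """Hex MD5 of at most the first `limit` bytes of `data`."""
    return hashlib.md5(data[:limit]).hexdigest()


# Stand-in for a large game data file (2 MiB of dummy bytes).
big_file = bytes(range(256)) * 8192
# Stand-in for a file smaller than both limits.
small_file = b"tiny data file"

# A file larger than the old limit gets a different fingerprint, so its
# stored checksum has to be recomputed from the real file.
changed = fingerprint(big_file, OLD_LIMIT) != fingerprint(big_file, NEW_LIMIT)

# A file shorter than both limits is hashed in full either way.
unchanged = fingerprint(small_file, OLD_LIMIT) == fingerprint(small_file, NEW_LIMIT)
```

Any file longer than the new limit needs its stored checksum regenerated, which in turn requires access to every supported variant of the game.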

The net benefit is less wasted CPU power and I/O bandwidth when
detecting a game. That is *usually* not so important for detecting a
single file, but becomes important (as pointed out) when scanning a
bunch of directories. Yes, on a fast high-end machine, the speed gain
might be negligible. But on a 500 MHz PC with an older (and hence
slow) HD the difference is *very* noticeable. Telling those users to
fuck off and buy a new computer because we are too lazy to make the
above change seems to me like a very strange way to argue about this.

Of course, if you have any *technical* reasons why we shouldn't lower
the MD5 fingerprinting limit -- as opposed to reasons based on your
beliefs about how MD5 should (not) be used, or where people with slow
hardware should shove their games -- then we can discuss those :-)

