[Scummvm-devel] Archive subclass inconsistencies (Was: [Scummvm-cvs-logs] SF.net SVN: scummvm:[52120] scummvm/trunk/common/fs.cpp (and r52121))
Max Horn
max at quendi.de
Mon Aug 16 23:10:09 CEST 2010
Am 16.08.2010 um 20:24 schrieb Eugene Sandulenko:
> On Mon, 16 Aug 2010 19:55:35 +0200
> Max Horn <max at quendi.de> wrote:
>
>> Hi there,
>>
>> as a quick reply (I need to look at this closer), let me point out
>> that yes, we only want to list files, *not* directories. All of the
>> Archive classes is designed around that assumption. If
>> FSDirectory::listMembers lists directories, then that is IMO a
>> regression, which could have serious consequences.
>>
>> Eugene, can you explain what the motivation for your changes was in
>> the first place?
> OK, as discussed on the IRC, I just reverted it altogether.
>
> Thus, this has to be turned into discussion, so CC'ing -devel.
>
> Here is my problem:
>
> I got game data inside of archive:
>
> m/foo/text.txt
> m/bar/text.txt
> baz/1.dat
> baz/2.dat
>
> There are 2 ways to access it. One is via ZipArchive class, and another
> one is via FSDirectory class, both extending Archive.
>
> I have 3 calls:
>
> listMembers()
> listMatchingMembers(m/*)
> listMatchingMembers(baz/*.dat)
>
> Now ZipArchive gives me somewhat expected results (since directories
> have also entries in the archive).
This is a violation of the Archive specification, i.e., a bug in ZipArchive. We specifically and on purpose designed Archive so as to not handle directories at all, for ease of use. We could change that, but then we need to revise the whole Common::Archive design and watch out for all the consequences.
Actually, the Archive docs are not that clear right now :-(. We really need to write proper, good and complete documentation & specification for them. And then make sure the code conforms to this (and add test cases for this, either as unit tests or as part of the "testbed" engine).
So ZipArchive needs fixing, *and* we may need to redesign the whole Archive stuff if it does not meet your needs. A lot of sweat and pain went into the current design, by the way, to make it work in lots of scenarios, but it certainly is far from perfect. One nice perk of adding directory support for Common::Archive might be that one could potentially create "sub"-archives for subdirectories (or for .zip files contained in another archive...). But one does not want to confuse code that does not expect directories. *And* all this gets complicated further by the desire of some engines/games to use Archive to provide a flattened view of their file system, so that they can lookup files by a simple filename, without having to explicitly search for it in multiple directories...
> The result below are after getName(),
> and all of them apparently contain correct pointers to ArchiveMember
> instances:
>
> listMembers() -> [ m/, m/foo/, m/foo/text.txt, m/bar/,
> m/bar/text.txt, baz/, baz/1.dat, baz/2.dat] (8 elements)
The directory entries are wrong and must go. What has to be debated now is which of these is supposed to be "correct":
1) m/foo/text.txt, m/bar/text.txt, baz/1.dat, baz/2.dat
2) text.txt, text.txt, 1.dat, 2.dat
In the original Archive design, the "flat" list in 2 would be the definite correct answer. But at some point we introduced a more elaborate scheme, allowing "non-flat" member names, as some engine(s) required that. May guess is that we partially screwed up doing that. Thinking about it, for FSDirs, the ArchiveMembers are simply FSNodes, and a plain FSNode (which is backend/system specific, anyway!) would not have any way to determine that it should print itself with a (possibly partial, at that!) path prefix. So it is not surprising that a FSDir member, which is an FSNode, would always print only its file name, without prefix...
Side note: Having multiple members with identical names *is* explicitly allowed because, well, it simply can happen in reality for many good (and some bad) reasons. Think of SearchSet, which can merge multiple Archives together. So this is a conscious design decision. When using Archive::getMember or Archive::createReadStreamForMember, the first match (ignoring case!), if any, is used. But by iterating over all members, one can find even the "hidden" duplicate entries.
> listMatchingMembers(m/*) -> [m/foo/, m/foo/text.txt, m/bar/,
> m/bar/text.txt] (4 elements)
Correct would be:
m/foo/text.txt, m/bar/text.txt
i.e. without directories.
> listMatchingMembers(baz/*.dat) -> [baz/1.dat, baz/2.dat]
That is correct.
>
> Additionally I can check result of second call for presence of '/' to
> distinguish directories (I know it's a hack).
It is a hack exploiting a bug, yeah :).
>
> But FSDirectory archive gives somewhat inconsistent results, and
> moreover, they're weird for the first call and completely wrong for
> the second call:
>
> listMembers() -> [ text.txt, text.txt, 1.dat, 2.dat] (4 elements)
>
> listMatchingMembers(m/*) -> [text.txt, text.txt] (2 elements)
>
> listMatchingMembers(baz/*.dat) -> [1.dat, 2.dat]
These look right for me if we assume a "flat" name hierarchy. But not if we use a non-flat hierarchy, then it would be wrong; and that is the default, isn't it?
The cause is clear: The archive members (which is what you are looping over) are FSNodes in this case, and when getName() is called, they return their actual name.
In contrast, in FSDirectory, we assign (at least in non-flat mode) longer names, similar to what FSNode does.
I just wrote some quick & dirty test code, and the result is that I (a) can confirm the outputs you get, and (b) if I call arch.hasFile(member.getName()), then for ZipArchive members I get true, but in the FSNode/FSDir case, I get false.
So, the problem here is that the member's getName method does *not* return the name the FSDir uses to address the member!
There are several ways to address that. First off, I clarified the Archive docs a little bit to state that listMembers etc. iterate over members, not over member names. Secondly, if you just want to read all members, you can use their createReadStream methods directly, not ever using Archive::hasFile().
Lastly, we need to discuss the supposed equality archive.getMember(name).getName() == name. Right now, FSDir violates this. We can fix that by changing FSDir to use GenericArchiveMember. Or, we can change the spec by saying that this equality may *not* hold, which of course is then again confusing for users who have not read the specs in detail... It also seems rather odd...
Bye,
Max
More information about the Scummvm-devel
mailing list