[ scummvm-Patches-2062263 ] COMMON: Revised HashMap implementation

SourceForge.net noreply at sourceforge.net
Wed Aug 20 14:29:00 CEST 2008


Patches item #2062263, was opened at 2008-08-20 14:29
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=418822&aid=2062263&group_id=37116

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Max Horn (fingolfin)
Assigned to: Nobody/Anonymous (nobody)
Summary: COMMON: Revised HashMap implementation

Initial Comment:
Attached you'll find a patch which rewrites Common::HashMap to use some ideas from the very efficient Python dictionary implementation (see also <http://svn.python.org/view/python/trunk/Objects/dictnotes.txt> and <http://svn.python.org/view/python/trunk/Objects/dictobject.c>).

Some of the highlights:
* Hashtable sizes are now powers of 2, making it possible to replace expensive MOD operations by binary masking in the lookup code
* collisions handling now does not use linear displacement, but rather a more clever strategy -- for details, refer to the Python code
* instead of filling the table up to 75%, we reduced this to 2/3=67%, like in Python
* the default table size has been reduced to 8, sufficient for up to 5 elements -- a quick test seems to indicate that this accounts for  98%+ of all HashMaps used in ScummVM (of course this may vary depending on what code you test, so take this with a grain of salt)
* changed the string hash function to match that used in Python (this one is not mandatory, but since the CPython string hash is proven to work well with the CPython dictionary implementation, this seems like a good idea)

Some shortcoming / possible further improvements:
* When there are not enough free slots left, we double the table size; in Python, they quadruple it as long as the table contains less than 5000 elements; that improves performance for tables which grow quickly (say when you have a table and want to insert 100 elements in one go)
* Instead of storing _capacity, we could store _capacity-1 (in a new member var _mask), like CPython does; that would save one decrement when doing lookups, moving the cost (an increment) to rarely used functions.
* In python, the initial storage for the HashMap is allocated inside the HashMap (i.e. an array of 8 Node pointers); we could do something like that, which would save us a malloc for many many HashMaps, while wasting 8*32 bytes for the rare large HashMaps)


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=418822&aid=2062263&group_id=37116




More information about the Scummvm-tracker mailing list