memory efficient set/dictionary

Josiah Carlson josiah.carlson at sbcglobal.net
Mon Jun 11 12:20:37 EDT 2007


koara wrote:
>>> I would recommend you to use a database since it meets your
>>> requirements (off-memory, fast, persistent). The bsddb module
>>> (berkeley db) even gives you a dictionary like interface.
>>> http://www.python.org/doc/lib/module-bsddb.html
>> Standard SQL databases can work for this, but generally your
>> recommendation of using bsddb works very well for int -> int mappings.
>> In particular, I would suggest using a btree, if only because I have had
>> trouble in the past with colliding keys in bsddb.hash (and recno is just
>> a flat file that will attempt to create a file of size i*(record size) in
>> order to write record number i).
>>
>> As an alternative, there are many search-engine known methods for
>> mapping int -> [int, int, ...], which can be implemented as int -> int,
>> where the second int is a pointer to an address on disk.  Looking into a
>> few of the open source search implementations may be worthwhile.
> 
> Thanks guys! I will look into bsddb; hopefully it doesn't keep all
> keys in memory. I couldn't find an answer to that during my (very
> brief) look through the documentation.

No, bsddb does not keep all data in memory.
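
For what it's worth, here is a minimal sketch of the btree approach
(Python 2.x stdlib bsddb; the filename and helper names are just
examples).  Btree keys and values must be strings, so the ints are
packed with struct, big-endian, so that byte order matches numeric
order and the btree stays sorted by key:

    import bsddb
    import struct

    # Open (or create) an on-disk btree; the data lives on disk, not in memory.
    db = bsddb.btopen('int_map.db', 'c')

    def put(key, value):
        # Keys/values must be strings; fixed-width big-endian packing keeps
        # lexicographic key order identical to numeric order.
        db[struct.pack('>I', key)] = struct.pack('>I', value)

    def get(key):
        return struct.unpack('>I', db[struct.pack('>I', key)])[0]

    put(12345, 67890)
    print get(12345)        # -> 67890
    db.sync()
    db.close()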


> And how about the extra memory used for set/dict'ing of integers? Is
> there a simple answer?

A non-long integer on a 32-bit Python uses 12 bytes; a non-long integer
on a 64-bit Python uses 24 bytes.
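
(On Python 2.6 and later, which postdates this thread, you can check
this directly:

    import sys
    print sys.getsizeof(1)   # 12 on a 32-bit build, 24 on a 64-bit build

sys.getsizeof reports the size of the object itself, not of anything it
references.)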

Each entry in a dictionary on a 32-bit Python uses 12 bytes: 4 for the
cached hash, 4 for the key pointer, and 4 for the value pointer.  Double
that to 24 bytes per entry on a 64-bit Python (I don't know whether the
hash actually has its range increased, but if the pointers aren't doubled
in size you get alignment issues that can slow down dictionary access).

Set entries only have the hash and the key pointer, so they use 8/16 bytes each.
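
As a rough back-of-envelope using the 32-bit figures above (this counts
only the entries and the int objects; it ignores the hash table's
over-allocation, since CPython keeps dicts and sets no more than about
2/3 full, so the real tables have more slots than entries):

    entries = 10 * 1000 * 1000      # ten million int -> int pairs

    int_obj = 12                    # non-long int object, 32-bit Python
    dict_entry = 12                 # hash + key pointer + value pointer
    set_entry = 8                   # hash + key pointer

    dict_bytes = entries * (dict_entry + 2 * int_obj)   # key and value objects
    set_bytes = entries * (set_entry + int_obj)

    print dict_bytes / 2.0 ** 20, 'MB for the dict'     # ~343 MB
    print set_bytes / 2.0 ** 20, 'MB for the set'       # ~191 MB

Numbers like that are usually what pushes people toward an on-disk
structure such as the bsddb btree above.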


  - Josiah


