Request for comments on a design

Sat Oct 23 02:26:48 EDT 2010

I have a program that manipulates many very large indices, which I 
implement as bit vectors (via the bitarray module).  They are too 
large to keep in memory all at once, so I need a way to cache them 
and load them from disk as necessary.  I've been reading about 
weak references, and it looks like they may be what I want.

My idea is to use a WeakValueDictionary to hold references to these 
bitarrays, so Python can decide when to garbage collect them.  I then 
keep a key-value database of them (via bsddb) on disk and load them 
when necessary.  The basic idea for accessing one of these indices is:

import cPickle
import weakref

# Maps idx -> bitarray; an entry disappears once its bitarray is collected.
_idx_to_bitvector_dict = weakref.WeakValueDictionary()

def retrieve_index(idx):
    # A single get() avoids the entry vanishing between a membership
    # test and the lookup.
    bv = _idx_to_bitvector_dict.get(idx)
    if bv is None:  # never loaded, or it's been gc'd
        bv_str = bitvector_from_db[idx]   # Load from bsddb (handle opened elsewhere)
        bv = cPickle.loads(bv_str)        # Deserialize the string
        _idx_to_bitvector_dict[idx] = bv  # Re-initialize the weak dict element
    return bv
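
To make the intended behaviour concrete, here is a self-contained sketch of
the same pattern.  It is only an illustration under some assumptions:
BitVector and fake_db are hypothetical stand-ins for the bitarray objects and
the bsddb database, and it uses Python 3's pickle rather than cPickle, so
that it runs without either dependency.

```python
import gc
import pickle
import weakref


class BitVector:
    """Hypothetical stand-in for a bitarray (a plain class can be weakly referenced)."""

    def __init__(self, bits):
        self.bits = bits


# Hypothetical stand-in for the bsddb database: idx -> pickled list of bits.
fake_db = {7: pickle.dumps([1, 0, 1, 1])}

_cache = weakref.WeakValueDictionary()
load_count = 0  # how many times we had to go back to the "database"


def retrieve_index(idx):
    global load_count
    bv = _cache.get(idx)
    if bv is None:  # never loaded, or already collected
        load_count += 1
        bv = BitVector(pickle.loads(fake_db[idx]))
        _cache[idx] = bv
    return bv


first = retrieve_index(7)    # miss: loaded from fake_db
second = retrieve_index(7)   # hit: `first` keeps the object alive
same_object = first is second
loads_while_held = load_count

del first, second            # drop every strong reference
gc.collect()                 # make collection explicit for the demo
entry_survived = 7 in _cache
third = retrieve_index(7)    # looked up again after the references were dropped
loads_after_release = load_count
```

In this sketch, repeated lookups return the same object for as long as some
other part of the program holds a reference to it.  On CPython, once the last
such reference is dropped the entry leaves the weak dictionary, and the next
lookup goes back to the store.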

Hopefully that's not too confusing.  Comments on this approach?  I'm 
wondering whether the weakref stuff isn't duplicating some of the 
caching that bsddb might be doing.

Thanks,
-Tom