key/value store optimized for disk storage

Paul Rubin no.email at nospam.invalid
Wed May 2 23:29:11 EDT 2012


Steve Howell <showell30 at yahoo.com> writes:
> Thanks.  That's definitely in the spirit of what I'm looking for,
> although the non-64 bit version is obviously geared toward a slightly
> smaller data set.  My reading of cdb is that it has essentially 64k
> hash buckets, so for 3 million keys, you're still scanning through an
> average of 45 records per read, which is about 90k of data for my
> record size.  That seems actually inferior to a btree-based file
> system, unless I'm missing something.

1) Presumably you can use more buckets in a 64-bit version; 2) scanning
90k probably still takes far less time than a disk seek, even the "seek"
of a solid state disk (several microseconds in practice).
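
For reference, the arithmetic behind the quoted figures can be checked with
a rough sketch like the one below. The key and bucket counts come from the
quoted message; the 2000-byte record size is only an assumption, inferred
from "about 90k" divided by about 45 records.

```python
# Back-of-envelope check of the figures in the quoted message.
KEYS = 3_000_000       # "3 million keys"
BUCKETS = 64 * 1024    # "essentially 64k hash buckets" (the quoted reading of cdb)
RECORD_BYTES = 2_000   # assumption: implied by ~90k of data / ~45 records


def scan_cost(keys, buckets, record_bytes):
    """Return (average records per bucket, average bytes scanned per read)."""
    per_bucket = keys / buckets
    return per_bucket, per_bucket * record_bytes


records, nbytes = scan_cost(KEYS, BUCKETS, RECORD_BYTES)
print(f"{records:.1f} records, about {nbytes / 1000:.0f} kB scanned per read")
```

Whether that scan beats a seek depends on the actual hardware, which this
sketch does not try to model.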

> http://thomas.mangin.com/data/source/cdb.py
> Unfortunately, it looks like you have to first build the whole thing
> in memory.

It's probably fixable, but I'd guess you could just use Bernstein's
cdbmake program instead, which builds the database from records it reads
on standard input.
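
Bernstein's tools exchange records in a plain-text format (cdbdump prints
it, cdbmake reads it): one "+klen,dlen:key->data" line per record, with an
empty line marking the end. A minimal sketch of emitting that format one
record at a time, so nothing has to be held in memory, might look like
this. The function name write_cdb_records is made up for illustration.

```python
import io


def write_cdb_records(records, out):
    """Write (key, data) byte pairs to `out` in cdbmake's input format.

    `records` can be any iterable, including a generator, so the whole
    data set never needs to be in memory at once.
    """
    for key, data in records:
        # Lengths are byte counts, so keys and data may contain any bytes.
        out.write(b"+%d,%d:" % (len(key), len(data)))
        out.write(key + b"->" + data + b"\n")
    out.write(b"\n")  # a final empty line ends the input


buf = io.BytesIO()
write_cdb_records([(b"one", b"Hello"), (b"two", b"Goodbye")], buf)
```

In practice `out` would be a pipe to the external program rather than a
BytesIO buffer.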

Alternatively maybe you could use one of the *dbm libraries,
which burn a little more disk space, but support online update.
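
As a rough illustration of "online update", here is a small sketch using
the dbm module from the Python 3 standard library (anydbm in Python 2).
Which backend you actually get depends on the installation, and the path
and keys are placeholders.

```python
import dbm
import os
import tempfile


def demo(path):
    """Create a dbm file, update it in place, and read it back."""
    with dbm.open(path, "c") as db:   # "c": create the file if it is missing
        db[b"alpha"] = b"1"
        db[b"beta"] = b"2"
    with dbm.open(path, "w") as db:   # "w": reopen the existing file read-write
        db[b"alpha"] = b"changed"     # overwrite one record
        del db[b"beta"]               # delete another, no rebuild needed
    with dbm.open(path, "r") as db:   # "r": read-only
        return db[b"alpha"], b"beta" in db


value, has_beta = demo(os.path.join(tempfile.mkdtemp(), "kv"))
```

Unlike a cdb file, which is written once and then only read, the same file
here is modified after it was created.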


