key/value store optimized for disk storage

Steve Howell showell30 at yahoo.com
Wed May 2 23:20:23 EDT 2012


On May 2, 7:46 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
> Steve Howell <showel... at yahoo.com> writes:
> >   keys are file paths
> >   directories are 2 levels deep (30 dirs w/100k files each)
> >   values are file contents
> > The current solution isn't horrible,
>
> Yes it is ;-)
> > As I mention up top, I'm mostly hoping folks can point me toward
> > sources they trust, whether it be other mailing lists, good tools,
>
> cdb sounds reasonable for your purposes.  I'm sure there are python
> bindings for it.
>
> http://cr.yp.to/cdb.html mentions a 4gb limit (2**32) but I
> half-remember something about a 64 bit version.

Thanks.  That's definitely in the spirit of what I'm looking for,
although the non-64-bit version is obviously geared toward a slightly
smaller data set.  My reading of cdb is that it has essentially 64k
hash buckets, so for 3 million keys, you're still scanning through an
average of about 45 records per read, which is about 90k of data for my
record size.  That actually seems inferior to a btree-based file
system, unless I'm missing something.
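
A rough back-of-envelope sketch of that estimate, in Python.  It takes
the reading of cdb above at face value (64k buckets scanned linearly),
which may not match how cdb actually lays out its hash tables, and it
assumes a record size of about 2 KB, which is only inferred from the
"45 records / 90k" figures.  The names are made up for illustration.

```python
# Back-of-envelope check of the scan-cost estimate.
# ASSUMPTIONS: the 64k bucket count is this post's reading of cdb, not a
# verified property of the format; the ~2 KB record size is inferred
# from "45 records ... about 90k of data".
NUM_KEYS = 3000000
NUM_BUCKETS = 64 * 1024
RECORD_SIZE_BYTES = 2 * 1024


def avg_records_per_bucket(num_keys, num_buckets):
    """Average number of records sharing one bucket, assuming a uniform hash."""
    return num_keys / float(num_buckets)


def avg_bytes_scanned(num_keys, num_buckets, record_size):
    """Average bytes read per lookup if every record in the bucket is scanned."""
    return avg_records_per_bucket(num_keys, num_buckets) * record_size


records = avg_records_per_bucket(NUM_KEYS, NUM_BUCKETS)
scanned = avg_bytes_scanned(NUM_KEYS, NUM_BUCKETS, RECORD_SIZE_BYTES)

print("records per bucket: %.1f" % records)
print("KB scanned per read: %.1f" % (scanned / 1024.0))
```

This only checks the arithmetic; it says nothing about whether the
bucket model itself is right.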

I did find this as a follow-up to your lead:

http://thomas.mangin.com/data/source/cdb.py

Unfortunately, it looks like you have to first build the whole thing
in memory.

More information about the Python-list mailing list