key/value store optimized for disk storage

Steve Howell showell30 at yahoo.com
Wed May 2 22:14:54 EDT 2012


This is slightly off topic, but I'm hoping folks can point me in the
right direction.

I'm looking for a fairly lightweight key/value store that works for
this type of problem:

  ideally plays nice with the Python ecosystem
  the data set is static, and written infrequently enough that I
definitely want *read* performance to trump all
  there is too much data to keep it all in memory (so no memcache)
  users will access keys with fairly uniform, random probability
  the key/value pairs are fairly homogeneous in nature:
    keys are <= 16 chars
    values are between 1k and 4k bytes generally
  approx 3 million key/value pairs
  total amount of data == 6GB
  needs to work on relatively recent versions of FreeBSD and Linux

My current solution works like this:

  keys are file paths
  directories are 2 levels deep (30 dirs w/100k files each)
  values are file contents
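
For concreteness, here is a minimal sketch of that layout. The post does
not say how a key is assigned to a directory, so the CRC32 bucketing and
the names path_for_key, put, and get are assumptions for illustration
only. It also assumes keys are safe to use as file names.

```python
import os
import zlib

NUM_DIRS = 30  # from the post: 30 directories holding ~100k files each


def path_for_key(root, key):
    """Map a key to root/<bucket>/<key> (two levels below root)."""
    # Hypothetical bucketing: CRC32 of the key picks one of NUM_DIRS dirs.
    bucket = zlib.crc32(key.encode("utf-8")) % NUM_DIRS
    return os.path.join(root, "%02d" % bucket, key)


def put(root, key, value):
    """Write value (bytes) as the contents of the key's file."""
    path = path_for_key(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(value)


def get(root, key):
    """Read the key's file back; raises FileNotFoundError if missing."""
    with open(path_for_key(root, key), "rb") as f:
        return f.read()
```

A lookup is then one open() plus one read(), which is why the approach
is hard to beat on simplicity.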

The current solution isn't horrible, but I'm trying to squeeze a little
performance/robustness out of it.  A minor nuisance is that I waste a
fair amount of disk space, since the values are generally less than 4k
in size.  A larger concern is that I'm not convinced that file systems
are optimized for dealing with lots of little files in a shallow
directory structure.
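
A rough back-of-the-envelope estimate of the wasted space, as a sketch
only. It assumes a 4096-byte allocation unit, reads the total as 6 GB
(6 * 10**9 bytes), and ignores inode and directory overhead; the real
figure depends on the filesystem and how it was created.

```python
import math

BLOCK_SIZE = 4096        # assumed allocation unit in bytes (not from the post)
NUM_PAIRS = 3000000      # from the post: approx 3 million key/value pairs
TOTAL_DATA = 6 * 10**9   # from the post, read as 6 GB


def allocated_bytes(value_size, block_size=BLOCK_SIZE):
    """Bytes a file of value_size occupies when rounded up to whole blocks."""
    return math.ceil(value_size / block_size) * block_size


avg_value = TOTAL_DATA // NUM_PAIRS               # average value size in bytes
on_disk = NUM_PAIRS * allocated_bytes(avg_value)  # space actually allocated
wasted = on_disk - TOTAL_DATA                     # slack at the end of blocks

print("average value: %d bytes" % avg_value)
print("allocated:     %.1f GB" % (on_disk / 1e9))
print("wasted:        %.1f GB" % (wasted / 1e9))
```

Under those assumptions roughly half of the allocated space is slack.
Filesystems that allocate in smaller units would waste less.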

To deal with the latter issue, a minor refinement would be to deepen
the directory structure, but I want to do due diligence on other
options first.
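
One way the deeper layout could look, as a sketch: derive the directory
names from a hash of the key so files spread evenly. The function name
deep_path_for_key, the use of MD5, and the levels/width parameters are
all assumptions for illustration, not something the current system does.

```python
import hashlib
import os


def deep_path_for_key(root, key, levels=2, width=2):
    """Return root/<h0>/<h1>/.../<key>, where each <hN> is a slice of the
    key's MD5 hex digest. MD5 is used only to spread keys evenly, not for
    security."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *(parts + [key]))
```

With levels=2 and width=2 there are 256 * 256 = 65,536 leaf
directories, so 3 million keys would average about 46 files per
directory.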

I'm looking for something a little lighter than a full-on database
(either SQL or no-SQL), although I'm not completely ruling out any
alternatives yet.

As I mentioned up top, I'm mostly hoping folks can point me toward
sources they trust, whether it be other mailing lists, good tools,
etc.  To the extent that this is on topic and folks don't mind
discussing this here, I'm happy to follow up on any questions.

Thanks,

Steve

P.S. I've already found some good information via Google, but there's
a lot of noise out there.


