key/value store optimized for disk storage

Tim Chase python.list at tim.thechases.com
Fri May 4 13:46:43 EDT 2012


On 05/04/12 12:22, Steve Howell wrote:
> Which variant do you recommend?
> 
> """ anydbm is a generic interface to variants of the DBM database
> — dbhash (requires bsddb), gdbm, or dbm. If none of these modules
> is installed, the slow-but-simple implementation in module
> dumbdbm will be used.
> 
> """

If you use the stock anydbm module, it automatically picks the best
backend it knows about from those installed on your system:

  import os
  import hashlib
  import random
  from string import letters

  import anydbm

  KB = 1024
  MB = KB * KB
  GB = MB * KB
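  DESIRED_SIZE = 6 * GB
  KEYS_TO_SAMPLE = 20
  FNAME = "mydata.db"

  i = 0
  db = anydbm.open(FNAME, 'c')
  try:
    print("Generating junk data...")
    # NOTE: some backends may append a suffix to FNAME (e.g. ".db")
    # or buffer writes, so this size check may need adjusting
    while os.path.getsize(FNAME) < DESIRED_SIZE:
      # hex digest of the counter, truncated to 16 characters
      key = hashlib.md5(str(i)).hexdigest()[:16]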
      size = random.randrange(1*KB, 4*KB)
      value = ''.join(random.choice(letters)
        for _ in range(size))
      db[key] = value
      i += 1
    print("Gathering %i sample keys" % KEYS_TO_SAMPLE)
    keys_of_interest = random.sample(db.keys(), KEYS_TO_SAMPLE)
  finally:
    db.close()

  print("Reopening for a cold sample set in case it matters")
  db = anydbm.open(FNAME)
  try:
    print("Performing %i lookups" % len(keys_of_interest))
    for key in keys_of_interest:
      v = db[key]
    print("Done")
  finally:
    db.close()


(Your specs said ~6 GB of data, keys up to 16 characters, and values
of 1k-4k, so this should generate such data.)
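
If you want to see which backend actually got picked, and how long the
lookups take, something along these lines might help. It is only a
sketch under a few assumptions: it targets Python 3, where anydbm
became the dbm package and whichdb became dbm.whichdb; the helper names
(make_key, build_db, time_lookups), the 200-record size and the
temporary path are made up for illustration and are not part of the
script above.

```python
import dbm
import hashlib
import os
import random
import string
import tempfile
import time


def make_key(i):
    # First 16 hex digits of the MD5 of the counter, as in the script above.
    return hashlib.md5(str(i).encode("ascii")).hexdigest()[:16]


def build_db(path, records=200):
    """Fill a small database with random 1-4 KB values; return its keys."""
    with dbm.open(path, "c") as db:
        for i in range(records):
            size = random.randrange(1024, 4 * 1024)
            value = "".join(random.choices(string.ascii_letters, k=size))
            db[make_key(i)] = value
        return list(db.keys())


def time_lookups(path, keys):
    """Reopen the database read-only and time fetching the given keys."""
    with dbm.open(path, "r") as db:
        start = time.perf_counter()
        for key in keys:
            db[key]
        return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical scratch location; a real test would use the 6 GB file.
    path = os.path.join(tempfile.mkdtemp(), "sample.db")
    keys = build_db(path)
    sample = random.sample(keys, 20)
    print("backend:", dbm.whichdb(path))
    print("20 lookups took %.6f s" % time_lookups(path, sample))
```

On a file this small everything will likely be served from the OS
cache, so the timing only becomes meaningful once the database is much
larger than RAM or the cache has been dropped between runs.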

-tkc


