shelve->dbhash->bsddb bugs on Python 1.5.2 for Win32

Warren Postma embed at geocities.com
Wed Feb 16 13:50:57 EST 2000


So far I've been running small test scripts against all the database and
persistence code that ships with Python.

Thus far I've found:

1. The bsddb 1.8 module included with Python on Windows appears to be stable
in btree mode, and very unstable in hash mode.

2. On Win32, the shelve module shipped with Python appears to use only bsddb
(in hash mode, via dbhash). Therefore shelve appears unstable there as well.

3. The alternative bsddb 2.7.x modules provided by Robin Dunn appear to be
much more stable than the shelve routines provided with Python. The test
code below, run directly against the stock bsddb (using btopen) or against
Robin's upgraded bsddb instead of shelve, appears stable.
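
Since btree mode looks stable, one thing that might be worth trying is to
keep the shelve interface but hand it a btree file instead of letting it pick
dbhash. The following is only a sketch and has not been run against the
failing setup: shelf_over and open_btree_shelf are made-up names, and it
assumes the bsddb module is importable and that shelve.Shelf accepts any
dictionary-like object.

```python
import shelve


def shelf_over(mapping):
    # shelve.Shelf pickles each value and stores it in the given
    # dictionary-like object, whatever that object happens to be.
    return shelve.Shelf(mapping)


def open_btree_shelf(filename):
    # Assumption: bsddb is installed. It is imported here, not at module
    # level, so the rest of the sketch still loads where bsddb is missing.
    import bsddb
    # "c" means: open read-write, creating the file if necessary.
    return shelf_over(bsddb.btopen(filename, "c"))
```

With this, the shelve.open(filename) call in testloop could be swapped for
open_btree_shelf(filename) to see whether the failure goes away.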

  - - -

Here is an example of how to make shelve die horribly. Be sure to delete the
old file from c:\temp before each run, or the second run will produce new and
even more spectacular errors. whichdb reports that the file created by this
script is a dbhash file.
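
A small helper along these lines could take care of the cleanup and the
whichdb check. It is only a sketch: remove_stale is a made-up name, and the
import fallback assumes that Pythons without a top-level whichdb module
provide the same function from the dbm package.

```python
import os

try:
    from whichdb import whichdb   # top-level module in older Pythons
except ImportError:
    from dbm import whichdb       # assumption: newer Pythons keep it here


def remove_stale(filename):
    # Delete a leftover database file from a previous run.
    # Returns 1 if a file was removed, 0 if there was nothing to remove.
    if os.path.exists(filename):
        os.remove(filename)
        return 1
    return 0
```

Call remove_stale(filename) before shelve.open(filename); after a run,
whichdb(filename) names the module that recognises the file, or gives None
if the file does not exist.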


# DBTEST3.PY
#
# Stress-test 'shelve' (which in turn uses dbhash, which uses bsddb)
#
# note: I'm aware that by using shelve to store a mere string
# rather than something more exotic, I'm not demonstrating a proper
# use of shelve, nevertheless I think I've demonstrated a bit of
# instability here.

import shelve
import bsddb

filename = "c:/temp/TestShelve.db"


import random
import time
import string

# full test range
#start,end = 4,17

# medium test
start,end = 4,13

# quick test
#start,end = 4,8

print "Database Stress Test in Python (Shelve/Pickle/DbHash)"
print

descriptions = [ "Add Rows", "Read Keys", "Shuffle", "Read Rows",
                 "Delete 75%", "Close" ]

List1 = [ "Abc", "Def", "Ghi", "Jkl", "Mno", "Pqr", "Stu", "Vwx", "Yz" ]

List2 = [ "X123", "Y456", "Z789", "Y012", "Z345", "X678", "Y901", "Z234",
"X567" ]

# Shuffle an array like you might shuffle cards
# Note: This is intended to be Good Enough, Not Perfect.
# We limit shuffle operations to 100 per data set!
def Shuffle(ar):
    sz = len(ar)
    th = sz/2
    lp = sz/4
    if (lp > 100):
        lp = 100
    # do a little random substitution to kick things off
    for i in range(0,lp):
        x = random.randrange(0,th)
        y = random.randrange(th,sz)
        ar[x],ar[y] = ar[y], ar[x]
    # rough scramble of sections:
    for k in range(0,lp):
        # now move a bunch of cards up or down:
        x1 = random.randrange(0,sz/2)
        x2 = random.randrange(0,sz/2)+(sz/2)
        c  = random.randrange(0,sz/4)
        ar[x1:x1+c], ar[x2:x2-c] = ar[x2:x2-c], ar[x1:x1+c]


def testset(testindex,RowCount,db):
    times=[]
    starttime = time.clock()
    bytesread=0
    print "---- Storing "+`RowCount`+" rows in the database (  "+`testindex`+" ) ----"
    for n in range(0,RowCount):
        r = random.randrange(0,8)
        V = List1[r]*200
        K = List2[r]+"-"+`n`+`testindex` # avoid inorder insertion of keys
        #try:
        S = K+':'+V
        db[K] = S   # DB Btree-lookup Key 'K' has value 'V'
        #except bsddb.error: # what row number did we fail on?
        #    print "info: bsddb.error inserting db[",`K`,"] =  <"+`len(S)`+" bytes of junk >"
        #    raise bsddb.error


    times.append(time.clock()-starttime)

    # Get Keys
    #print "Read Keys"
    Keys = db.keys()
    N = len(Keys)

    times.append(time.clock()-starttime)

    print "Shuffling Key array... (slow!)"
    # Scramble Keys But Good...
    Shuffle(Keys)

    # print Keys[0:10]  # taste and see

    times.append(time.clock()-starttime)

    print "After inserting ",RowCount," rows the Key Count is now ", len(Keys)

    bytesread = 0
    #print "Reading Rows, in Random Order"
    for r in Keys:
        x = db[r]
        bytesread = bytesread + len(x)

    print "Bytes read = ", `bytesread`

    times.append(time.clock()-starttime)


    # Delete 75% of the data in the database:
    delcount = len(Keys) - ( len(Keys)/4 )
    for k in Keys[0:delcount]:
        del db[k]

    db.sync()
    Keys = db.keys()
    print "After deleting, the key count is ", len(Keys)

    times.append(time.clock()-starttime)

    #print "Closing"
    #print "Done"

    #times.append(time.clock()-starttime)
    print "Elapsed Times:"
    for i in range(0,5):
        print string.ljust(descriptions[i],20), ": ", times[i]
    print "-----------------"
    print


def testloop():
    db1 = shelve.open(filename)
    for i in range(start,end):
        testset(i,long(20**(i/4.0)),db1)
    db1.close()


testloop()


    - - -

Example output and traceback:

[.... previous page and half of output deleted ....]


---- Storing 1788L rows in the database (  10 ) ----
Shuffling Key array... (slow!)
After inserting  1788L  rows the Key Count is now  2027
Bytes read =  1238901
After deleting, the key count is  506
Elapsed Times:
Add Rows             :  1.50382422799
Read Keys            :  1.7653140929
Shuffle              :  2.72998586972
Read Rows            :  3.49537455309
Delete 75%           :  3.93048420107
-----------------

---- Storing 3782L rows in the database (  11 ) ----
Traceback (innermost last):
  File "DBTEST3.py", line 155, in ?
    testloop()
  File "DBTEST3.py", line 144, in testloop
    testset(i,long(20**(i/4.0)),db1)
  File "DBTEST3.py", line 82, in testset
    db[K] = S   # DB Btree-lookup Key 'K' has value 'V'
  File "c:\Python\Lib\shelve.py", line 71, in __setitem__
    self.dict[key] = f.getvalue()
bsddb.error: (0, 'Error')
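
The traceback shows only that an insert failed inside shelve's __setitem__,
not which row triggered it. The commented-out try/except in the script was
meant to answer that; a minimal standalone sketch of the same idea follows.
The name store_rows and the shape of its return value are made up for
illustration, and it catches Exception in general on the assumption that
bsddb.error derives from it.

```python
import sys


def store_rows(db, rows):
    # Insert (key, value) pairs into db one at a time.
    # Returns None if every insert succeeded, otherwise a tuple
    # (row_number, key, value_length, error) for the first failure.
    n = 0
    for key, value in rows:
        try:
            db[key] = value
        except Exception:
            # sys.exc_info()[1] is the exception instance that was raised
            return (n, key, len(value), sys.exc_info()[1])
        n = n + 1
    return None
```

Feeding it the same K and S pairs that testset builds would report the row
number and key at which the database first refuses an insert.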








