[Python-bugs-list] [ python-Bugs-445862 ] bsddb fails for larger amount of data
noreply@sourceforge.net
Sat, 04 Aug 2001 17:02:23 -0700
Bugs item #445862, was opened at 2001-07-30 00:21
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=445862&group_id=5470
Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: bsddb fails for larger amount of data
Initial Comment:
The attached script fails after approx. 72500 insert
operations. If you vary the size of the keys and/or
the values, the bug occurs earlier or later, but even
with a value size of 1 the bug will occur. This probably
also explains bug #408271 ("crash in shelve
module").
Platform: W2K
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2001-08-04 17:02
Message:
Logged In: YES
user_id=31435
Skip, I reran the test after changing the open line to
db = bsddb.btopen("test.dbm", "n")
I killed it by hand at this point:
Last i: 326577, last key:abcdef4387101.63608
because Win98SE gets mondo unstable when it starts
thrashing madly to disk, and it became impossible to get
any work done while this was running.
I don't know anything about the history, present, or
prospects for bsddb -- like, is there a more recent
unencumbered version we could use? It looks like Sam's
1.85 Windows port is over 5 years old.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2001-08-04 15:59
Message:
Logged In: NO
According to www.sleepycat.com/historic.html,
talking about bsd db:
"we recommend that you avoid the following operations when
using versions 1.85 and 1.86:
o Btree cursor (seq and put using a cursor) operations.
o Large numbers of btree duplicates (specifically, avoid
migrating duplicate keys to internal pages).
o Large numbers of btree deletes (you should periodically
dump and rebuild the database if you delete large numbers
of records).
o Overwriting or deleting overflow hash key/data pairs
(pairs with items larger than the page size).
o Intermixing hash cursor operations with deletes."
My problem arises, I think, because I have been doing the
fourth of these operations - i.e. overwriting long items in
a hash. The problems others are experiencing perhaps have a
similar cause, though the original problem summary
says "even with a value size of 1 the bug will occur", so
perhaps not.
I'm now using a workaround which involves writing several
shorter items, each containing a slice of the data formerly
held in the one long item. For keys I use my old key with a
subscript number appended. It isn't nice, but it seems to
be working.
Martin Gradwell.
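A minimal sketch of the kind of workaround described above, not Martin's actual code. The names put_chunked, get_chunked and CHUNK, the ".n" count key, and the piece size are all assumptions made for illustration; db is any dictionary-like database handle.

```python
# Illustrative only: put_chunked, get_chunked and CHUNK are hypothetical names.
CHUNK = 200  # assumed piece size; choose one well below the database page size


def put_chunked(db, key, value, chunk=CHUNK):
    """Store value as short pieces under key.0, key.1, ... plus a count under key.n."""
    pieces = [value[i:i + chunk] for i in range(0, len(value), chunk)] or [value]
    for n, piece in enumerate(pieces):
        db["%s.%d" % (key, n)] = piece
    # Record how many pieces were written so the reader knows when to stop.
    db["%s.n" % key] = str(len(pieces))


def get_chunked(db, key):
    """Reassemble a value written by put_chunked."""
    count = int(db["%s.n" % key])
    return "".join([db["%s.%d" % (key, n)] for n in range(count)])
```

Overwriting a key with a shorter value leaves the old higher-numbered pieces in the file; reads ignore them because the stored count decides how many pieces are joined.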
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2001-08-04 08:12
Message:
Logged In: YES
user_id=44345
Based upon the traceback Tim reported, my guess is that
the exception is being raised near the end of bsddb_ass_sub.
Tim, can you give it a try changing anydbm.open to
bsddb.btopen? As I recall, the significant bug(s) in libdb
were in the hash file implementation. It's unfortunate
that anydbm has used the hash file all these years, but
it's a bit late to spring that change on unsuspecting
users now without going through a significant transition
period.
Skip
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-08-03 14:40
Message:
Logged In: YES
user_id=31435
Thanks for taking a look, Skip! On Win98SE it dies for me
like so:
...
70000
71000
72000
Last i: 72758, last key:abcdef1691515.8934
Traceback (most recent call last):
  File "ka.py", line 15, in ?
    db[key] = val
bsddb.error: (0, 'Error')
test.dbm is 37,778,944 bytes at the end. I assume
Anonymous has the same problem (if not, he/she should say
so).
On Windows we use the ancient db.1.85.win32.zip, from
the "bsd db" (not "bsddb"!) link at
http://www.nightmare.com/software.html
I doubt Sam has done any maintenance on that in years, and
I'm afraid I don't know anything else about this.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2001-08-03 13:25
Message:
Logged In: YES
user_id=44345
What version of libdb are you using? I'm running your
script on Linux at the moment. I had to change it slightly
because the only machine I have available with the spare
cojones to run that script is running 1.5.2 (so I call
random.uniform instead of using a Random instance). On that
machine I'm sort of ashamed to say I'm still running the
known buggy libdb 1.85. So far I'm up to 680,000 keys with
a db file of over 166MB with no problem. On my laptop
running 2.1 and libdb3 (and a much more modestly performing
disk drive) I gave up after about 287,000 keys.
I then changed the db open call to bsddb.btopen and watched
it march (slowly) up to 183,000 keys and a 32MB file on
disk before I killed it. Aside from the grief it gives my
disk drives, I don't see anything particularly bad
happening.
You didn't include a traceback with your bug report. What
was printed? Perhaps it's something simple like running
out of disk space. In any case, I think trying to create a
libdb database of 1,000,000 sort-of-random keys is going to
strain that package and most disk drives, bugs
or no bugs.
My guess is that if there's a bug it's in libdb, not the
bsddb module.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2001-08-03 00:50
Message:
Logged In: NO
Here it is:
import anydbm
import bsddb
import random
MAX = 1000000
r = random.Random(42)
r.seed(1017)
db = anydbm.open("test.dbm", "n")
#db = bsddb.hashopen("test.dbm", "n")
try:
    for i in xrange(0, MAX):
        if i % 1000 == 0: print i
        key = "abcdef" + str(r.uniform(0, 10 * MAX))
        val = "a" * 80 + str(i)
        db[key] = val
finally:
    db.close()
    print "Last i: %s, last key:%s" % (i,key)
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-08-02 12:41
Message:
Logged In: YES
user_id=31435
Alas, there's no script attached -- please attach one, so
we have something concrete to investigate.
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2001-08-02 03:08
Message:
Logged In: NO
I was getting crashes in the shelve module, using NT4 (Python
2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on
win32). I've changed my program to re-read previously
written keys fairly frequently, and I get KeyErrors for
keys that have definitely been written, and that gave no
error a little earlier in the same program. The program
doesn't contain any delete statements.
The same program works when using dumbdbm instead of bsddb
(but produces huge indexes), so there definitely appears to
be a problem with bsddb on Windows NT.
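The re-read check described above could look roughly like the sketch below. This is not the poster's program: find_lost_key, _first_bad and check_every are hypothetical names, and db stands for any dictionary-like database handle (a shelve or bsddb object, or a plain dict for a dry run).

```python
# Illustrative only: find_lost_key, _first_bad and check_every are hypothetical names.
def _first_bad(db, expected):
    """Return the first key whose stored value is missing or changed, else None."""
    for key in expected:
        try:
            if db[key] != expected[key]:
                return key
        except KeyError:
            return key
    return None


def find_lost_key(db, items, check_every=1000):
    """Insert (key, value) pairs into db, re-reading every earlier key after
    each check_every inserts. Return (insert_count, key) for the first key
    that is missing or changed, or None if everything reads back intact."""
    expected = {}
    count = 0
    for key, val in items:
        db[key] = val
        expected[key] = val
        count += 1
        if count % check_every == 0:
            bad = _first_bad(db, expected)
            if bad is not None:
                return count, bad
    # Final pass in case the total is not a multiple of check_every.
    bad = _first_bad(db, expected)
    if bad is not None:
        return count, bad
    return None
```

The sketch keeps every written pair in memory, so it suits a diagnostic run rather than production use. It also assumes the handle returns values in the same string type that was stored.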
----------------------------------------------------------------------