Survey: bsddb is definitely broken. Should it be fixed, or deprecated?

Kragen Sitaker kragen at pobox.com
Tue May 14 02:23:12 EDT 2002


garth at deadly*****serious.com (Garth T Kidd) writes:
> Specifically, shelve uses anydbm by default, which on many systems
> uses bsddb by default, and bsddb is quite broken, and has been since
> Python 1.5 and maybe earlier:
> 
> http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=408271

Specifically, what that bug report says is that BSD DB 1.85 is really
broken, but modern versions of BSD DB work.  The version of BSD DB in
my system libc (I'm running Debian GNU/Linux) works fine.

> The easiest solution is to deprecate bsddb and take it out of the list
> of contenders for anydbm. I've put up a patch for both:

BSD DB 1.x has been deprecated by its authors for years, largely
because it is so buggy.  Presumably you are having this problem
because you are using an obsolete version of BSD DB.  The current
version of BSD DB, 3.x, is open source and easily downloadable from
www.sleepycat.com; I think 2.x is integrated into the GNU C library,
which is standard equipment on most Linux systems.

I think you may be using ActiveState's distribution of Python, and
that ActiveState may have bundled an obsolete buggy version of BSD DB
with it, possibly because they didn't like the license on current
versions.

> Existing databases will still be fine, because whichdb.whichdb will
> figure it out and load bsddb. New databases, however, will avoid
> bsddb.

I have a better solution than deprecating bsddb.  If you're going to
ship BSD DB with a copy of Python, don't ship the obsolete, buggy BSD
DB 1.x.  Don't build the bsddb module on systems where no modern
version of BSD DB is installed.  Don't screw your customers.  (Perhaps,
for backward compatibility, you could still include the obsolete,
buggy BSD DB, but provide access to it via a module called
obsoleteversionof.bsddb; anydbm would then have to fall back to
obsoleteversionof.bsddb when asked to read a BSD DB file while no
bsddb module is present.)
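
A minimal sketch of that fallback policy, in Python.  It is not
anydbm's actual code: the function name choose_module and the
constants are hypothetical, and it only picks a module name out of the
ones that can be imported rather than importing anything.  It also
assumes that "dbhash" is the name whichdb.whichdb reports for a BSD DB
hash file.

```python
# Hypothetical sketch of the module-selection policy; not anydbm's code.

MODERN = "bsddb"                     # built against a current BSD DB
LEGACY = "obsoleteversionof.bsddb"   # hypothetical home for the 1.x code
OTHERS = ["gdbm", "dbm", "dumbdbm"]  # remaining candidates for new files

# Assumption: the name whichdb.whichdb() gives for a BSD DB hash file.
BSD_FORMAT = "dbhash"


def choose_module(detected_format, available):
    """Pick the name of the module to open a database with.

    detected_format is None for a new database, otherwise the format
    name detected for the existing file.  available is the collection
    of module names that can actually be imported.
    """
    if detected_format is None:
        # New databases never get the legacy module.
        for name in [MODERN] + OTHERS:
            if name in available:
                return name
        raise ImportError("no usable dbm module")

    if detected_format == BSD_FORMAT:
        # Existing BSD DB file: prefer the modern module, and fall
        # back to the legacy one only when the modern one is absent.
        for name in (MODERN, LEGACY):
            if name in available:
                return name
        raise ImportError("no module can read this BSD DB file")

    if detected_format in available:
        return detected_format
    raise ImportError("module %s is not available" % detected_format)
```

With only the legacy module and dumbdbm installed,
choose_module("dbhash", ...) picks the legacy module, so old files stay
readable, while choose_module(None, ...) picks dumbdbm, so no new
database is ever created with the buggy code.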

I think this is a better solution because BSD DB is by far the best
dbm-type library available, and it's wonderful that there's an
interface to it in the standard Python library.  Deprecating it would
be a huge mistake.

The bug report at the above URL refers to
http://www.deadly*****serious.com/Python/2002/05/06.html (expletive
starred out, sorry), the contents of which follow:

#!/bin/python
# Force the error we've been looking at

import unittest

class BreakHashDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb, os

        m = md5.new()
        b = "!" * 129       # small string to write

        db = bsddb.hashopen(self.dbname, 'c')
        self.db = db
        for count in xrange(1, 1000000):
            if count % 100==0:
                print "    %d\r" % (count),
            m.update(str(count))
            db[m.digest()] = b

    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)

    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()

    def tearDown(self):
        self.db.close()
        self.unlinkDB()

if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakHashDB()]))

This program ran fine on my laptop, although it took four hours and
made the whole machine quite slow due to the enormous amount of disk
I/O it generated.



