[Python-Dev] Including BSDDB3

Skip Montanaro skip@pobox.com
Tue, 8 Jan 2002 09:10:25 -0600


    >> - It'd be great if we actually provided bsddb1, bsddb2, bsddb3 (and
    >> bsddb4?) modules which compile against the older libraries so
    >> databases written with any version could be accessed in Python.

    Martin> I'm not sure how that would work, though. 

Agreed.  I think trying to use files generated by multiple versions of libdb
simultaneously is a disaster waiting to happen.  It's unfortunate that the
folks at Sleepycat haven't been able to provide a more consistent data
format, but I understand that the on-disk format is an internal detail and
can change.  They have been pretty good about providing upgrade tools.

What would be useful is if whatever bsddb module is installed could be more
intelligent about file version errors.  Instead of reporting something
inscrutable like

    >>> db = bsddb.hashopen("tour.db")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    bsddb.error: (-30990, 'Unknown error 4294936306')

I'd like it to realize that it was asked to open an old format file and give
a useful error message like:

    bsddb.error: (-30990, 'Attempt to open old format file - see db_upgrade(1)')
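
As a rough sketch of that idea (not the actual bsddb code), a thin
translation step could map known libdb return codes to readable text.  The
helper names below are hypothetical, and -30990 is simply the code from the
traceback above, which db_dump reports as DB_OLDVERSION; the value may differ
in other libdb releases.  Note that 4294936306 in the original message is just
-30990 printed as an unsigned 32-bit number.

```python
# Hypothetical helpers for illustration; none of this is part of bsddb.

# Value taken from the traceback above; assumed to mean DB_OLDVERSION.
DB_OLDVERSION = -30990

FRIENDLY_MESSAGES = {
    DB_OLDVERSION: "Attempt to open old format file - see db_upgrade(1)",
}


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value >= (1 << 31) else value


def friendly_error(code, message):
    """Return (code, message), swapping in a clearer message when we have one."""
    return (code, FRIENDLY_MESSAGES.get(code, message))
```

With this in place, friendly_error(-30990, 'Unknown error 4294936306') would
yield the tuple shown above, while unrecognised codes pass through unchanged.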

Sleepycat's tools can do this in the face of old files:

    % file tour.db
    tour.db: Berkeley DB (Hash, version 5, native byte-order)
    % db_dump tour.db > tour.txt
    db_dump: tour.db: hash version 5 requires a version upgrade
    db_dump: open: tour.db: DB_OLDVERSION: Database requires a version upgrade
    % db_upgrade tour.db
    % file tour.db
    tour.db: Berkeley DB (Hash, version 7, native byte-order)
    % db_dump tour.db > tour.txt
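
A script could lean on the same tools.  The sketch below only parses text
shaped like the output quoted above; the function names are made up for
illustration, and it assumes file(1) and db_dump keep producing lines in this
form, which may not hold for other releases.

```python
import re

# Matches descriptions like "Berkeley DB (Hash, version 5, native byte-order)".
_FILE_RE = re.compile(r"Berkeley DB \((\w+), version (\d+), ([^)]*)\)")


def parse_file_line(line):
    """Return (access_method, version) from a file(1) line, or None."""
    match = _FILE_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def needs_upgrade(db_dump_stderr):
    """True if db_dump's stderr text mentions DB_OLDVERSION."""
    return "DB_OLDVERSION" in db_dump_stderr
```

Run against the transcript above, parse_file_line would report version 5
before db_upgrade and version 7 after it.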

    Martin> Also, I think it is rare that multiple versions are installed on
    Martin> a single system: I doubt BSDDB even supports simultaneous
    Martin> installation of multiple header file sets, on Unix. 

Actually, RedHat & Mandrake do install multiple versions side by side.  This
leads to as many problems as it solves.  Take a look at the code in setup.py:

    dblib = []
    if self.compiler.find_library_file(lib_dirs, 'db-3.2'):
        dblib = ['db-3.2']
    elif self.compiler.find_library_file(lib_dirs, 'db-3.1'):
        dblib = ['db-3.1']
    elif self.compiler.find_library_file(lib_dirs, 'db3'):
        dblib = ['db3']
    elif self.compiler.find_library_file(lib_dirs, 'db2'):
        dblib = ['db2']
    elif self.compiler.find_library_file(lib_dirs, 'db1'):
        dblib = ['db1']
    elif self.compiler.find_library_file(lib_dirs, 'db'):
        dblib = ['db']

    db185_incs = find_file('db_185.h', inc_dirs,
                           ['/usr/include/db3', '/usr/include/db2'])
    db_inc = find_file('db.h', inc_dirs, ['/usr/include/db1'])

And it's still not correct, as Barry indicated yesterday.  For example,
suppose that even though db3 is installed on your system you only want to
manipulate db2 databases (perhaps for compatibility with another machine).
You're stuck and have to edit setup.py or use Modules/Setup to build bsddb.
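
One way out, sketched very loosely, would be to let the builder name the
library explicitly.  This is not what setup.py does; pick_db_library and its
preferred argument are hypothetical, and has_library stands in for a wrapper
around self.compiler.find_library_file(lib_dirs, name).

```python
# Same search order as the setup.py fragment above.
DEFAULT_ORDER = ['db-3.2', 'db-3.1', 'db3', 'db2', 'db1', 'db']


def pick_db_library(has_library, preferred=None):
    """Return the dblib list, honouring an explicit preference if given.

    has_library is a callable that takes a library name and returns a true
    value when that library can be found.  When preferred is set, only that
    library is considered, so a request for db2 never silently becomes db3.
    """
    names = [preferred] if preferred else DEFAULT_ORDER
    for name in names:
        if has_library(name):
            return [name]
    return []
```

The matching header directory would have to be chosen the same way, which
this sketch leaves out.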

    Martin> So even while you can have multiple versions of the shared
    Martin> library installed, compiling it for use with these libraries may
    Martin> be tricky.

Got that right... ;-)

    Martin> For any other scenario, users are to blame for forgetting to
    Martin> update their database files when updating the libraries.

In the presence of anydbm, it's not obvious that users should know what file
format their underlying databases use.
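
For what it's worth, the standard library can at least guess which module
wrote a file.  The sketch below uses dbm.whichdb, which is where current
Python keeps what was the separate whichdb module at the time of this thread;
describe_db is a made-up wrapper.  It reports the module name only, not the
libdb file version, so it doesn't address the upgrade problem by itself.

```python
import dbm


def describe_db(path):
    """Return a one-line description of what dbm.whichdb makes of path."""
    kind = dbm.whichdb(path)
    if kind is None:
        # whichdb returns None when the file is missing or unreadable.
        return "%s: cannot be opened" % path
    if kind == "":
        # An empty string means the file exists but the format is unknown.
        return "%s: format not recognised" % path
    return "%s: looks like a %s database" % (path, kind)
```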

Skip