[Python-bugs-list] whichdb is coded wrong (PR#97)

Guido van Rossum guido@CNRI.Reston.VA.US
Wed, 06 Oct 1999 11:12:24 -0400


> I attempted to reproduce the exercise in Lutz's "Programming Python"
> book using the anydbm module, pp. 39-41. After creating the
> underlying file, I re-open the file, again using anydbm, and a simple
> program fails with
> 
> File "/user/lib/python1.5/anydbm.py", line 83, in open
> raise error, "db type cannot be determined"
> anydbm.error: db type cannot be determined

Apparently new versions of bsddb have 12 null bytes in front of the
magic number.  A patch for whichdb.py exists in the CVS archives; it
is reproduced here:

Index: whichdb.py
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Lib/whichdb.py,v
retrieving revision 1.4
retrieving revision 1.5
diff -c -r1.4 -r1.5
*** whichdb.py	1998/04/28 15:41:03	1.4
--- whichdb.py	1999/06/08 13:13:16	1.5
***************
*** 31,39 ****
      except IOError:
          return None
  
!     # Read the first 4 bytes of the file -- the magic number
!     s = f.read(4)
      f.close()
  
      # Return "" if not at least 4 bytes
      if len(s) != 4:
--- 31,40 ----
      except IOError:
          return None
  
!     # Read the start of the file -- the magic number
!     s16 = f.read(16)
      f.close()
+     s = s16[0:4]
  
      # Return "" if not at least 4 bytes
      if len(s) != 4:
***************
*** 48,53 ****
--- 49,64 ----
      # Check for GNU dbm
      if magic == 0x13579ace:
          return "gdbm"
+ 
+     # Check for BSD hash
+     if magic in (0x00061561, 0x61150600):
+         return "dbhash"
+ 
+     # BSD hash v2 has a 12-byte NULL pad in front of the file type
+     try:
+ 	(magic,) = struct.unpack("=l", s16[-4:])
+     except struct.error:
+         return ""
  
      # Check for BSD hash
      if magic in (0x00061561, 0x61150600):

> Instead of statically coding whichdb, which might fail for various
> distribution/platform types, couldn't you create a generator that for the
> various test cases above generates a "tailored" whichdb for that particular
> distribution/platform?

The whichdb module wants to be able to tell you the db type even if
you don't have the library code to read it.  Hardcoding a list of
magic numbers is a common approach.  Often (as you see here) the rules
aren't as simple as "look at the first 4 bytes", and adding a new file 
type requires a little bit of thinking.  Given the infrequent
appearance of new db types, an automated approach is hardly worth it.
(Prove me wrong by submitting the code :-)
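
As a rough illustration of the hardcoded approach, here is a minimal
sketch in present-day Python (not the 1.5 code in the patch).  The
magic numbers and the 12-byte offset come from the patch above; the
name guess_db_type and the choice to pass in the header bytes directly
are assumptions made for the example, not part of whichdb.py.

```python
import struct

# Magic numbers as they appear in the patch above.
GDBM_MAGIC = 0x13579ACE
BSD_HASH_MAGICS = (0x00061561, 0x61150600)


def guess_db_type(header):
    """Guess a db type from the first 16 bytes of a file.

    Hypothetical helper: returns "gdbm", "dbhash", or "" if the
    header is too short or matches no known magic number.
    """
    # Need at least 4 bytes to hold a magic number at offset 0.
    if len(header) < 4:
        return ""
    (magic,) = struct.unpack("=l", header[0:4])

    if magic == GDBM_MAGIC:
        return "gdbm"
    if magic in BSD_HASH_MAGICS:
        return "dbhash"

    # Newer BSD hash files put 12 null bytes before the magic number,
    # so look again at offset 12 if the header is long enough.
    if len(header) >= 16:
        (magic,) = struct.unpack("=l", header[12:16])
        if magic in BSD_HASH_MAGICS:
            return "dbhash"

    return ""
```

A caller would read the header with something like
open(path, "rb").read(16) and pass the bytes in.  Note that the second
check is a rule about where to look, not just another number in a
list, which is the kind of case that takes a little thought each time.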

--Guido van Rossum (home page: http://www.python.org/~guido/)