[ python-Bugs-857909 ] bsddb craps out sporadically

Sun Dec 21 23:48:04 EST 2003

Bugs item #857909, was opened at 2003-12-10 12:41
Message generated for change (Comment added) made by predragm
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=857909&group_id=5470

Category: None
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Predrag Miocinovic (predragm)
Assigned to: Nobody/Anonymous (nobody)
Summary: bsddb craps out sporadically

Initial Comment:
I get the following from Python 2.3.2 with BerkeleyDB 3.3.11
running on Linux RH7.3:
------------------------
Traceback (most recent call last):
  File "/raid/ANITA-lite/gse/unpackd.py", line 702, in ?
    PacketObject.shelve()
  File "/raid/ANITA-lite/gse/unpackd.py", line 78, in shelve
    wvShelf[shelfKey] = self
  File "/usr/local/lib/python2.3/shelve.py", line 130, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/local/lib/python2.3/bsddb/__init__.py", line 120, in __setitem__
    self.db[key] = value
bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
Exception bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery') in  ignored
Exception bsddb._db.DBRunRecoveryError: (-30987, 'DB_RUNRECOVERY: Fatal error, run database recovery') in  ignored
----------------------------------
The server reporting this is running at relatively
heavy load and the error occurs several times per day
(this call occurs roughly 100,000 times per day, but only
42 times for any given shelve instance). It reminds me of
bug report #775414, but this is a non-threaded
application.
That said, another process is accessing the same
shelve, but I've implemented a lockout system which
should make sure they don't have simultaneous access.
The lockout seems to work fine.
The same application is running on a different machine using
Python 2.3.2 with BerkeleyDB 4.0.14 on Linux RH9, and the
same error occurred once (to my knowledge), but with
"30987" replaced by "30981" in the traceback above, if
it makes any difference.
Finally, a third system, Python 2.3.2 with BerkeleyDB
4.0.14 on Linux RH9 (but quite a bit faster, and thus
under lighter load), has run without reporting this problem so far.

I don't have a convenient code snippet to exemplify the
problem, but I don't do anything more than open (or
re-open) a shelve and write a single Python object
instance to it per opening. If necessary, I can provide
the code in question.
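
For illustration only, the access pattern described above (open the
shelf, write one object, close it, with a lockout keeping the other
process away) might look roughly like the sketch below. This is not the
code from unpackd.py. It is written for present-day Python 3 rather than
2.3, it assumes the lockout is an advisory fcntl.flock() on a separate
lock file (Unix only), and the names shelve_one, shelf_path and
lock_path are hypothetical.

```python
import fcntl
import shelve


def shelve_one(shelf_path, lock_path, key, obj):
    """Store a single object in the shelf while holding an exclusive lock.

    Hypothetical helper: the shelf is opened, written once and closed
    again on every call, mirroring the usage described in the report.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            shelf = shelve.open(shelf_path)
            try:
                shelf[key] = obj
            finally:
                # Close before unlocking so the data is flushed to disk
                # before the other process is allowed in.
                shelf.close()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The lock only helps if every process touching the shelf goes through
the same lock file, and the shelf must be closed before the lock is
released.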

----------------------------------------------------------------------

>Comment By: Predrag Miocinovic (predragm)
Date: 2003-12-21 18:48

Message:
Logged In: YES 
user_id=860222

I find the last comment somewhat unsatisfactory. While this
appears to be a BerkeleyDB issue (and without going into
details of why the exception gets thrown), it's strange that
the Shelve module doesn't handle this more gracefully. Since
the concept of Shelve is hiding implementation details from
the application, having to catch BerkeleyDB exceptions for
simple shelf operations is a bit over the top. If I move to
another system, using a different underlying DB (as given by
anydbm), will I have to catch a new set of exceptions all
over again?
Shelve (or anydbm) should either provide the ability to select
the underlying DB implementation to use, or it should handle all
DB implementation aspects so that it is truly transparent
to the end user.
Just my $0.02.
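
As a rough sketch of the two options raised above, and assuming
present-day Python 3 rather than 2.3: shelve.Shelf accepts any dict-like
database object, so the backend can be chosen explicitly, and the
dbm.error tuple (anydbm.error in Python 2.3) gathers the exception
classes of the available backends so they can be caught in one place.
The names ShelfError, open_dumb_shelf and store are hypothetical, and
dbm.dumb is used only because it ships with every Python installation.

```python
import dbm
import dbm.dumb
import shelve


class ShelfError(Exception):
    """Hypothetical backend-independent error for shelf operations."""


def open_dumb_shelf(path):
    # Pick the backend explicitly instead of letting shelve.open() choose:
    # Shelf wraps whatever dict-like database object it is given.
    return shelve.Shelf(dbm.dumb.open(path, "c"))


def store(shelf, key, obj):
    """Write one object, translating backend errors into ShelfError."""
    try:
        shelf[key] = obj
    except dbm.error as exc:
        # dbm.error is a tuple of the backends' exception classes,
        # so this clause does not name any particular backend.
        raise ShelfError("could not store %r" % (key,)) from exc
```

Usage would be along the lines of s = open_dumb_shelf("/tmp/wv"),
store(s, "packet-1", obj), s.close(). The application then only ever
catches ShelfError, whichever backend sits underneath.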

----------------------------------------------------------------------

Comment By: Andrew I MacIntyre (aimacintyre)
Date: 2003-12-21 01:50

Message:
Logged In: YES 
user_id=250749

As far as I can make out, what you're seeing is a BerkeleyDB
issue, and bsddb is just reporting what BDB is telling it.

DB_RUNRECOVERY (-30987 on DB 3.3, -30981 on DB 4.0) is
documented as (quoted from DB4.0 HTML docs):
"There exists a class of errors that Berkeley DB considers
fatal to an entire Berkeley DB environment. An example of
this type of error is a corrupted database or a log write
failure because the disk is out of free space. The only way
to recover from these failures is to have all threads of
control exit the Berkeley DB environment, run recovery of
the environment, and re-enter Berkeley DB."

Therefore I think you should follow this up in a
BerkeleyDB forum.

----------------------------------------------------------------------
