anydbm safe for simultaneous writes?

Eric S. Johansson esj at harvee.org
Sat Mar 1 00:28:35 EST 2008


chris wrote:
> I need simple data persistence for a cgi application that will be used
> potentially by multiple clients simultaneously.  So I need something
> that can handle locking among writes.  Sqlite probably does this, but
> I am using Python 2.4.4, which does not include sqlite.  The dbm-style
> modules would probably be fine, but I have no idea if they are "write
> safe" (I have no experience with the underlying unix stuff).  Any tips
> appreciated.

The often-repeated answer that you need locking is correct but incomplete.  It
really depends on which DBM you are using.  If you are using a fairly recent
bsddb (Berkeley DB, a.k.a. Sleepycat), it does have the kind of locking needed
for fairly complex transactions.  Unfortunately, the API is sufficiently
unintelligible that it will take more than an afternoon to figure out how to
even start to use it.

gdbm is a nice DBM that permits single writer/multiple readers.  If you open a
DBM for reading, any writer is blocked.  When you open it for reading, sometimes
multiple readers can get in, but not always (or at least that's the way it seems
here in practice).  When the DBM is busy, you will get an exception with an
error value of (11, 'Resource temporarily unavailable').  Just busy-wait until
this exception goes away and you'll get access to the DBM file.  Yes, this
officially sucks, but at least it's a workaround for the problem.
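
The retry loop described above can be sketched roughly as follows.  This is
only an illustration under stated assumptions: the helper name
open_with_retry and its attempts/delay parameters are made up for this
example, and it sleeps briefly between tries instead of spinning.  It treats
any exception whose first argument is errno 11 (EAGAIN) as "busy", which
matches the error value quoted above.

```python
import errno
import time


def open_with_retry(opener, attempts=50, delay=0.1):
    """Call opener() until it stops failing with EAGAIN; return its result.

    opener is any zero-argument callable that opens the DBM.  The name and
    the attempts/delay parameters are hypothetical, chosen for this sketch.
    """
    for attempt in range(attempts):
        try:
            return opener()
        except Exception as exc:
            # The busy error carries errno 11 (EAGAIN) as its first argument.
            busy = bool(exc.args) and exc.args[0] == errno.EAGAIN
            if not busy or attempt == attempts - 1:
                raise  # a different error, or we ran out of tries
            time.sleep(delay)


# Possible usage (not run here).  On Python 3 the gdbm binding is dbm.gnu;
# on Python 2.4 the module is called gdbm instead.
#
#   import dbm.gnu
#   db = open_with_retry(lambda: dbm.gnu.open("data.db", "w"))
```

Capping the number of attempts is a design choice made for this sketch: a CGI
script that waits forever on a stuck lock is usually worse than one that
reports an error.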

Another way to solve this particular problem with DBM files is to stick the DBM
inside a Pyro daemon.  Performance won't be too bad and you should be able to
get it working relatively easily.  I will warn you that the RPC model for Pyro
does take some getting used to if you're familiar with more traditional RPC
environments.  Once you wrap your head around the Pyro model, it's pretty nice.
If you want, I can send you a copy of the Pyro daemon I use to wrap a DBM so I
don't have to worry about multiple processes accessing the same DBM.
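
To make the single-owner idea concrete, here is a rough sketch of the pattern.
Note that this is NOT Pyro and not the daemon offered above: it swaps in the
standard library's multiprocessing.managers as a stand-in, because it follows
the same shape (one process owns the DBM, everyone else calls it through a
proxy).  The names DbmService, DbmManager, serve, the address, and the authkey
are all hypothetical.

```python
import threading
from multiprocessing.managers import BaseManager


class DbmService:
    """Owns the one open DBM handle; every client call goes through here."""

    def __init__(self, store):
        self._store = store            # any dict-like object, e.g. an open DBM
        self._lock = threading.Lock()  # the server handles clients on threads

    def get(self, key, default=None):
        with self._lock:
            try:
                return self._store[key]
            except KeyError:
                return default

    def put(self, key, value):
        with self._lock:
            self._store[key] = value


class DbmManager(BaseManager):
    """Hypothetical manager class; clients reach DbmService through it."""


def serve(store, address=("127.0.0.1", 50000), authkey=b"change-me"):
    """Run forever, serving one shared DbmService to all clients."""
    service = DbmService(store)
    DbmManager.register("get_service", callable=lambda: service)
    manager = DbmManager(address=address, authkey=authkey)
    manager.get_server().serve_forever()


# A client process (for example the CGI script) would do roughly this:
#
#   DbmManager.register("get_service")
#   manager = DbmManager(address=("127.0.0.1", 50000), authkey=b"change-me")
#   manager.connect()
#   service = manager.get_service()
#   service.put(b"visits", b"42")
```

Since only the server process ever opens the DBM file, the file-level locking
question goes away; the lock inside DbmService only keeps the server's own
threads from interleaving calls.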

The one thing that really bothers me about the DBM interfaces is that the two
main DBMs are really quite full-featured, but the documentation presents a very
sketchy description of what they support and how.  As a result, I suspect that
DBMs don't get used as often as they could, and people are pushed into more
complex databases because they don't understand what DBMs are capable of.

Other folks have recommended some form of SQL, and while SQLite is a very nice
small database, personally, I find SQL unintelligible and I have lost more days
than I care to think about trying to figure out how to do something in SQL.  As
a result, I tend to go more towards databases such as metakit and buzhug
(http://buzhug.sourceforge.net/).  The former is like gdbm and only handles a
single writer.  It's really intended for single-process use, but I don't know if
you can put it in a Pyro-controlled daemon.  The latter looks pretty interesting
because the documentation implies that it supports concurrent access on a
per-record level (menu item: concurrency control).

Given that I'm currently replacing a DBM for very much the same reason you are, 
I'm going to try using buzhug rather than facing SQL again.  I would be glad to 
compare notes with you if you care to go the same route.  Just let me know off list.

I wish you the best of luck in your project.

---eric

-- 
Speech-recognition in use.  It makes mistakes, I correct some.


