concurrent access to dbms (was best way to store dbm records?)

Brad Knotwell knotwell at ix.netcom.com
Fri Oct 6 18:34:55 EDT 2000


"Michael B. Allen" <mballen at NOSPAM_erols.com> writes:
> I may have a different issue, however. The docs on shelve (and I would
> imagine dbms in general) report that you cannot concurrently read from it
> while it is open by a writer. If I have interpreted the docs correctly,
> this is true even if it is only open by a single writer.

> Can I have many readers with one writer, or might this corrupt the
> database? Or, in the worst case, will it merely display possibly
> incomplete information to a reader?

In my experience (I use gdbm), if a writer has the file open then no one
else is allowed access to it.  Interestingly enough, this limitation
doesn't appear to exist with the bsddb routines.  However, a reader
apparently only sees the information that existed when it opened the
database file; it won't see any of the new information.

> In my case I don't care if readers get realtime info. Can I just
> periodically make a copy of the dbm for readers?

I've done this in the past.  The writer may still need to close its
file descriptor first so that all the written data is actually synced
to disk.
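
The dance, roughly (untested sketch; the file names are made up, and I'm
using the generic dbm/anydbm module rather than gdbm specifically):

import dbm        # anydbm, in older Pythons
import os
import shutil

LIVE = 'live.db'        # only the writer ever opens this one
SNAPSHOT = 'snap.db'    # readers open the copy instead

def write_and_publish(records):
    # Writer: write everything, then close so it's flushed to disk.
    db = dbm.open(LIVE, 'c')
    for key, value in records.items():
        db[key] = value
    db.close()
    # Copy to a temp name and rename into place; rename is atomic on
    # POSIX, so a reader never opens a half-copied snapshot.
    shutil.copyfile(LIVE, SNAPSHOT + '.tmp')
    os.rename(SNAPSHOT + '.tmp', SNAPSHOT)

def lookup(key):
    # Reader: read-only, and never touches the live file at all.
    db = dbm.open(SNAPSHOT, 'r')
    try:
        return db[key]
    finally:
        db.close()

A reader that already has the old snapshot open just keeps reading the
old file until it reopens; that's the "not realtime" part you said you
didn't care about.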

You could also do any of the following (there are probably a bazillion
more and better options than these as well):

	o  write a simple server that does all the dbm work (reads and
	   writes)

	o  use two dbm files--a massive read-only one and a miniature
	   write one.  When the mini one hits some predetermined record
	   count or trigger time, the updater sends the readers a
	   shutdown msg, the readers close their file descriptors, the
	   updater opens the massive db for write, copies the records
	   from the mini db into it, empties the mini db, closes both
	   dbs, and sends the readers a startup msg.  The readers then
	   reopen the massive db for read.

	   When the record-count or time threshold is hit again, the
	   process starts over.

	o  open/close the file as needed (depending on your traffic,
	   this might actually be okay)

I like option #3 the best ;-).  #2 is probably the performance winner
(esp. in read-many, write-few environments), but it is extremely
error-prone whatever notification mechanism you use (signals or
otherwise).  I've seen #1 used, but I hate it for anything beyond
extremely trivial records.
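
For the curious, #3 in skeletal form (untested; assumes gdbm semantics,
where a conflicting open() raises an error rather than blocking, and
the file name is made up):

import dbm.gnu   # the plain gdbm module, depending on your Python
import time

DBFILE = 'data.db'

def put(key, value, retries=20):
    # Writer: open, write, close.  gdbm gives a writer exclusive
    # access, so a concurrent open fails; back off and retry.
    for _ in range(retries):
        try:
            db = dbm.gnu.open(DBFILE, 'c')
        except dbm.gnu.error:
            time.sleep(0.05)   # someone else has it
            continue
        try:
            db[key] = value
        finally:
            db.close()
        return
    raise RuntimeError('gave up waiting for write access')

def get(key):
    # Reader: same open/close dance.  A reader's open can also fail
    # while a writer holds the file, so it may want the retry loop too.
    db = dbm.gnu.open(DBFILE, 'r')
    try:
        return db[key]
    finally:
        db.close()

Every operation pays the open/close cost, which is why this only makes
sense at modest traffic levels.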

Or you could just use a relational database. . .they'll have figured out
the concurrency stuff way better than us mere mortals.
	   
> Also, how many records can I put in a dbm without taking a big performance
> hit? 10000?

I dunno.  On a 200MHz RS/6000, we were able to look up an entry in a
200k-member set in approximately one second (NOTE: from the command
line...this includes Python's startup time).
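
If you need a number for your own hardware, it's easy enough to measure
(untested sketch; N, the file name, and the key format are arbitrary):

import dbm
import time

N = 200000
db = dbm.open('bench.db', 'n')          # 'n': always create a new db
for i in range(N):
    db['key%d' % i] = 'value%d' % i
db.close()

db = dbm.open('bench.db', 'r')
t0 = time.time()
db['key%d' % (N // 2)]                  # one lookup in the middle
print('one lookup out of %d keys: %f seconds' % (N, time.time() - t0))
db.close()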

FWIW, we ended up using gdbm 'cause AIX's standard dbm module wouldn't
allow records longer than 4500 (or something like that) bytes.

> Thanks,
> Mike

--Brad


