shelve vs pickle

Fri Sep 14 16:28:50 EDT 2001

Hi,

I've read a few articles in the mailing list, but it is not apparent 
to me the advantages of shelve over pickle. (These are two modules in 
Python.)

(1) Shelve does not write to disk immediately, at least in Windows 
platform. So, if you are storing things to the shelve, you are often 
doing it in RAM, not to the disk. And if your program crashes before 
you close the shelve, all changes are gone. Which may or may not be 
what you want. For a good database, you choose when and how you 
commit transactions. For shelve, can you choose the transactional 
behavior? (So that it saves things more often?)

(2) Shelve, then, very often stores things just in RAM. I don't know 
the details, but I guess that if the file is large (10GB? 100GB?), it 
may be smart not to load everything into RAM.

(3) Or is the advantage of Shelve in the open() and close() 
statements? Is it smart enough to save only those items that have 
been modified? So that open() and close() are fast enough if only a 
few items have been touched?

(4) In another test, I closed the Python program while it was in a 
loop writing items to shelve. The database file got corrupted enough 
that the majority of items are now missing.

All in all, I am just wondering, just like some other people have 
asked before in the newsgroup (and not getting any real answer): when 
is it good to use shelve?

I guess shelve can only be used safely if extra caution is taken into 
account:

(1) Before opening a shelve file, make sure of making a back up copy. 
(This can be done at the moment of saving, too, depends on each 
person's preference.)
(2) Delete the back up copy only if the shelve has been closed 
successfully.
(3) Close the shelve file often, to make sure that your changes are 
recorded.

However, you folks can realize a contradiction here: in order to use 
shelve safely, you have to copy the big external file everytime you 
want to open (or save) it. The benefit of fast opening and fast 
closing is gone.

So, I still don't see any benefit from shelve. I can only see that to 
do things safely I would use:

(1) a transactional database, or

(2) pickle or xml my Python objects in small units, so that I can 
safely access them. ("safely accessing" means the standard procedure 
of keeping a backup copy before overriding the file, and use file 
renaming schemes to minimize the time of possible data inconsistency 
if the program crashes unexpectly, for instance, due power failure.)

And I just don't see when to use shelve. If I have 1 million items to 
store, I wouldn't use shelve: I'd use transactional database. If I 
have 100 items to save, I wouldn't use shelve: I just use pickle.

There is a small range of applicability of shelve: in situations 
where you: (1) have many items, (2) need to modify many things, but 
in a transactional fashion, typically spending 5 minutes to an hour 
in the process. (3) Don't mind losing the changes if the transaction 
is not completed. I guess you can use shelve for report logging, for 
non-critical web session data, etc.

regards,

Hung Jung