Shelve operations are very slow and create huge files

Sat Nov 1 14:49:56 EST 2003

Eric Wichterich wrote:

> Hello Pythonistas,
> 
> I use Python shelves to store results from MySQL-Queries (using Python
> for web scripting).
> One script searches the MySQL-database and stores the result, the next
> script reads the shelve again and processes the result. But there is a
> problem: if the second script is called too early, the error "(11,
> 'Resource temporarily unavailable') " occurs.
> So I took a closer look at the file that is generated by the shelf: The
> result-list from MySQL-Query contains 14.600 rows with 7 columns. But,
> the saved file is over 3 MB large and contains over 230.000 lines (!),
> which seems way too much!

Let's see:

>>> 3*2**20/14600/7
30.780117416829746
>>>

Are thirty bytes per field, including administrative data, that much?
By the way, don't bother counting the lines in a file containing pickled
data; the default text pickle format ends almost every item with a newline,
unless you ask for the binary mode, e.g.:

shelve.open(filename, binary=True)
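
To see the effect without touching the database, here is a minimal sketch
that pickles a made-up list of rows once in the text format and once in a
binary one. The rows are invented for illustration (they only mimic the
14600 x 7 shape of your result), and I go through the protocol argument of
pickle.dumps() rather than through shelve:

```python
import pickle

# Invented stand-in for the query result: 14600 rows with 7 columns each.
rows = [(i, "name%d" % i, i * 0.5, "x", "y", "z", None) for i in range(14600)]

text_pickle = pickle.dumps(rows, 0)    # protocol 0: line-oriented text format
binary_pickle = pickle.dumps(rows, 2)  # protocol 2: a binary format

# Count the newlines in the text pickle; there are several per row.
text_lines = text_pickle.count(b"\n")

print("rows:               %d" % len(rows))
print("lines, text pickle: %d" % text_lines)
print("bytes, text pickle: %d" % len(text_pickle))
print("bytes, binary:      %d" % len(binary_pickle))
```

The line count of the text pickle comes out at a multiple of the row count,
so a large number of lines says little about how much data is really stored.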

> Following statements are used:
> dbase = shelve.open(filename)
> if dbase.has_key(key): #overwrite objects stored with same key
>     del dbase[key]
> dbase[key] = object
> dbase.close()

I haven't used the shelve module so far, but the principle of least surprise
would suggest that

if dbase.has_key(key):
    del dbase[key]
dbase[key] = data

is the same as 

dbase[key] = data
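
That assumption is easy to check on a throwaway shelf. A minimal sketch,
where the file location and the key "query" are made up for the test:

```python
import os
import shelve
import tempfile

# Hypothetical shelf in a fresh temporary directory, used only for this check.
filename = os.path.join(tempfile.mkdtemp(), "results")

dbase = shelve.open(filename)
try:
    dbase["query"] = [1, 2, 3]
    dbase["query"] = [4, 5, 6]   # plain assignment, no del beforehand
    stored = dbase["query"]      # value read back from the shelf
    n_keys = len(dbase)          # number of keys currently in the shelf
finally:
    dbase.close()

print(stored)
print(n_keys)
```

If the second assignment replaces the first, stored holds the new list and
the shelf still contains a single key.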

> Any ideas?

Try to drop the shelve completely, preferably by moving the second script's
operations into the first. If you want to keep two scripts, don't invoke
them independently; write a little batch file or shell script that runs
them one after the other instead.
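
If you would rather stay in Python than write a batch file, the same idea
can be sketched with the subprocess module. The helper run_in_order() and
the script names search.py and process.py are hypothetical; substitute
whatever your two scripts are called:

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command to completion before starting the next one.

    Returns the exit status of the first command that fails,
    or 0 if every command succeeded.
    """
    for command in commands:
        status = subprocess.call(command)  # blocks until the command exits
        if status != 0:
            return status                  # skip the remaining commands
    return 0


# Intended use (hypothetical script names):
#
#     run_in_order([
#         [sys.executable, "search.py"],
#         [sys.executable, "process.py"],
#     ])
```

Because each call blocks until its script has exited, the second script can
never start while the first is still writing.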

If you need an intermediate step with a preprocessed snapshot of the MySQL
table, and you have sufficient rights, use a MySQL table for the temporary
data.

Peter