Shelve operations are very slow and create huge files
Peter Otten
__peter__ at web.de
Sat Nov 1 14:49:56 EST 2003
Eric Wichterich wrote:
> Hello Pythonistas,
>
> I use Python shelves to store results from MySQL-Queries (using Python
> for web scripting).
> One script searches the MySQL-database and stores the result, the next
> script reads the shelve again and processes the result. But there is a
> problem: if the second script is called too early, the error "(11,
> 'Resource temporarily unavailable') " occurs.
> So I took a closer look at the file that is generated by the shelf: The
> result-list from MySQL-Query contains 14.600 rows with 7 columns. But,
> the saved file is over 3 MB large and contains over 230.000 lines (!),
> which seems way too much!
Let's see:
>>> 3*2**20/14600/7
30.780117416829746
>>>
Are thirty bytes per field, including administrative data, that much?
By the way, don't bother counting the lines in a file containing pickled
data; the text pickle protocol writes a newline after nearly every item it
stores, unless you specify the binary mode, e.g.:
    shelve.open(filename, binary=True)
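To make the line-count point concrete, here is a minimal sketch that uses the plain pickle module rather than shelve. The sample rows are made up, and protocols 0 and 2 stand in for the text and binary modes. Note that on current Python versions shelve.open() no longer accepts binary and takes a protocol argument instead.

```python
import pickle

# Hypothetical stand-in for a query result: 100 rows with 7 columns each.
rows = [(i, "name%d" % i, i * 1.5, "x", "y", i + 1, "z") for i in range(100)]

text = pickle.dumps(rows, protocol=0)    # text protocol, newline-terminated items
binary = pickle.dumps(rows, protocol=2)  # one of the binary protocols

print("text pickle:   %d bytes, %d newlines" % (len(text), text.count(b"\n")))
print("binary pickle: %d bytes" % len(binary))

# Both forms restore the same data.
assert pickle.loads(text) == rows
assert pickle.loads(binary) == rows
```

The newline count of the text pickle grows with the number of stored fields, so it says little about how many records the file holds.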
> Following statements are used:
>     dbase = shelve.open(filename)
>     if dbase.has_key(key):  # overwrite objects stored with same key
>         del dbase[key]
>     dbase[key] = object
>     dbase.close()
I've never used the shelve module so far, but the rule of least surprise
would suggest that
    if dbase.has_key(key):
        del dbase[key]
    dbase[key] = data
is the same as
    dbase[key] = data
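A quick way to check that assumption is a throwaway shelf in a temporary directory. This is only a sketch: the file name and key are invented for illustration, and it lists the keys afterwards instead of calling has_key(), which no longer exists on current Python versions.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "results")  # hypothetical shelf name

    dbase = shelve.open(filename)
    dbase["query"] = ["old", "rows"]
    dbase["query"] = ["new", "rows"]  # plain assignment, no del beforehand
    dbase.close()

    # Reopen the shelf to see what was actually stored.
    dbase = shelve.open(filename)
    stored = dbase["query"]
    keys = list(dbase.keys())
    dbase.close()

print(stored)
print(keys)
```

If the assignment overwrites as expected, the shelf holds a single key with the newer value.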
> Any ideas?
Try to omit the shelve completely, preferably by moving the second script's
operations into the first. If you want to keep two scripts, don't invoke
them independently; write a little batch file or shell script that runs
them one after the other instead.
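If a batch file or shell script is not an option, the same idea can be sketched in Python: a small driver that starts the second step only after the first has finished. The function name run_in_sequence and the script names in the comment are made up for illustration.

```python
import subprocess
import sys


def run_in_sequence(commands):
    """Run each command to completion before starting the next one.

    check=True raises CalledProcessError when a step exits with a
    non-zero status, so later steps never run after a failed one.
    """
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical usage with two scripts named search.py and process.py:
# run_in_sequence([
#     [sys.executable, "search.py"],
#     [sys.executable, "process.py"],
# ])
```

Because each step has exited before the next one starts, the second script cannot open the intermediate file while the first is still writing it.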
If you need an intermediate step with a preprocessed snapshot of the MySQL
table, and you have sufficient rights, use a MySQL table for the temporary
data.
Peter