Shelve operations are very slow and create huge files

Tim Churches tchur at optushome.com.au
Sun Nov 2 14:50:19 EST 2003


On Mon, 2003-11-03 at 04:28, Eric Wichterich wrote:
> Hello Tim,
> 
> thank you for your thoughts. I tried to use cPickle and gzip instead of 
> shelve. But it ran much slower than before.
> So I used the profiler to check where the most time is needed.
> To read the data and convert it back to a dictionary needed around 9 
> seconds with shelve.
> With cPickle, it even needed 11 seconds.
> With gzip & cPickle, it also needed 11 seconds (the file was now around 
> 250 kB instead of 1.7 MB).
> Using pickle instead of cPickle, it needed over 45 seconds.
> It seems that the file size doesn't matter at all. But 9 seconds just 
> for converting data from a pickle back to a Python dictionary???

Presumably you will be reading the stored query results hundreds or
thousands of times - otherwise the fact that it takes 9 sec versus 1
second doesn't matter. Or maybe you need a faster computer? Or perhaps a
more relaxed lifestyle?
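For what it's worth, the kind of timing comparison Eric describes can be sketched roughly as follows. This is an illustration with made-up stand-in data (14,600 rows of 7 columns, as in the query result discussed in this thread), not his actual code; note that in modern Python the pickle module automatically uses the fast C implementation that Python 2 exposed separately as cPickle.

```python
# Rough timing sketch: pickling and unpickling a dictionary of
# hypothetical query results (14,600 rows x 7 columns).
import pickle
import time

rows = {i: ("a", "b", "c", "d", "e", "f", "g") for i in range(14600)}

start = time.perf_counter()
blob = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)
dump_time = time.perf_counter() - start

start = time.perf_counter()
restored = pickle.loads(blob)
load_time = time.perf_counter() - start

# Timings will vary widely by machine; the point is only the
# relative cost of dumping versus loading.
print("dump: %.3f s, load: %.3f s" % (dump_time, load_time))
```

On current hardware both operations should be far below the 9-11 seconds reported above; much of the original cost was likely the disc round-trip rather than the pickling itself.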

> Thanks also for mentioning DMTools. Do you know whether it is useful 
> for (fast) conversion from a pickle (or some file stored on the server) 
> into a Python dictionary? I didn't find many real-life examples or 
> further descriptions of these tools.

It doesn't use any special tricks - it just provides convenience
functions for caching query results as gzipped cPickles. However,
gains may only be seen with larger result sets such as the ones you
are dealing with.
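In outline, the approach amounts to something like the following minimal sketch. The function names here are hypothetical illustrations, not the actual DMTools API; in modern Python, gzip plus pickle replaces the gzip-plus-cPickle combination.

```python
# Minimal sketch of caching query results as gzipped pickles.
# save_result/load_result are illustrative names, not DMTools'.
import gzip
import pickle

def save_result(path, rows):
    """Pickle the query result 'rows' into a gzipped file at 'path'."""
    with gzip.open(path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_result(path):
    """Read a gzipped pickle back into the original Python object."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Whether the gzip step is a net win depends on the balance between CPU speed and disc throughput, as discussed in the quoted message below.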

Tim C

> 
> Greetings,
> Eric
> 
> On Sunday, 02.11.03 at 08:36, Tim Churches wrote:
> 
> > On Sun, 2003-11-02 at 03:38, Eric Wichterich wrote:
> >> One script searches the MySQL-database and stores the result, the next
> >> script reads the shelve again and processes the result. But there is a
> >> problem: if the second script is called too early, the error "(11,
> >> 'Resource temporarily unavailable') " occurs.
> >
> > The only reason to use shelves is if your query results are too large
> > (in total) to fit in memory, and thus have to be retrieved, stored and
> > processed row-by-row.
> >
> >> So I took a closer look at the file that is generated by the shelf: The
> >> result list from the MySQL query contains 14,600 rows with 7 columns. But
> >> the saved file is over 3 MB large and contains over 230,000 lines (!),
> >> which seems way too much!
> >
> > But that doesn't seem to be the case - your query results can easily fit
> > in memory. However, the query may still take a long time to execute, so
> > it may be reasonable to want to store or cache the results for further
> > processing later. Even so, it is much quicker to just pickle (cPickle)
> > the results to a gzipped file than to use shelve. The use of gzip
> > actually speeds things up, provided that your CPU is reasonably fast and
> > your disc storage system is mundane (any CPU faster than about 500 MHz
> > sees gains on most result sets). It also saves disc space.
> >
> > Ole Nielsen and Peter Christen have written a neat set of Python
> > functions which will automatically handle the caching of query results
> > from a MySQL database in gzipped pickles - see
> > http://csl.anu.edu.au/ml/dm/dm_software.html - except the files don't
> > seem to be available from that page - Ole and Peter, please fix!
> >
> > -- 
> >
> > Tim C
-- 

Tim C

PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere
or at http://members.optushome.com.au/tchur/pubkey.asc
Key fingerprint = 8C22 BF76 33BA B3B5 1D5B  EB37 7891 46A9 EAF9 93D0



