Shelve operations are very slow and create huge files

Sun Nov 2 16:44:43 EST 2003

Eric Wichterich wrote:
> Hello Pythonistas,
> 
> I use Python shelves to store results from MySQL-Queries (using Python 
> for web scripting).
> One script searches the MySQL-database and stores the result, the next 
> script reads the shelve again and processes the result. But there is a 
> problem: if the second script is called too early, the error "(11, 
> 'Resource temporarily unavailable') " occurs.
> So I took a closer look at the file that is generated by the shelf: The 
> result-list from MySQL-Query contains 14.600 rows with 7 columns. But, 
> the saved file is over 3 MB large and contains over 230.000 lines (!), 
> which seems way too much!
> 
> Following statements are used:
> dbase = shelve.open(filename)
> if dbase.has_key(key): #overwrite objects stored with same key
> 	del dbase[key]
> dbase[key] = object
> dbase.close()
> 
> Any ideas?

Have you thought of simply using the keys as filenames (perhaps with
some canonical name mangling) and storing each object as a pickle in
its own file? These days filesystems tend to behave a lot like
databases, and it might prove to be the fastest solution.
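
A minimal sketch of what I mean, in current Python. The helper names
(key_to_path, store, load) and the ".pickle" suffix are made up for
illustration, and percent-encoding is just one possible way to mangle
keys into safe filenames:

```python
import os
import pickle
from urllib.parse import quote


def key_to_path(directory, key):
    """Map a key to a file path (hypothetical mangling scheme)."""
    # safe="" makes quote() percent-encode "/" as well, so a key can
    # never point outside `directory`.
    return os.path.join(directory, quote(key, safe="") + ".pickle")


def store(directory, key, obj):
    """Pickle `obj` into its own file; an existing file is overwritten."""
    os.makedirs(directory, exist_ok=True)
    with open(key_to_path(directory, key), "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(directory, key):
    """Read back the object stored under `key`."""
    with open(key_to_path(directory, key), "rb") as f:
        return pickle.load(f)
```

Opening the file with "wb" already replaces an older entry, so no
separate delete step is needed before storing under the same key.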

I once ran a test with reiserfs on Linux: I created about one million
directory entries and then read random entries. I was able to read a
couple of hundred files a second (each file contained only a small
number). Ah yes, don't try to run os.listdir on those directories :-)

Another thing: renaming a file is *atomic* across all processes (at
least in POSIX land, and as long as the old and new names are on the
same filesystem). This means you can create a file under a temporary
name, fill it, close it, and then issue a 'rename' to the real
filename; all other processes will see either no file or the complete
file.
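
The write-then-rename pattern could look roughly like this. It is a
sketch, not a drop-in fix: atomic_store is a hypothetical name, and it
assumes Python 3.3 or later for os.replace. The temporary file is
created in the target's own directory so that the rename never crosses
a filesystem boundary:

```python
import os
import pickle
import tempfile


def atomic_store(path, obj):
    """Pickle `obj` to `path` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target, so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
        # os.replace renames over an existing target; on POSIX this
        # is a single atomic step.
        os.replace(tmp_path, path)
    except BaseException:
        # Writing or renaming failed: remove the leftover temp file.
        os.unlink(tmp_path)
        raise
```

A reader that opens the real filename then gets either the previous
complete file, the new complete file, or a "file not found" error that
it can catch and retry.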

cheers,

    holger