[shelve] What are the limitations? Entering too many data crashes it on my machine!

F. GEIGER fgeiger at datec.at
Sun Dec 22 08:56:42 EST 2002


I've written a file syncher (well, kind of), which uses walk() to walk dir
trees.

I wanted to look at other possibilities to solve the problem, so I thought
of a dict, which holds file properties (i.e. timestamp and size), where the
pathname is the key. This dict would be delivered by an object of class
DirectoryInfo or the like. Having two of them (one for the source, one for
the target), I could make predictions like: "12345 files will be copied,
because they are newer on source", "678 files will be copied, because they
are missing on target" etc.
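
A minimal sketch of that idea, assuming modern Python 3 and os.walk(). The
names scan_tree and predict_copies are hypothetical and stand in for whatever
DirectoryInfo would offer; each entry is reduced to a (mtime, size) tuple
keyed by the path relative to the scanned root.

```python
import os


def scan_tree(root):
    """Map each file's path (relative to root) to a (mtime, size) tuple."""
    infos = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            infos[os.path.relpath(full, root)] = (st.st_mtime, st.st_size)
    return infos


def predict_copies(source, target):
    """Return (newer, missing): paths newer on source, paths absent on target."""
    newer = [path for path, (mtime, _size) in source.items()
             if path in target and mtime > target[path][0]]
    missing = [path for path in source if path not in target]
    return newer, missing
```

With two such dicts, len(newer) and len(missing) give the counts for the
messages quoted above.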

Of course, if you pass "D:\\" to the constructor of DirectoryInfo and this
drive is a 30MB drive filled up to 3/4 of its size with files, a normal dict
would require quite a lot of memory.

So I decided to drop in shelve.

But after 19220 entries have been added, the error "(0, 'Error')" is
reported when executing the statement

self._fileInfos[str(pn)] = FileInfoNode(pn)

I catch the exception, sync the shelf and retry the operation. This time
it succeeds (BTW, syncing the shelf or not does not change anything here,
but I hoped it would keep the script from the final crash - see below).
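
The catch, sync and retry step can be sketched roughly as follows. This
assumes Python 3, where the database errors that shelve can raise are
collected in dbm.error (the original script ran against bsddb under Python
2.1). The helper name add_with_retry is hypothetical, and it only mirrors the
workaround, it does not address whatever makes the database fail.

```python
import dbm
import shelve


def add_with_retry(shelf, key, value, retries=1):
    """Store value under key; on a dbm error, sync the shelf and try again."""
    for attempt in range(retries + 1):
        try:
            shelf[key] = value
            return
        except dbm.error:
            if attempt == retries:
                raise  # out of retries; let the caller see the error
            shelf.sync()  # flush pending writes before retrying
```

It would be called with an open shelf, e.g. one returned by shelve.open(),
in place of the plain assignment shown above.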

The same error occurs a second time, after having added 25913 entries.

Then, after having added 36006 entries, the script crashes, because now the
sync() call itself fails:
'''
File "D:\Lab\Design_Patterns.Python\Structural_Patterns.GoF\Composite__Directory.py", line 89, in _fileInfosAdd_
   self._fileInfos.sync()
File "C:\Programme\Python21\lib\shelve.py", line 94, in sync
   self.dict.sync()
bsddb.error: (22, 'Invalid argument')
'''

If I do not sync, calling any other method causes the same crash (e.g.
print len(self._fileInfos.keys())).

The file in which shelve stores the data is about 1.00 GB (1 084 782 592
bytes).
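
As a rough sanity check on that size, the figures above can be divided out.
This assumes the 1.00 GB file corresponds to the 36006 entries added before
the crash; the (mtime, size) tuple and its values are made up purely for
comparison and are not what FileInfoNode actually stores.

```python
import pickle

total_bytes = 1_084_782_592   # shelf file size reported above
entries = 36_006              # entries added before the crash

# Average on-disk cost per entry, including any database overhead.
bytes_per_entry = total_bytes // entries
print(bytes_per_entry)

# Assumption: a compact record is just (mtime, size); the values are made up.
compact = pickle.dumps((1040565402.0, 2048))
print(len(compact))
```

The average works out to roughly 30 KB per entry, far more than a pickled
timestamp and size need, so either the pickled objects or the database
overhead must account for most of the file.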

You might say that my solution is not appropriate for this task, and that I
should use a real db or at least stick with walk(). But that's not the point.
The point is: what did I do wrong with shelve?

Is 1 GB an implicit limit here? If so, what are those two "non-errors"
occurring much earlier?

Can anybody help to resolve this?

Many thanks in advance and best regards
Franz GEIGER


P.S.: ActivePython 2.1.3 on W2k, but ActivePython 2.2.1 on WinXP yields the
same "results".




