[Tutor] updating shelved record

Magnus Lyckå magnus@thinkware.se
Sun Jul 6 20:55:01 2003


At 17:57 2003-07-06 -0600, Joseph Paish wrote:
>never mind.
>
>found the answer almost immediately after posting the question, though i
>swear i looked for at least an hour before asking on this mailing list.
>
>anyway, the solution :
>
>item = dbase['brian']
>item ['age'] = 44
>dbase['brian'] = item
>
>sorry for the wasted bandwidth.

You're not wasting bandwidth. This is not obvious, and I'm sure
you're neither the first nor the last to fall into this trap. We
might say that shelve isn't behaving quite as one would expect
it to behave. Our expectations that it will act as a persistent
dictionary (which is a fair assumption) makes us draw the wrong
conclusion.

To understand what happens, we must really understand the
implementation of shelve.

When you do x[y], this will be translated into x.__getitem__(y).
In a dictionary, this function call will return a reference
to the object that dictionary x has a reference to under the
key y. If x[y] is modified, as in x[y][z] = 42, it will be the
object that dict x has keyed y to that is updated, so the
next call to x[y] will show the new value.

If I understand correctly, shelves use a simple database (dbm-
style) that stores strings keyed by other strings, and it uses
pickle, the standard python module for turning an arbitrary object
into a string.

This means that the dictionary object in your "dbase['brian']"
doesn't exist in your string. So, it's impossible for
dbase.__getitem__('brian') to simply return a reference to an
existing object, in this case a dictionary.

It's not so easy to make it work smarter than this if our aim
is that shelve is to be a light and simple module that keeps a
dbm file updated and keeps memory usage small.

When we do "dbase[key] = somehing" we will call
"dbase.__setitem__(key, something)" and that's a convenient
time to update the underlying database.

if we do "dbase[key][subkey] = 42" this will turn into
"dbase.__getitem__(key).__setitem__(subkey, 42)".

The shelve object, dbase, will only see a call to __getitem__
which is just the same as if someone would just try to read
the value of dbase[key]. There is no way dbase can see that
we are trying to update the shelve. __getitem__ returns a
dict object, and an ordinary dict object has no concept of
being a component inside a shelve, so it won't "report" its
update to dbase, so dbase can't be aware that it needs to
be updated.

After all, if

dbase['brian']['age'] = 42

would lead to a database change, then

x = dbase['brian']
x['age'] = 42

should do the same. I can't see how this could work.

It would certainly be possible to make a replacement to shelve
that brought in objects into memory when they were requested,
and kept a reference to each unpickled object inside itself so
that we wouldn't need to rebind it as Joseph showed above, but
it would use much more memory if many items in the database
were updated during a program run, and it could (as far as I
can see) only update the database file through specific save()
calls, and/or on close(). It would be quite different from shelve,
with much larger memory usage and no instant reflection of changes
to the disk.

To make it even smarter, something much more complex such as the
object database in Zope, ZODB, would be needed. Even ZODB has
problems with mutable objects such as dictionaries...


--
Magnus Lycka (It's really Lyckå), magnus@thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The Agile Programming Language