[MATRIX-SIG] Changing shelve...

Jim Fulton jim@digicool.com
Sat, 04 Oct 1997 09:09:34 -0400


Konrad Hinsen wrote:
> 
> > The Numeric module includes its own special versions of Pickler and
> > Unpickler which inherit from those defined in the pickle module, but add
> > the functionality of Numeric.array pickling.  This is all well and good,
> > but it's still pretty troublesome if you're working with the shelve
> > module.  shelve.Shelf likes to directly call pickle.Pickler and
> > pickle.Unpickler, not those versions found in Numeric.
> 
> And that's only one of the problems with this approach. Another one is
> that it works only as long as there is no more than one
> "specialization" of pickle.

True.

> The fundamental problem is pickle itself, which in its present form
> does not permit the extension to new data types other than by
> subclassing. This might change in later versions.

Actually, the latest pickle and cPickle provide a generalized way
of handling new types without subclassing.  This involves a protocol
for "reducing" custom types to standard types and works very well
for many applications.

Unfortunately, as you have pointed out to me in private email, the new
approach does not work well for objects like arrays because it is too
inefficient to reduce *VERY LARGE* arrays to Python objects.  I've
thought a lot about this, and even began to implement an idea or two,
but haven't been satisfied with any approach so far.

The major difficulty is handling arrays that are soooo big that it
isn't good enough to marshal their data to a portable string format in
memory.  I'd guess that many people have arrays that are small enough
that they can afford to marshal the array to a string.  In such cases
the current reduce mechanism can work quite well.
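For the small-array case, a minimal sketch of reducing an array to a portable string, assuming a hypothetical `SmallArray` wrapper around the standard `array` module. `__reduce__` writes the data in little-endian order (an arbitrary choice for this sketch) so that a machine with the other byte order can read it back. Note that `tobytes()` builds the whole string in memory, which is exactly the cost that rules this approach out for very large arrays.

```python
import array
import pickle
import sys


def _rebuild_small_array(typecode, data):
    # data is always little-endian; convert back to this machine's order.
    values = array.array(typecode)
    values.frombytes(data)
    if sys.byteorder != "little":
        values.byteswap()
    return SmallArray(values)


class SmallArray:
    """Hypothetical array wrapper that reduces itself to a portable string."""

    def __init__(self, values):
        # An array.array; use a fixed-size typecode such as 'd' so the
        # item size is the same on every platform.
        self.values = values

    def __reduce__(self):
        # Marshal the whole array to one in-memory string in a fixed byte
        # order: portable, but it costs a full copy of the data.
        portable = array.array(self.values.typecode, self.values)
        if sys.byteorder != "little":
            portable.byteswap()
        return _rebuild_small_array, (self.values.typecode, portable.tobytes())
```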

One idea I had was to define a new pickle type "temporary file".
Basically, you could create a temporary file object, write your data to
it, hopefully in a portable format, and then pickle the temporary file.
The pickling machinery would be prepared to pickle and unpickle the
temporary file without reading the contents into memory.  This would
involve an extra copy operation when pickling and unpickling, which
you objected to.
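The temporary-file idea might look roughly like this. It is a sketch under stated assumptions, not pickle's own machinery: `TempFileRef`, `dump_tempfile` and `load_tempfile` are hypothetical names, and instead of adding a new pickle type, the pickle stream carries only a small header with the byte count while the file's contents follow it, copied in fixed-size chunks so they are never held in memory all at once. The chunked copies on each side are the extra copy operation described above.

```python
import os
import pickle
import shutil
import tempfile

CHUNK = 64 * 1024  # copy granularity; the payload is never fully in memory


class TempFileRef:
    """Hypothetical wrapper around an open binary temporary file."""

    def __init__(self, fileobj):
        self.file = fileobj


def dump_tempfile(ref, out):
    # Find the size by seeking, without reading the contents.
    ref.file.seek(0, os.SEEK_END)
    size = ref.file.tell()
    ref.file.seek(0)
    # Only this small header goes through pickle...
    pickle.dump({"kind": "tempfile", "size": size}, out)
    # ...and the raw bytes follow it, copied chunk by chunk.
    shutil.copyfileobj(ref.file, out, CHUNK)


def load_tempfile(inp):
    header = pickle.load(inp)  # stops at the end of the header pickle
    if header.get("kind") != "tempfile":
        raise ValueError("not a temp-file record")
    remaining = header["size"]
    tmp = tempfile.TemporaryFile()
    while remaining:
        chunk = inp.read(min(CHUNK, remaining))
        if not chunk:
            raise EOFError("truncated temp-file payload")
        tmp.write(chunk)
        remaining -= len(chunk)
    tmp.seek(0)
    return TempFileRef(tmp)
```

Because `pickle.load` stops at the end of the header pickle, the payload can be read straight from the same stream afterwards, and any data following it is left in place.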

Another option would be to define a CObject-like object that would allow
a block of memory to be pickled and unpickled.  This would allow very
fast pickling and unpickling but with a loss of portability.

Hm... What about a special picklable type that would wrap:

  - A void pointer,
  - An object that contains the void pointer,
  - a size, and
  - a type code.

So, an array's __reduce__ method would construct one of these special
objects and the pickling machinery would be prepared to pickle and
unpickle the object efficiently *and* portably.
This last idea takes advantage of an assumption that we 
want to pickle a block of memory that contains objects of
some constant known C type.
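Something close to this last idea later appeared in Python itself: pickle protocol 5 (PEP 574, Python 3.8) added `pickle.PickleBuffer`, which wraps a buffer-exporting object (owner, pointer and size), so that when a `buffer_callback` is supplied its memory is handed over out-of-band instead of being copied into the pickle stream. The sketch below assumes a hypothetical `Block` class backed by a `memoryview`, whose format string plays the role of the type code. Unlike the design described above, it does not convert byte order, so it is fast but not portable.

```python
import array
import pickle


def _rebuild_block(buf, typecode):
    # buf is bytes/bytearray (in-band) or a PickleBuffer (out-of-band);
    # either way, expose it as a typed view without copying it again.
    return Block(memoryview(buf).cast("B").cast(typecode))


class Block:
    """Hypothetical typed memory block: owner + pointer + size + type code."""

    def __init__(self, data):
        self.data = memoryview(data)  # the view keeps the owner alive

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Hand the raw buffer to pickle; with a buffer_callback it is
            # transferred out-of-band rather than copied into the stream.
            return _rebuild_block, (pickle.PickleBuffer(self.data), self.data.format)
        # Older protocols: fall back to an in-memory byte string.
        return _rebuild_block, (self.data.tobytes(), self.data.format)
```

With `buffer_callback`, `pickle.dumps` returns only a small stream plus a list of buffers that the application can store however it likes; without it, the data is copied in-band as usual.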

I think this last idea can work.  Anybody want to volunteer
to help me make it work? (I have so little time these days. :-()

Jim

_______________
MATRIX-SIG  - SIG on Matrix Math for Python

send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
_______________