[SciPy-User] Numpy pickle format

Robert Kern robert.kern at gmail.com
Mon Nov 29 09:09:30 EST 2010


On Mon, Nov 29, 2010 at 07:46, Francesc Alted <faltet at pytables.org> wrote:
> Hi David,
>
> On Thursday 25 November 2010 00:22:02 David Baddeley wrote:
>> Thanks heaps for the detailed reply! That looks like it should be
>> enough info to get me started ... I know it's a bit of a niche
>> application, but is there likely to be anyone else out there who's
>> likely to be interested in similar functionality? Just want to know
>> if it's worth taking the time to think about supporting some of the
>> additional aspects of the protocol (eg c/fortran order) before I
>> cobble something together -  I wonder if one could wrap JAMA to
>> provide some very basic array functionality ...
>
> I'm interested.  I'm looking to adopt a protocol for sending arrays that
> can serialize/deserialize them without duplicating the contents in
> memory (so that the serialized version and the deserialized one do not
> have to exist at the same time).
>
> My idea is to adopt something similar to the native NPY format for
> files:
>
> http://svn.scipy.org/svn/numpy/trunk/doc/neps/npy-format.txt
>
> but adapting it to support blocking, that is, to be able to send the
> array in blocks and restore the original array by assembling these
> blocks.  That way, the serialized and deserialized versions do not have
> to coexist in the same process memory (only one block does) when
> sending the stream to its destination.  As a plus, this would make it
> possible to compress blocks transparently and, with a little more
> effort, perhaps even allow random access when the serialization goes
> to a file on disk (and not to a stream).
>
> I'm thinking of supporting just the metadata that NPY supports right
> now, that is, the dtype, the C/Fortran order and the shape, that's all.
> Once this format is clear, several implementations can be written
> (on top of Pyro or zeromq, or just using something in the Python
> standard library).

Rather than "adapting the format" per se, just wrap your format around
it. Send a message containing the version number of your blocked
format, the number of header blocks, the number of data blocks, and
any information about the compression of the data. Then send the NPY
header in its own message. Then start sending the possibly compressed
data messages.
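
A minimal sketch of that message sequence, using only the Python standard
library. Everything about the wrapper is an assumption made for
illustration: the field layout ("<HHQB"), the names WRAPPER_VERSION,
to_messages and from_messages, and the choice of zlib are hypothetical
and not part of any existing format. The NPY header is built by hand
following the version 1.0 layout (magic string, version bytes, a 2-byte
little-endian length, then a dict literal padded with spaces to a
multiple of 16 bytes); in practice numpy.lib.format could produce it.

```python
import ast
import struct
import zlib

MAGIC = b"\x93NUMPY"
WRAPPER_VERSION = 1      # hypothetical version number of the blocked wrapper
WRAPPER_FMT = "<HHQB"    # assumed layout: version, header blocks, data blocks, compressed flag


def npy_header(descr, fortran_order, shape):
    """Build an NPY version 1.0 header (magic, version, length, padded dict)."""
    d = "{'descr': %r, 'fortran_order': %r, 'shape': %r, }" % (
        descr, fortran_order, tuple(shape))
    prefix_len = len(MAGIC) + 2 + 2          # magic + version bytes + length field
    pad = -(prefix_len + len(d) + 1) % 16    # pad so the total is a multiple of 16
    body = (d + " " * pad + "\n").encode("latin1")
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(body)) + body


def parse_npy_header(msg):
    """Return the metadata dict stored in an NPY 1.0 header message."""
    if msg[:len(MAGIC)] != MAGIC:
        raise ValueError("not an NPY header")
    (hlen,) = struct.unpack("<H", msg[8:10])
    return ast.literal_eval(msg[10:10 + hlen].decode("latin1"))


def to_messages(descr, fortran_order, shape, raw, block_size=1 << 16, compress=True):
    """Yield the wrapper message, the NPY header message, then the data messages."""
    n_blocks = (len(raw) + block_size - 1) // block_size
    yield struct.pack(WRAPPER_FMT, WRAPPER_VERSION, 1, n_blocks, int(compress))
    yield npy_header(descr, fortran_order, shape)
    for start in range(0, len(raw), block_size):
        block = raw[start:start + block_size]
        # Only one block is compressed and in flight at a time.
        yield zlib.compress(block) if compress else block


def from_messages(messages):
    """Reassemble (metadata, raw bytes) from a message sequence."""
    it = iter(messages)
    version, n_header, n_data, compressed = struct.unpack(WRAPPER_FMT, next(it))
    if version != WRAPPER_VERSION:
        raise ValueError("unsupported wrapper version %d" % version)
    headers = [next(it) for _ in range(n_header)]
    meta = parse_npy_header(headers[0])
    out = bytearray()
    for _ in range(n_data):
        block = next(it)
        out += zlib.decompress(block) if compressed else block
    return meta, bytes(out)


# Example: a 3x4 array of little-endian float64 values, sent in 32-byte blocks.
raw = struct.pack("<12d", *range(12))
messages = list(to_messages("<f8", False, (3, 4), raw, block_size=32))
meta, data = from_messages(messages)
```

Because to_messages is a generator, each message can be handed to the
transport (a socket, zeromq, Pyro, ...) as soon as it is produced, and
the NPY header itself is left untouched inside its own message.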

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


