[Numpy-discussion] Python 3K merge

Pauli Virtanen pav at iki.fi
Thu Dec 3 07:30:24 EST 2009


On Thu, 2009-12-03 at 13:04 +0100, René Dudfield wrote:
[clip]
>         In other news, we cannot support Py2 pickles in Py3 -- this is
>         because Py2 str is unpickled as Py3 str, resulting in encoding
>         failures even before the data is passed on to Numpy.
>
> Is this just for the type codes?  Or is there other string data that
> needs to be pickle loaded?  If it is just for the type codes, they are
> all within the ansi character set and unpickle fine without errors.
> I'm guessing numpy uses strings to pickle arrays?

The array data is put in a string in __reduce__.

The dtype is IIRC mostly stored using integers, though endianness is
stored with a character.

Actually, now that I look more closely, Py3 pickle.load takes an
'encoding' argument, which will perhaps help here. We should probably
just instruct users to pass 'latin1' there in Py3 if they want backwards
compatibility.
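
A minimal sketch of the 'latin1' approach, using only the standard library
pickle module. The payload below is a hand-written protocol-2 pickle of a
Py2 str, not something Numpy produced, and the helper name
load_py2_str_as_bytes is made up for illustration. Since latin1 maps every
byte value 0-255 to exactly one code point, decoding and re-encoding
returns the original bytes unchanged:

```python
import pickle

# Hand-written protocol-2 pickle of the Py2 str '\xc3\xb6' (the UTF-8 bytes of "ö"):
# PROTO 2, SHORT_BINSTRING of length 2, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x02\xc3\xb6q\x00."


def load_py2_str_as_bytes(payload):
    """Unpickle a Py2 str in Py3 and recover its original bytes.

    Hypothetical helper: decodes with latin1 on load, then encodes back.
    """
    text = pickle.loads(payload, encoding="latin1")
    return text.encode("latin1")


# With the default encoding (ASCII), the non-ASCII bytes cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

raw = load_py2_str_as_bytes(PY2_PICKLE)
print(default_failed, raw)
```

Whether the Numpy C code accepts the resulting Py3 str in __setstate__ is a
separate question that this sketch does not cover.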

The Numpy __reduce__ and __setstate__ C code must then just be checked
for compatibility.

[clip]
> Using the python array module to store data might be the way to
> go(rather than strings), since that is available in both py2 and py3.

The array module has the same problem as Numpy, so using it will not
help:

$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) 
>>> import array, pickle
>>> c = array.array('b', '123öä')
>>> c
array('b', [49, 50, 51, -61, -74, -61, -92])
>>> f = open('foo.pck', 'w'); pickle.dump(c, f); f.close()
$ python3
Python 3.0.1+ (r301:69556, Apr 15 2009, 15:59:22) 
>>> import pickle
>>> f = open('foo.pck', 'rb')
>>> pickle.load(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.0/pickle.py", line 1335, in load
    return Unpickler(file, encoding=encoding, errors=errors).load()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3:
ordinal not in range(128)

The 'encoding' argument does not actually help the array module, but that
may just be due to some incompatible __setstate__ handling in 'array'.

[clip]
> A set of pickles saved from python2 would be useful for testing.
> Forwards compatibility is also a useful thing to test.  That is py3.1
> pickles saved to be loaded with python2 numpy.

In Py3 it would be very convenient for __getstate__ to return the array
data as bytes (e.g. for space savings!), but that would be forward
incompatible unless the Py2 side has a custom unpickler.
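
To illustrate the space argument, here is a rough sketch with the standard
library pickle module on a current Python 3, not with Numpy's actual
__reduce__ code. It assumes that protocol 3 stores bytes natively, while
protocol 2 (the newest one Py2 can read) has to express them as a
latin1-encoded text string, which costs two bytes for every byte value
above 127. The sample data is arbitrary:

```python
import pickle

# Arbitrary sample data: 1000 bytes, all above 127.
data = b"\xff" * 1000

# Protocol 2 is the newest protocol that Py2 can read.
size_py2_readable = len(pickle.dumps(data, protocol=2))

# Protocol 3 is Py3-only and has a native bytes opcode.
size_py3_only = len(pickle.dumps(data, protocol=3))

# Both representations round-trip to the same bytes within Py3.
roundtrip_ok = (
    pickle.loads(pickle.dumps(data, protocol=2)) == data
    and pickle.loads(pickle.dumps(data, protocol=3)) == data
)

print(size_py2_readable, size_py3_only, roundtrip_ok)
```

The exact sizes depend on the data, so the saving for real array contents
may be smaller than in this worst case.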


-- 
Pauli Virtanen




