why is bytearray treated so inefficiently by pickle?

Sun Nov 27 21:09:47 EST 2011

On 11/27/2011 9:33 AM, Irmen de Jong wrote:
> Hi,
>
> A bytearray is pickled (using max protocol) as follows:
>
>>>> pickletools.dis(pickle.dumps(bytearray([255]*10),2))
>      0: \x80 PROTO      2
>      2: c    GLOBAL     '__builtin__ bytearray'
>     25: q    BINPUT     0
>     27: X    BINUNICODE u'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
>     52: q    BINPUT     1
>     54: U    SHORT_BINSTRING 'latin-1'
>     63: q    BINPUT     2
>     65: \x86 TUPLE2
>     66: q    BINPUT     3
>     68: R    REDUCE
>     69: q    BINPUT     4
>     71: .    STOP
>
>>>> bytearray("\xff"*10).__reduce__()
> (<type 'bytearray'>, (u'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff', 'latin-1'), None)
>
>
> Is there a particular reason it is encoded so inefficiently? Most notably, the actual
> *bytes* in the bytearray are represented by a UTF-8 string. This needs to be
> transformed into a unicode string and then encoded back into bytes, when unpickled. The
> thing being a bytearray, I would expect it to be pickled as such: a sequence of bytes.
> And then possibly converted back to bytearray using the constructor that takes the bytes
> directly (BINSTRING/BINBYTES pickle opcodes).
>
> The above occurs both on Python 2.x and 3.x.
>
> Any ideas? Candidate for a patch?

Possibly. The two developers listed as particularly interested in pickle 
are 'alexandre.vassalotti,pitrou' (antoine), so if you do open a tracker 
issue, add them as nosy.

Take a look at http://www.python.org/dev/peps/pep-3154/
by Antoine Pitrou or forward your message to him.
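
If you do open an issue, it may help to attach numbers for each protocol.
Below is a minimal sketch, not taken from the original message, that
pickles the same 10-byte bytearray at every protocol the running
interpreter supports and prints the payload size and opcode names. The
helper name describe() is hypothetical, and the output will differ between
Python versions, so treat it as something to run locally rather than as a
reference result.

```python
import pickle
import pickletools


def describe(obj, protocol):
    """Return (payload size, opcode names) for obj pickled at the given protocol."""
    payload = pickle.dumps(obj, protocol)
    # Sanity check: the pickle must round-trip to an equal object.
    assert pickle.loads(payload) == obj
    names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
    return len(payload), names


data = bytearray([255] * 10)  # same 10-byte value as in the quoted session
sizes = {}
for protocol in range(2, pickle.HIGHEST_PROTOCOL + 1):
    size, names = describe(data, protocol)
    sizes[protocol] = size
    print(protocol, size, " ".join(names))
```

Comparing the printed sizes against the 10 bytes of actual data gives a
rough measure of the overhead being discussed.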

-- 
Terry Jan Reedy