[Python-Dev] Accepting PEP 3154 for 3.4?

Tue Nov 19 22:17:06 CET 2013

[Tim]
>> ...
>> better than implicit" kinds of reasons.  The only way now to know that
>> you're looking at a frame size is to keep a running count of bytes
>> processed and realize you've reached a byte offset where a frame size
>> "is expected".

[Antoine]
> That's integrated into the built-in buffering.

Well, obviously, because it wouldn't work at all unless the built-in
buffering knew all about it ;-)

> It's not really an additional constraint: the frame sizes simply
> dictate how buffering happens in practice. The main point of
> framing is to *simplify* the buffering logic (of course, the old
> buffering logic is still there for protocols <= 3, unfortunately).

And always will be - there are no pickle simplifications, because
everything always sticks around forever.  Over time, pickle just gets
more complicated.  That's in the nature of the beast.

> Note some drawbacks of frame opcodes:
> - the decoder has to sanity check the frame opcodes (what if a frame
>   opcode is encountered when already inside a frame?)
> - a pickle-mutating function such as pickletools.optimize() may naively
>   ignore the frame opcodes while rearranging the pickle stream, only to
>   emit a new pickle with invalid frame sizes

I suspect we have very different mental models here.  By "has an
opcode", I do NOT mean "must be visible to the opcode-decoding loop".
I just mean "has a unique byte assigned in the pickle opcode space".

I expect that in the CPython implementation of unpickling, the
buffering layer would _consume_ the FRAME opcode, along with the frame
size.  The opcode-decoding loop would never see it.

But if some _other_ implementation of unpickling didn't give a hoot
about framing, having an explicit opcode means that implementation
could ignore the whole scheme very easily:  just implement the FRAME
opcode in *its* opcode-decoding loop to consume the FRAME argument,
ignore it, and move on.  As-is, all other implementations _have_ to
know everything about the buffering scheme because it's all implicit
low-level magic.
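
To make that concrete, here is a minimal sketch of a toy opcode loop that
knows nothing about framing and simply skips FRAME.  The byte value 0x95 is
the one protocol 4 eventually used for FRAME, so treat it as an assumption in
this context, along with the 8-byte argument width; the function name is
hypothetical and only two real opcodes are handled.

```python
import struct

FRAME = 0x95    # assumed opcode byte for FRAME
BININT1 = 0x4B  # 'K': push a 1-byte unsigned int
STOP = 0x2E     # '.': end of pickle


def decode_ignoring_frames(data):
    """Toy opcode loop that treats FRAME as a no-op with an 8-byte argument."""
    stack = []
    pos = 0
    while True:
        op = data[pos]
        pos += 1
        if op == FRAME:
            pos += 8  # consume the frame size, ignore it, and move on
        elif op == BININT1:
            stack.append(data[pos])
            pos += 1
        elif op == STOP:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {op:#x}")


# A framed stream: FRAME, 8-byte size (3), then the 3 opcode bytes "K*."
framed = bytes([FRAME]) + struct.pack("<Q", 3) + bytes([BININT1, 42, STOP])
result = decode_ignoring_frames(framed)
```

The loop never needs to know where a frame ends; the size is just an argument
it throws away.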

So, then, to the 2 points you raised:

1. If the CPython decoder ever sees a FRAME opcode, I expect it to
raise an exception.  That's all - it's an invalid pickle (or bug in
the code) if it contains a FRAME the buffering layer didn't consume.

2. pickletools.optimize() in the CPython implementation should never
see a FRAME opcode either.

Initially, all I desperately ;-) want changed here is for the
_buffering layer_, on the writing end, to write 9 bytes instead of 8
(1 new one for a FRAME opcode), and on the reading end to consume 9
bytes instead of 8 (extra credit if it checked the first byte to
verify it really is a FRAME opcode - there's nothing wrong with sanity
checks).
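
A rough sketch of that 9-byte header on both ends, assuming the opcode byte is
0x95 and the 8-byte size is little-endian unsigned (both are assumptions here,
as are the helper names write_frame and read_frame):

```python
import io
import struct

FRAME = b"\x95"  # assumed opcode byte for FRAME


def write_frame(out, payload):
    """Write 1 opcode byte + 8 size bytes, then the frame's contents."""
    out.write(FRAME + struct.pack("<Q", len(payload)))
    out.write(payload)


def read_frame(inp):
    """Consume the 9-byte header, check the opcode, and return the frame."""
    header = inp.read(9)
    if len(header) != 9 or header[:1] != FRAME:
        raise ValueError("expected a FRAME opcode at the start of a frame")
    (size,) = struct.unpack("<Q", header[1:])
    payload = inp.read(size)
    if len(payload) != size:
        raise ValueError("truncated frame")
    return payload


buf = io.BytesIO()
write_frame(buf, b"K*.")
buf.seek(0)
roundtrip = read_frame(buf)
```

The opcode-decoding loop only ever sees what read_frame returns, so it never
encounters the FRAME byte itself.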

Then it becomes _possible_ to optimize "small pickles" later (in the
sense of not bothering to frame them at all).  So long as frames
remain implicit magic, that's impossible without moving to yet another
new protocol level.
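
For illustration, once a frame starts with a distinct byte, a reader can tell
framed from unframed input by looking at that byte alone.  This is only a
sketch under the same assumptions as before (opcode byte 0x95, 8-byte
little-endian size), and read_chunk is a hypothetical helper, not anything in
the pickle module:

```python
import io
import struct

FRAME = b"\x95"  # assumed opcode byte for FRAME


def read_chunk(inp):
    """Return the next run of opcode bytes, framed or not."""
    first = inp.read(1)
    if first == FRAME:
        # Framed: the size tells us exactly how much to buffer.
        (size,) = struct.unpack("<Q", inp.read(8))
        return inp.read(size)
    # Unframed small pickle: the first byte was already a real opcode.
    return first + inp.read()


framed = io.BytesIO(FRAME + struct.pack("<Q", 3) + b"K*.")
unframed = io.BytesIO(b"K*.")
chunks = (read_chunk(framed), read_chunk(unframed))
```

With implicit framing the reader has no such byte to test: the first 8 bytes
are always taken to be a size.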