[Datetime-SIG] Are there any "correct" implementations of tzinfo?
Alexander Belopolsky
alexander.belopolsky at gmail.com
Tue Sep 15 03:42:00 CEST 2015
No credit for anything other than the "extra credit" section. Partial credit for that. Study that printout and you should understand what Tim was saying.
> On Sep 14, 2015, at 9:19 PM, Random832 <random832 at fastmail.com> wrote:
>
>> On Mon, Sep 14, 2015, at 18:09, Tim Peters wrote:
>> Sorry, I'm not arguing about this any more. Pickle doesn't work at
>> all at the level of "count of bytes followed by a string".
>
> The SHORT_BINBYTES opcode consists of the byte b'C', followed by *yes
> indeed* "count of bytes followed by a string".
>
>> If you
>> want to make a pickle argument that makes sense, I'm afraid you'll
>> need to become familiar with how pickle works first. This is not the
>> place for a pickle tutorial.
>>
>> Start by learning what a datetime pickle actually is.
>> pickletools.dis() will be very helpful.
>
>     0: \x80 PROTO      3
>     2: c    GLOBAL     'datetime datetime'
>    21: q    BINPUT     0
>    23: C    SHORT_BINBYTES b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
>    35: q    BINPUT     1
>    37: \x85 TUPLE1
>    38: q    BINPUT     2
>    40: R    REDUCE
>    41: q    BINPUT     3
>    43: .    STOP
>
> The payload is ten bytes, and the byte immediately before it is in fact
> 0x0a. If I pickle any byte string under 256 bytes long by itself, the
> byte immediately before the data is the length. This is how I initially
> came to the conclusion that "count of bytes followed by a string" was
> valid.
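
The length-byte observation in the paragraph above can be checked directly. This is a minimal sketch, assuming CPython 3 and pickle protocol 3; the variable names are illustrative and do not come from the thread.

```python
import pickle
import pickletools
from datetime import datetime

# The ten-byte state string shown in the disassembly.
payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'

# A bytes object pickled by itself: opcode b'C', one length byte, then the data.
alone = pickle.dumps(payload, protocol=3)
start = alone.index(payload)
print(alone[start - 2:start])          # opcode byte and length byte

# The same opcode, count and data appear inside a datetime pickle.
dt_pickle = pickle.dumps(datetime(2015, 9, 14, 21, 6, 42), protocol=3)
dt_start = dt_pickle.index(payload)
print(dt_pickle[dt_start - 2:dt_start])

# Symbolic disassembly, for comparison with the listing in the message.
pickletools.dis(dt_pickle)
```
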
>
> I did, before writing my earlier post, look into the high-level aspects
> of how datetime pickle works - it uses __reduce__ to create up to two
> arguments, one of which is a 10-byte string, and the other is the
> tzinfo. Those arguments are passed into the datetime constructor and
> detected by that constructor - for example, I can call it directly with
> datetime(b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00') and get the same result
> as unpickling.
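
A short sketch of the high-level mechanism described above, assuming CPython 3. Accepting the state string in the constructor is an undocumented detail that exists to support unpickling, so this is an illustration and not a supported API.

```python
from datetime import datetime

payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
dt = datetime(2015, 9, 14, 21, 6, 42)

# __reduce__ returns (callable, args); with no tzinfo, args is a 1-tuple
# holding only the ten-byte state string.
func, args = dt.__reduce__()
print(func, args)

# Passing the state string straight to the constructor rebuilds the value,
# which is what unpickling does via the REDUCE opcode.
rebuilt = datetime(payload)
print(rebuilt == dt)
```
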
>
> At the low level, the part that represents that first argument does
> indeed appear to be "count of bytes followed by a string". I can add to
> the count, add more bytes, and it will call the constructor with the
> longer string. If I use pickletools.dis on my modified value the output
> looks the same except for, as expected, the offsets and the value of the
> argument to the SHORT_BINBYTES opcode.
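
The modification described above can be reproduced as a sketch, assuming CPython 3 and protocol 3. The extra byte is arbitrary and `opcode_names` is a hypothetical helper. The opcode stream keeps the same shape, but existing constructors only treat a string of exactly ten bytes as state, so loading the patched pickle is expected to fail with a TypeError.

```python
import pickle
import pickletools
from datetime import datetime

payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
original = pickle.dumps(datetime(2015, 9, 14, 21, 6, 42), protocol=3)

# Bump the one-byte count and append one arbitrary extra byte to the data.
extra = b'\xff'
start = original.index(payload)
patched = (original[:start - 1]
           + bytes([len(payload) + len(extra)])
           + payload + extra
           + original[start + len(payload):])

def opcode_names(data):
    """Hypothetical helper: list opcode names without executing the pickle."""
    return [op.name for op, arg, pos in pickletools.genops(data)]

# Same opcodes in the same order; only offsets and the bytes argument differ.
same_structure = opcode_names(original) == opcode_names(patched)
print(same_structure)

# Unpickling calls datetime() with the eleven-byte string, which current
# constructors do not recognize as pickle state.
try:
    pickle.loads(patched)
    loaded = True
except TypeError:
    loaded = False
print(loaded)
```
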
>
> So, it appears that, as I was saying, "wasted space" would not have been
> an obstacle to having the "payload" accepted by the constructor (and
> produced by __reduce__, ultimately by _getstate) consist of "a byte string
> of >= 10 bytes, the first 10 of which are used and the rest of which are
> ignored by Python <= 3.5" instead of "a byte string of exactly 10
> bytes", since it would have accepted and produced exactly the same
> pickle values, but been prepared to accept larger arguments pickled from
> future versions.
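
The behavior proposed in the paragraph above can be sketched as a wrapper. `datetime_from_state` and the trailing bytes are hypothetical; nothing like this exists in the datetime module, and the sketch only illustrates "use the first ten bytes, ignore the rest".

```python
from datetime import datetime

STATE_LEN = 10  # length of the state string used by existing versions

def datetime_from_state(state, *rest):
    """Hypothetical helper: use the first ten bytes and ignore any surplus.

    `rest` stands for the optional tzinfo argument.
    """
    return datetime(state[:STATE_LEN], *rest)

payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
future_payload = payload + b'\x01\x02'   # made-up bytes from a "future" version

print(datetime_from_state(payload))
print(datetime_from_state(future_payload))
```
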
>
> For completeness: Protocol versions 2 and 1 use BINUNICODE on a
> latin1-to-utf8 version of the byte string, with a similar "count of
> bytes followed by a string" (though the count of bytes is of UTF-8
> bytes). Protocol version 0 uses UNICODE, terminated by \n, and a literal
> \n is represented by \\u000a. In all cases some extra data around the
> value sets it up to call "codecs.encode(..., 'latin1')" upon unpickling.
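
The older protocols can be inspected the same way. This is a sketch, assuming CPython 3; `ops` is a hypothetical helper. The 0xdf byte becomes two bytes in UTF-8, so the BINUNICODE count is 11 rather than 10.

```python
import pickle
import pickletools
from datetime import datetime

dt = datetime(2015, 9, 14, 21, 6, 42)
payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'

def ops(data):
    """Hypothetical helper: (opcode name, argument) pairs, without unpickling."""
    return [(op.name, arg) for op, arg, pos in pickletools.genops(data)]

p2 = pickle.dumps(dt, protocol=2)
p0 = pickle.dumps(dt, protocol=0)

# Protocol 2: the state travels as text (latin1 decoded, UTF-8 encoded).
text = payload.decode('latin1')
utf8 = text.encode('utf-8')

# BINUNICODE is b'X', a 4-byte little-endian count of UTF-8 bytes, then the data.
binunicode = b'X' + len(utf8).to_bytes(4, 'little') + utf8
print(len(payload), len(utf8), binunicode in p2)

# Protocol 0 uses the newline-terminated UNICODE opcode instead.
print([name for name, arg in ops(p0) if name == 'UNICODE'])

# Both forms still unpickle to the same value.
print(pickle.loads(p2) == dt, pickle.loads(p0) == dt)
```
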
>
> So have I shown you that I know enough about the pickle format to know
> that permitting a longer string (and ignoring the extra bytes) would
> have had zero impact on the pickle representation of values that did not
> contain a longer string? I'd already figured out half of this before
> writing my earlier post; I just assumed *you* knew enough that I
> wouldn't have to show my work.
>
> Extra credit:
>     0: \x80 PROTO      3
>     2: c    GLOBAL     'datetime datetime'
>    21: q    BINPUT     0
>    23: (    MARK
>    24: M        BININT2    2014
>    27: K        BININT1    9
>    29: K        BININT1    14
>    31: K        BININT1    21
>    33: K        BININT1    6
>    35: K        BININT1    42
>    37: t        TUPLE      (MARK at 23)
>    38: q    BINPUT     1
>    40: R    REDUCE
>    41: q    BINPUT     2
>    43: .    STOP
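
A pickle of this shape, where REDUCE calls datetime with integer fields instead of a state string, can be produced with a sketch like the following. `ByFields` is a hypothetical class used only for illustration, and the example assumes CPython 3 with protocol 3.

```python
import pickle
import pickletools
from datetime import datetime

class ByFields:
    """Hypothetical wrapper whose pickle calls datetime(year, month, ...)."""

    def __init__(self, dt):
        self.dt = dt

    def __reduce__(self):
        d = self.dt
        # The callable and its integer arguments are what REDUCE will apply.
        return (datetime, (d.year, d.month, d.day, d.hour, d.minute, d.second))

fields_pickle = pickle.dumps(ByFields(datetime(2014, 9, 14, 21, 6, 42)),
                             protocol=3)

# Disassemble without executing, then load to see what comes back.
pickletools.dis(fields_pickle)
restored = pickle.loads(fields_pickle)
print(type(restored), restored)
```
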