[Datetime-SIG] Are there any "correct" implementations of tzinfo?

Alexander Belopolsky alexander.belopolsky at gmail.com
Tue Sep 15 03:42:00 CEST 2015


No credit for anything other than the "extra credit" section.  Partial credit for that.  Study that printout and you should understand what Tim was saying. 



> On Sep 14, 2015, at 9:19 PM, Random832 <random832 at fastmail.com> wrote:
> 
>> On Mon, Sep 14, 2015, at 18:09, Tim Peters wrote:
>> Sorry, I'm not arguing about this any more.  Pickle doesn't work at
>> all at the level of "count of bytes followed by a string". 
> 
> The SHORT_BINBYTES opcode consists of the byte b'C', followed by *yes
> indeed* "count of bytes followed by a string".
> 
>> If you
>> want to make a pickle argument that makes sense, I'm afraid you'll
>> need to become familiar with how pickle works first.  This is not the
>> place for a pickle tutorial.
>> 
>> Start by learning what a datetime pickle actually is.
>> pickletools.dis() will be very helpful.
> 
>    0: \x80 PROTO      3
>    2: c    GLOBAL     'datetime datetime'
>   21: q    BINPUT     0
>   23: C    SHORT_BINBYTES b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
>   35: q    BINPUT     1
>   37: \x85 TUPLE1
>   38: q    BINPUT     2
>   40: R    REDUCE
>   41: q    BINPUT     3
>   43: .    STOP
> 
> The payload is ten bytes, and the byte immediately before it is in fact
> 0x0a. If I pickle any byte string under 256 bytes long by itself, the
> byte immediately before the data is the length. This is how I initially
> came to the conclusion that "count of bytes followed by a string" was
> valid.
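
The "length byte right before the data" observation can be checked directly. This is a minimal sketch, assuming CPython 3 and pickle protocol 3; the names payload, start, opcode and count are illustrative and not from the thread.

```python
import pickle

payload = b"0123456789"            # any bytes object shorter than 256 bytes
data = pickle.dumps(payload, protocol=3)

start = data.index(payload)        # offset of the raw bytes inside the pickle
opcode = data[start - 2:start - 1]  # one-byte opcode preceding the count
count = data[start - 1]            # one-byte length prefix

print(opcode, count, len(payload))
```

Under these assumptions the opcode is b'C' (SHORT_BINBYTES) and the count equals len(payload).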
> 
> I did, before writing my earlier post, look into the high-level aspects
> of how datetime pickle works - it uses __reduce__ to create up to two
> arguments, one of which is a 10-byte string, and the other is the
> tzinfo. Those arguments are passed into the datetime constructor and
> detected by that constructor - for example, I can call it directly with
> datetime(b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00') and get the same result
> as unpickling.
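
A short sketch of the round trip described above, assuming CPython 3. Calling the constructor with a 10-byte string is an undocumented implementation detail used by pickle support, so treat it as illustration only. The names from_bytes and explicit are made up for this example.

```python
from datetime import datetime

payload = b"\x07\xdf\t\x0e\x15\x06*\x00\x00\x00"

# The constructor detects a single 10-byte argument and treats it as state.
from_bytes = datetime(payload)

# The same value spelled out: 0x07df is 2015, then month, day, h, m, s.
explicit = datetime(2015, 9, 14, 21, 6, 42)

# __reduce__ hands pickle the callable plus the argument tuple.
cls, args = explicit.__reduce__()

print(from_bytes, cls.__name__, args)
```

For a naive datetime the argument tuple holds only the byte string; an aware datetime would carry its tzinfo as a second element.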
> 
> At the low level, the part that represents that first argument does
> indeed appear to be "count of bytes followed by a string". I can add to
> the count, add more bytes, and it will call the constructor with the
> longer string. If I use pickletools.dis on my modified value the output
> looks the same except for, as expected, the offsets and the value of the
> argument to the SHORT_BINBYTES opcode.
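
The byte-patching experiment can be reproduced roughly as follows. This is a sketch, assuming CPython 3 and protocol 3; binbytes_args is a hypothetical helper, and the single padding byte is arbitrary. Note that the opcode stream accepts the longer string, while the constructor in current CPython still insists on exactly 10 bytes.

```python
import pickle
import pickletools
from datetime import datetime

dt = datetime(2015, 9, 14, 21, 6, 42)
payload = dt.__reduce__()[1][0]          # the 10-byte state string
original = pickle.dumps(dt, protocol=3)

# Bump the one-byte count and append one extra byte to the data.
start = original.index(payload)
longer = payload + b"\x00"
modified = (original[:start - 1] + bytes([len(longer)]) + longer
            + original[start + len(payload):])

def binbytes_args(data):
    """Return the arguments of every SHORT_BINBYTES opcode in a pickle."""
    return [arg for op, arg, pos in pickletools.genops(data)
            if op.name == "SHORT_BINBYTES"]

print(binbytes_args(original), binbytes_args(modified))

# The pickle machinery passes the 11-byte string to the constructor,
# which does not recognize it as state and raises.
try:
    pickle.loads(modified)
    rejected = False
except Exception as exc:
    rejected = True
    print(type(exc).__name__, exc)
```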
> 
> So, it appears that, as I was saying, "wasted space" would not have been
> an obstacle to having the "payload" accepted by the constructor (and
> produced by __reduce__, ultimately by _getstate) consist of "a byte string
> of >= 10 bytes, the first 10 of which are used and the rest of which are
> ignored by Python <= 3.5" instead of "a byte string of exactly 10
> bytes", since it would have accepted and produced exactly the same
> pickle values, but been prepared to accept larger arguments pickled from
> future versions.
> 
> For completeness: Protocol versions 2 and 1 use BINUNICODE on a
> latin1-to-utf8 version of the byte string, with a similar "count of
> bytes followed by a string" (though the count of bytes is of UTF-8
> bytes). Protocol version 0 uses UNICODE, terminated by \n, and a literal
> \n is represented by \\u000a. In all cases some extra data around the
> value sets it up to call "codecs.encode(..., 'latin1')" upon unpickling.
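
The protocol 2 behaviour can be inspected with pickletools.genops. This is a sketch, assuming CPython 3; the variable names are illustrative. It also shows why the count is of UTF-8 bytes: the 0xdf byte in the payload takes two bytes once the Latin-1 text is encoded as UTF-8.

```python
import pickle
import pickletools
from datetime import datetime

dt = datetime(2015, 9, 14, 21, 6, 42)
payload = dt.__reduce__()[1][0]
data = pickle.dumps(dt, protocol=2)

ops = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]
names = [name for name, arg in ops]

# Two BINUNICODE strings appear: the payload as text, and 'latin1'.
texts = [arg for name, arg in ops if name == "BINUNICODE"]
text = next(t for t in texts if t != "latin1")

utf8_len = len(text.encode("utf-8"))   # the count stored in the pickle
print(names)
print(len(payload), utf8_len)
```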
> 
> So have I shown you that I know enough about the pickle format to know
> that permitting a longer string (and ignoring the extra bytes) would
> have had zero impact on the pickle representation of values that did not
> contain a longer string? I'd already figured out half of this before
> writing my earlier post; I just assumed *you* knew enough that I
> wouldn't have to show my work.
> 
> Extra credit:
>    0: \x80 PROTO      3
>    2: c    GLOBAL     'datetime datetime'
>   21: q    BINPUT     0
>   23: (    MARK
>   24: M        BININT2    2014
>   27: K        BININT1    9
>   29: K        BININT1    14
>   31: K        BININT1    21
>   33: K        BININT1    6
>   35: K        BININT1    42
>   37: t        TUPLE      (MARK at 23)
>   38: q    BINPUT     1
>   40: R    REDUCE
>   41: q    BINPUT     2
>   43: .    STOP
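
For anyone who wants to load the "extra credit" pickle, here is a hand-assembled reconstruction of the byte stream from the disassembly above. It is a sketch, assuming CPython 3; the bytes are rebuilt from the printout (including the year 2014 exactly as quoted), not taken from the original message.

```python
import pickle
import pickletools
from datetime import datetime

hand_built = (
    b"\x80\x03"                    # PROTO 3
    b"cdatetime\ndatetime\n"       # GLOBAL 'datetime datetime'
    b"q\x00"                       # BINPUT 0
    b"("                           # MARK
    b"M\xde\x07"                   # BININT2 2014 (little-endian)
    b"K\x09K\x0eK\x15K\x06K\x2a"   # BININT1 9, 14, 21, 6, 42
    b"t"                           # TUPLE (back to MARK)
    b"q\x01"                       # BINPUT 1
    b"R"                           # REDUCE: datetime(2014, 9, 14, 21, 6, 42)
    b"q\x02"                       # BINPUT 2
    b"."                           # STOP
)

value = pickle.loads(hand_built)
names = [op.name for op, arg, pos in pickletools.genops(hand_built)]
print(value, names)
```

This stream contains no byte-string payload at all: REDUCE calls the constructor with ordinary integer arguments.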