String changing size on failure?

Wed Nov 1 16:39:43 EDT 2017

On 2017-11-01, Ned Batchelder <ned at nedbatchelder.com> wrote:
> On 11/1/17 4:17 PM, MRAB wrote:
>> On 2017-11-01 19:26, Ned Batchelder wrote:
>>>   From David Beazley 
>>> (https://twitter.com/dabeaz/status/925787482515533830):
>>>
>>>       >>> a = 'n'
>>>       >>> b = 'ñ'
>>>       >>> sys.getsizeof(a)
>>>      50
>>>       >>> sys.getsizeof(b)
>>>      74
>>>       >>> float(b)
>>>      Traceback (most recent call last):
>>>         File "<stdin>", line 1, in <module>
>>>      ValueError: could not convert string to float: 'ñ'
>>>       >>> sys.getsizeof(b)
>>>      77
>>>
>>> Huh?
>>>
>> It's all explained in PEP 393.
>>
>> It's creating an additional representation (UTF-8 + zero-byte 
>> terminator) of the value and is caching that, so there'll then be the 
>> bytes for 'ñ' and the bytes for the UTF-8 (0xC3 0xB1 0x00).
>>
>> When the string is ASCII, the bytes of the UTF-8 representation are 
>> identical to those of the original string, so it can share them.
>
> That explains why b is larger than a to begin with

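No, that initial size difference is due to the additional bytes
required for the internal representation of a non-ASCII string. Both
strings store their one character in a single byte; it's the object
header that is bigger for b.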
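
A rough way to check this, as a sketch assuming CPython 3.3 or later
(where PEP 393 applies; the exact byte counts depend on version and
build). The names size_a, size_b and overhead are just for
illustration:

```python
import sys

a = 'n'   # pure ASCII
b = 'ñ'   # non-ASCII, but still one byte per character internally

# Both strings hold a single character, so the gap between the two
# sizes is per-object overhead rather than character data. If a UTF-8
# copy of b happens to be cached already, it is counted in size_b too.
size_a = sys.getsizeof(a)
size_b = sys.getsizeof(b)
overhead = size_b - size_a

print(size_a, size_b, overhead)
```

The session quoted above reports 50 and 74; other CPython versions
give different numbers, but b should still come out larger than a.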

> but it doesn't explain why float(b) is changing the size of b.

The additional UTF-8 representation is created lazily: it isn't built
and cached until the float() call is made, and that is when the three
extra bytes (0xC3 0xB1 0x00) show up in the size.
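
The arithmetic can be checked with a small sketch like the one below.
It assumes CPython; whether float() leaves a cached copy behind
depends on the interpreter version, so the measured growth may well
be 0 rather than 3. The names cache_cost and growth are made up for
the example:

```python
import sys

b = 'ñ'

# UTF-8 form of the string, plus the zero-byte terminator.
utf8_bytes = b.encode('utf-8')
cache_cost = len(utf8_bytes) + 1

before = sys.getsizeof(b)
try:
    float(b)   # raises ValueError, but may cache a UTF-8 copy of b
except ValueError:
    pass
after = sys.getsizeof(b)

# The session quoted above went from 74 to 77, a growth of 3, which
# matches cache_cost. A growth of 0 means no new copy was cached.
growth = after - before
print(before, after, growth, cache_cost)
```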

-- 
Grant Edwards               grant.b.edwards        Yow! ONE LIFE TO LIVE for
                                  at               ALL MY CHILDREN in ANOTHER
                              gmail.com            WORLD all THE DAYS OF
                                                   OUR LIVES.