String changing size on failure?

Wed Nov 1 16:34:20 EDT 2017

On 11/1/17 4:17 PM, MRAB wrote:
> On 2017-11-01 19:26, Ned Batchelder wrote:
>>   From David Beazley 
>> (https://twitter.com/dabeaz/status/925787482515533830):
>>
>>       >>> a = 'n'
>>       >>> b = 'ñ'
>>       >>> sys.getsizeof(a)
>>      50
>>       >>> sys.getsizeof(b)
>>      74
>>       >>> float(b)
>>      Traceback (most recent call last):
>>         File "<stdin>", line 1, in <module>
>>      ValueError: could not convert string to float: 'ñ'
>>       >>> sys.getsizeof(b)
>>      77
>>
>> Huh?
>>
> It's all explained in PEP 393.
>
> It's creating an additional representation (UTF-8 + zero-byte 
> terminator) of the value and is caching that, so there'll then be the 
> bytes for 'ñ' and the bytes for the UTF-8 (0xC3 0xB1 0x00).
>
> When the string is ASCII, the bytes of the UTF-8 representation are 
> identical to those of the original string, so it can share them.

That explains why b is larger than a to begin with, but it doesn't 
explain why float(b) is changing the size of b.
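
For anyone who wants to poke at the caching MRAB describes, here is a 
minimal sketch. It assumes CPython and uses ctypes to call the C API 
function PyUnicode_AsUTF8 directly, which asks a string for its UTF-8 
form. The helper name size_growth_after_utf8 is made up for this 
example. It only shows that requesting the UTF-8 form can grow the 
reported size; it does not show which step inside float(b) makes that 
request.

```python
import ctypes
import sys

# CPython-only assumption: PyUnicode_AsUTF8 is the C API call that returns
# the UTF-8 form of a str and caches it on the string object (PEP 393).
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8
_as_utf8.argtypes = [ctypes.py_object]
_as_utf8.restype = ctypes.c_char_p


def size_growth_after_utf8(s):
    """Return how many bytes sys.getsizeof(s) grows once UTF-8 is requested.

    Hypothetical helper for illustration only.
    """
    before = sys.getsizeof(s)
    _as_utf8(s)  # ask for the UTF-8 form; CPython may cache it on s
    return sys.getsizeof(s) - before


# Build the strings at run time so nothing has asked for their UTF-8 yet.
ascii_text = "".join(["n", "o"])
latin1_text = "".join(["ñ", "o"])

# ASCII text can share its existing buffer, so no growth is expected.
print(size_growth_after_utf8(ascii_text))
# Non-ASCII text gets a separate UTF-8 copy plus a zero-byte terminator.
print(size_growth_after_utf8(latin1_text))
```

If the growth for the second string equals 
len(latin1_text.encode('utf-8')) + 1, that matches the 74 -> 77 jump 
above (two UTF-8 bytes for 'ñ' plus the terminator).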

--Ned.