Flexible string representation, unicode, typography, ...

MRAB python at mrabarnett.plus.com
Thu Aug 23 11:11:05 EDT 2012


On 23/08/2012 14:57, Neil Hodgson wrote:
> wxjmfauth at gmail.com:
>
>> Small illustration. Take an A4 page containing 50 lines of 80 ASCII
>> characters, add a single 'EM DASH' or a 'BULLET' (code points > 0x2000),
>> and you will see all the optimization efforts destroyed.
>>
>> >>> sys.getsizeof('a' * 80 * 50)
>> 4025
>> >>> sys.getsizeof('a' * 80 * 50 + '•')
>> 8040
>
>      This example is still benefiting from shrinking the number of bytes
> in half over using 32 bits per character as was the case with Python 3.2:
>
>   >>> sys.getsizeof('a' * 80 * 50)
> 16032
>   >>> sys.getsizeof('a' * 80 * 50 + '•')
> 16036
>   >>>
>
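For readers who want to check these numbers: under PEP 393 (CPython 3.3 and later) a string stores 1, 2 or 4 bytes per character, chosen from its largest code point. A single BULLET (U+2022) therefore moves all 4001 characters to 2 bytes each. Below is a minimal sketch, assuming CPython 3.3+. `pep393_width` and `measured_width` are hypothetical helpers written for this note, not standard-library functions. Absolute `sys.getsizeof` results vary with version and platform because object header sizes differ. For that reason the sketch measures the difference between two string lengths, where the header cancels out.

```python
import sys

def pep393_width(s: str) -> int:
    """Hypothetical helper: bytes per character that PEP 393 assigns to s.
    1 if every code point is below U+0100, 2 if below U+10000, else 4."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4

def measured_width(ch: str, n: int = 1000) -> int:
    """Hypothetical helper: estimate CPython's bytes per character for
    strings made only of ch. The fixed object header is the same for both
    lengths, so only the per-character storage survives the subtraction."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

page = 'a' * 80 * 50           # 50 lines of 80 ASCII characters
with_bullet = page + '\u2022'  # the same page plus one BULLET (U+2022)

print(pep393_width(page), pep393_width(with_bullet))  # 1 2
print(measured_width('a'), measured_width('\u2022'))  # 1 2 on CPython 3.3+
```

Neil's Python 3.2 figures show 4 bytes for every character (16036 - 16032 = 4 for the one extra character). That fixed cost is the baseline behind the "in half" remark.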
Perhaps the solution should've been to just switch between 2/4 bytes
instead of 1/2/4 bytes. :-)
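The sketch below puts rough numbers on the joke. It counts only the per-character payload, ignoring object headers and terminators, under three storage policies: the 1/2/4-byte PEP 393 scheme, the 2/4-byte scheme suggested above, and a fixed 4 bytes like the 3.2 build Neil measured. `payload_bytes` and the policy labels are hypothetical names for illustration, not CPython APIs.

```python
def payload_bytes(s: str, widths: tuple[int, ...]) -> int:
    """Hypothetical helper: bytes needed for s's characters when every
    character uses the narrowest width in `widths` that can hold the
    string's largest code point. Headers and terminators are ignored."""
    top = max(map(ord, s), default=0)
    for w in sorted(widths):
        if top < 1 << (8 * w):
            return w * len(s)
    raise ValueError("no width in the policy can hold this string")

# Illustrative policies: allowed bytes-per-character widths.
POLICIES = {
    "1/2/4 (PEP 393)": (1, 2, 4),
    "2/4 (suggested)": (2, 4),
    "fixed 4 (UCS-4)": (4,),
}

page = 'a' * 80 * 50
for label, text in [("ASCII page", page), ("page + BULLET", page + '\u2022')]:
    for policy, widths in POLICIES.items():
        print(f"{label:14} {policy:16} {payload_bytes(text, widths):6}")
```

Under the 2/4 policy, adding the BULLET grows the payload only from 8000 to 8002 bytes. That happens only because the pure-ASCII page already costs 8000 bytes instead of 4000. The size jump disappears because the 1-byte case is gone, not because the bulleted page gets any cheaper.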
