Chardet, file, ... and the Flexible String Representation

Chris Angelico rosuav at gmail.com
Fri Sep 6 12:04:52 EDT 2013


On Sat, Sep 7, 2013 at 1:46 AM, Piet van Oostrum <piet at vanoostrum.org> wrote:
> The FSR simply stores a Unicode string as an array[*] of ints (the Unicode code points of the characters of the string). That's it. Then it uses a memory-efficient way to store this array of ints. But that has nothing to do with character sets. The same principle could be used for any array of ints.

Python does, in fact, store integers in different-sized blocks of
memory according to size - though not for anything smaller than
32-bit.

>>> import sys
>>> sys.getsizeof(100)
14
>>> sys.getsizeof(1000000000000000000000000000000000)
28

So why this is suddenly a bad thing for characters is a mystery none
but he can comprehend.
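
For comparison, here is a rough sketch of the same measurement applied to strings. It assumes CPython 3.3 or later (where the FSR from PEP 393 is in effect); bytes_per_char is a made-up helper name, not anything from the standard library. It takes the difference between two string sizes so that the fixed per-object header cancels out and only the per-character cost is left.

```python
import sys

def bytes_per_char(ch, n=1000):
    # Hypothetical helper: compare a string of 2*n copies of ch with one
    # of n copies, so the fixed object overhead cancels out.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

samples = [
    ("ASCII", "a"),                       # U+0061
    ("Latin-1", "\xe9"),                  # U+00E9
    ("BMP", "\u20ac"),                    # U+20AC
    ("outside the BMP", "\U0001F600"),    # U+1F600
]

for label, ch in samples:
    print(label, bytes_per_char(ch))

# Integers grow with magnitude in the same spirit.
print(sys.getsizeof(100) < sys.getsizeof(10**33))
```

On a CPython build with the FSR, the per-character figures should come out as 1, 1, 2 and 4 bytes, and the last line should print True: the storage width follows the largest value that has to fit, for characters just as for integers.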

ChrisA


