[Python-Dev] PEP 393: Flexible String Representation

Dj Gilcrease digitalxero at gmail.com
Wed Jan 26 02:50:30 CET 2011


On Tue, Jan 25, 2011 at 5:43 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> I also don't see how this could save a lot of memory. As an example
> take a French text with, say, 10 million code points. This would end up
> appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
> one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
> on how many accents are used). That's a saving of -10MB compared to
> today's implementation :-)
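For a rough check of the per-representation sizes quoted above, the sketch below encodes a French-like text of about 10 million code points in each form and reports the byte counts. The sample phrase is hypothetical; UTF-16-LE stands in for UCS2, which is only equivalent because every character in the sample lies in the Basic Multilingual Plane. The UTF-8 figure depends entirely on how many accented letters the sample contains.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the size in bytes of `text` in each of the three representations."""
    return {
        # 2 bytes per code point; equals UCS2 here because the sample stays in the BMP
        "UCS2": len(text.encode("utf-16-le")),
        # 1 byte per code point; raises UnicodeEncodeError outside U+0000..U+00FF
        "Latin-1": len(text.encode("latin-1")),
        # 1 byte for ASCII, 2 bytes for accented letters such as "é" or "à"
        "UTF-8": len(text.encode("utf-8")),
    }

# Hypothetical French sample (Latin-1 characters only), repeated to ~10 million code points.
phrase = "Un été à la forêt, où l'élève crée. "
text = phrase * (10_000_000 // len(phrase))

for name, size in encoded_sizes(text).items():
    print(f"{name:8} {size / 1e6:6.1f} MB")
```

Latin-1 always comes out at one byte per code point and UCS2 at two, while UTF-8 lands somewhere in between, which is why its estimate above is hedged with "depending on how many accents are used".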

If I am reading the PEP right (which I may not be, as I am no expert on
Unicode), the new implementation would actually give a 10MB saving:
since the wchar field is optional, only the str (Latin-1, 10MB) and
utf8 (about 15MB) fields would need to be stored. If I am right, the
PEP should clarify how it decides whether or not to store each field.
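The arithmetic behind the two readings can be checked with a few lines, using only the figures from the quoted message. The baseline is not taken from the PEP; it is inferred from the quoted "saving of -10MB" (three copies minus 10MB), so treat it as an assumption.

```python
# Sizes in MB for ~10 million French code points, as estimated in the quoted message.
UCS2, LATIN1, UTF8 = 20, 10, 15

three_copies = UCS2 + LATIN1 + UTF8   # wchar, str and utf8 fields all populated
baseline = three_copies - 10          # assumption: implied by the quoted "-10MB" saving
without_wchar = LATIN1 + UTF8         # optional wchar field left unpopulated

print(f"all three fields: {three_copies} MB")
print(f"implied baseline: {baseline} MB")
print(f"without wchar:    {without_wchar} MB, saving {baseline - without_wchar} MB")
```

With all three fields filled the total is 45MB, 10MB worse than the implied 35MB baseline; dropping the wchar copy brings it to 25MB, a 10MB saving against the same baseline.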

