[Python-Dev] PEP 393: Flexible String Representation

M.-A. Lemburg mal at egenix.com
Tue Jan 25 23:43:52 CET 2011


I'll comment more on this later this week...

>From my first impression, I'm
not too thrilled by the prospect of making the Unicode implementation
more complicated by having three different representations on each
object.

I also don't see how this could save a lot of memory. As an example
take a French text with say 10mio code points. This would end up
appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
on how many accents are used). That's a saving of -10MB compared to
today's implementation :-)

"Martin v. Löwis" wrote:
> I have been thinking about Unicode representation for some time now.
> This was triggered, on the one hand, by discussions with Glyph Lefkowitz
> (who complained that his server app consumes too much memory), and Carl
> Friedrich Bolz (who profiled Python applications to determine that
> Unicode strings are among the top consumers of memory in Python).
> On the other hand, this was triggered by the discussion on supporting
> surrogates in the library better.
> 
> I'd like to propose PEP 393, which takes a different approach,
> addressing both problems simultaneously: by getting a flexible
> representation (one that can be either 1, 2, or 4 bytes), we can
> support the full range of Unicode on all systems, but still use
> only one byte per character for strings that are pure ASCII (which
> will be the majority of strings for the majority of users).
> 
> You'll find the PEP at
> 
> http://www.python.org/dev/peps/pep-0393/
> 
> For convenience, I include it below.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 25 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


More information about the Python-Dev mailing list