Flexible string representation, unicode, typography, ...

Chris Angelico rosuav at gmail.com
Tue Aug 28 23:59:27 EDT 2012


On Wed, Aug 29, 2012 at 12:42 PM, rusi <rustompmody at gmail.com> wrote:
> Clearly there are 3 string-engines in the python 3 world:
> - 3.2 narrow
> - 3.2 wide
> - 3.3 (flexible)
>
> How difficult would it be to give the choice of string engine as a
> command-line flag?
> This would avoid the nuisance of having two binaries -- narrow and
> wide.
> And it would give the python programmer a choice of efficiency
> profiles.

To what benefit?

3.2 narrow is, I would have to say, buggy. It handles everything up to
\uFFFF without problems, but once you have any character beyond that,
your indexing and slicing are wrong.
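
To make that concrete, here is a rough sketch that imitates a narrow build's storage on a current Python by splitting a string into UTF-16 code units. The helper name utf16_units is my own invention for illustration, and this only approximates what 3.2 narrow does internally:

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s (hypothetical helper, roughly
    what a narrow build would store internally)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

s = "a\U0001F600b"        # 'a', one character beyond U+FFFF, 'b'
units = utf16_units(s)

print(len(s))             # number of code points
print(len(units))         # number of code units, one more than above
print([hex(u) for u in units])
```

The astral character occupies two code units (a surrogate pair), so indexing by code unit lands on half a character, and every index after it is off by one.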

3.2 wide is fine but memory-inefficient.

3.3 is never worse than 3.2 except for some tiny checks, and will be
more memory-efficient in many cases.
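
If you want to see the saving for yourself, something like this should do on CPython 3.3 or later. It leans on sys.getsizeof, which is an implementation detail, and bytes_per_char is just a name I made up for the sketch:

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by differencing two string sizes.

    Subtracting cancels the fixed object header. CPython-specific
    (hypothetical helper, not part of any library).
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n

for label, ch in [("ASCII", "a"),
                  ("BMP", "\u20ac"),
                  ("astral", "\U0001F600")]:
    print(label, bytes_per_char(ch))
```

The storage width follows the widest character in the string, so pure-ASCII text gets the narrowest representation and only strings that actually contain astral characters pay for four bytes each.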

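Supporting narrow would require fixing the handling of surrogates.
Potentially a huge job, and you'll end up with ridiculously poor
performance in many cases, since indexing by character would have to
scan for surrogate pairs instead of jumping straight to an offset.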

So what you're really asking for is a command-line option to force all
strings to have their 'kind' set to 11 (binary), i.e. UCS-4 storage.
That would be doable, I suppose; it wouldn't require many changes (just
a quick check in string creation functions). But what would be the
advantage? Every string would then require 4 bytes per character to
store; an optimization has been lost.

ChrisA


