Flexible string representation, unicode, typography, ...

Chris Angelico rosuav at gmail.com
Sat Aug 25 07:19:38 EDT 2012


On Sat, Aug 25, 2012 at 9:05 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> I thought Terry Reedy had shot down any claims about performance overhead,
> and that the memory savings in many cases must be substantial and therefore
> worthwhile.  Or have I misread something?  Or what?

My reading of the thread(s) is that there are two reasons the debate
continues to rage:

1) Comparisons with a "narrow build", in which most characters take two
bytes but the one or two characters outside the Basic Multilingual
Plane get encoded as surrogate pairs. The new system will allocate
four bytes per character for the whole string.

2) Arguments based on huge strings that represent _all the data_ your
program is working with, forgetting that programs are full of
ASCII-only strings (identifiers, attribute names, dictionary keys, and
so on), which now take only one byte per character.
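Both points can be seen from inside the interpreter. The following is a minimal sketch, assuming CPython 3.3 or later (the PEP 393 flexible representation); `bytes_per_char` is a hypothetical helper, not a standard API, and it infers the storage width from `sys.getsizeof` differences, which other implementations such as PyPy do not guarantee to match.

```python
import sys

def bytes_per_char(s: str) -> int:
    """Infer how many bytes CPython stores per code point of ``s``.

    Hypothetical helper, CPython 3.3+ only (PEP 393). It compares two
    freshly built repetitions of ``s`` so the fixed object header
    cancels out and any UTF-8 copy cached on ``s`` is never counted.
    """
    if not s:
        raise ValueError("need a non-empty string")
    small = s * 100   # new string, same storage kind as s
    big = s * 200     # twice as many code points, same kind
    extra = sys.getsizeof(big) - sys.getsizeof(small)
    return extra // (100 * len(s))

# Point 2: ordinary ASCII (and Latin-1) text needs one byte per character.
print(bytes_per_char("spam"))        # 1
print(bytes_per_char("caf\u00e9"))   # 1
# Other BMP characters need two bytes, as in a narrow build.
print(bytes_per_char("\u20ac"))      # 2

# Point 1: one astral character widens the whole string to four bytes.
mostly_bmp = "\u20ac" * 999 + "\U0001F600"
print(len(mostly_bmp) * bytes_per_char(mostly_bmp))  # 4000 bytes of payload
# A narrow build stores UTF-16 code units: 999 * 2 + one surrogate pair.
print(len(mostly_bmp.encode("utf-16-le")))           # 2002
```

The payload figures leave out the object header, so they illustrate the trade-off rather than exact memory use: the rare astral-heavy string roughly doubles in size, while the far more common ASCII strings shrink to half of what a narrow build used.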

ChrisA
