[Python-Dev] PEP 393 Summer of Code Project
Xavier Morel
python-dev at masklinn.net
Tue Aug 23 11:46:12 CEST 2011
On 2011-08-23, at 10:55 , Martin v. Löwis wrote:
>> - “The UTF-8 decoding fast path for ASCII only characters was removed
>> and replaced with a memcpy if the entire string is ASCII.”
>> The fast path would still be useful for mostly-ASCII strings, which
>> are extremely common (unless UTF-8 has become a no-op?).
>
> Is it really extremely common to have strings that are mostly-ASCII but
> not completely ASCII? I would agree that pure ASCII strings are
> extremely common.
Mostly-ASCII text is pretty common for Western European languages (French, for
instance, is probably 90 to 95% ASCII). It's also a risk in English, when
the writer "correctly" spells foreign words (résumé and the like).
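
To make "mostly ASCII" concrete, here is a rough sketch that measures the share of ASCII code points in a string. The helper name ascii_fraction and both sample sentences are made up for illustration; they are not taken from the PEP 393 branch or from any benchmark, and the French sample is deliberately accent-heavy, so it will likely score lower than ordinary running text.

```python
def ascii_fraction(text: str) -> float:
    """Return the share of code points in `text` that are ASCII (< 128).

    Hypothetical helper for illustration only; an empty string counts
    as fully ASCII.
    """
    if not text:
        return 1.0
    ascii_count = sum(1 for ch in text if ord(ch) < 128)
    return ascii_count / len(text)


# Made-up sample sentences (assumptions, not measured corpus data).
sample_fr = "Le résumé a été envoyé à l'équipe hier après-midi."
sample_en = "Please attach your résumé."

for label, sample in (("French", sample_fr), ("English", sample_en)):
    # Neither string is pure ASCII, so an all-or-nothing memcpy path
    # would not apply, even though most code points are ASCII.
    print(label, sample.isascii(), round(ascii_fraction(sample), 2))
```

Neither sample qualifies for a "whole string is ASCII" shortcut, yet most of their code points are ASCII, which is the case a per-character fast path would target.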