[Python-Dev] PEP 393 Summer of Code Project

Xavier Morel python-dev at masklinn.net
Tue Aug 23 11:46:12 CEST 2011


On 2011-08-23, at 10:55, Martin v. Löwis wrote:
>> - “The UTF-8 decoding fast path for ASCII only characters was removed
>>  and replaced with a memcpy if the entire string is ASCII.” 
>>  The fast path would still be useful for mostly-ASCII strings, which
>>  are extremely common (unless UTF-8 has become a no-op?).
> 
> Is it really extremely common to have strings that are mostly-ASCII but
> not completely ASCII? I would agree that pure ASCII strings are
> extremely common.
Mostly-ASCII text is pretty common in Western European languages (French, for
instance, is probably 90 to 95% ASCII). It also happens in English, when the
writer "correctly" spells borrowed words (résumé and the like).
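To make the difference concrete, here is a rough Python sketch. It is not CPython's actual C decoder, and all function names are made up for illustration. decode_all_or_nothing mirrors the behaviour quoted above: the shortcut applies only when every byte is ASCII, with bytes.decode("ascii") standing in for the memcpy. decode_ascii_runs is a hypothetical per-run variant that handles each ASCII run in bulk and passes only the non-ASCII runs to the general decoder, which is what would keep mostly-ASCII text on the fast path. Being pure Python, it is not faster than the builtin decoder; it only shows the control flow.

```python
def decode_all_or_nothing(data: bytes) -> str:
    """Shortcut only when *every* byte is ASCII (whole-string check)."""
    if data.isascii():
        return data.decode("ascii")  # stands in for the single memcpy
    return data.decode("utf-8")      # a single non-ASCII byte sends everything here


def decode_ascii_runs(data: bytes) -> str:
    """Hypothetical per-run fast path: ASCII runs are handled in bulk even in mixed text."""
    pieces = []
    i, n = 0, len(data)
    while i < n:
        is_ascii = data[i] < 0x80
        j = i
        # Extend the run while bytes stay on the same side of 0x80.
        while j < n and (data[j] < 0x80) == is_ascii:
            j += 1
        chunk = data[i:j]
        # Splitting here is safe: in UTF-8, bytes < 0x80 never occur inside a
        # multi-byte sequence, so each non-ASCII run of valid input is complete.
        pieces.append(chunk.decode("ascii") if is_ascii else chunk.decode("utf-8"))
        i = j
    return "".join(pieces)


def ascii_byte_share(data: bytes) -> float:
    """Fraction of bytes that a per-run fast path could handle."""
    return sum(b < 0x80 for b in data) / len(data) if data else 1.0
```

Even a short accented word leaves plenty for the fast path: "résumé" encodes to 8 bytes, 4 of them ASCII, so ascii_byte_share returns 0.5. The whole-string check, by contrast, falls back to the general decoder for all 8 bytes.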

