Extended ASCII and code pages [was Re: for / while else doesn't make sense]

Chris Angelico rosuav at gmail.com
Fri May 27 12:16:37 EDT 2016


On Sat, May 28, 2016 at 2:09 AM, Random832 <random832 at fastmail.com> wrote:
> On Fri, May 27, 2016, at 11:53, Rustom Mody wrote:
>> And coding systems are VERY political.
>> Sure what characters are put in (and not) is political
>> But more invisible but equally political is the collating order.
>>
>> eg No one understands what jmf's gripes are... My guess is that a Euro
>> costs 3 times a Dollar.
>>
>> >>> "€".encode("UTF-8")
>> b'\xe2\x82\xac'
>> >>> "$".encode("UTF-8")
>> b'$'
>>
>> [It's another matter that this is not the evil deed of python but of
>> UTF-8!]
>
> AIUI jmf's issue is that python's string type (nothing to do with UTF-8)
> doesn't treat all strings equally. Strings that are only in Latin-1
> (including your dollar example) have only one byte per character,
> whereas strings with BMP characters have two bytes per character (he
> also has some more difficult to understand objections to the large fixed
> overhead and the cached UTF-8 version [which ASCII strings don't have])

The objection, thus, is "some strings perform faster than others do".
The only time that's ever been a serious consideration has been in
cryptography, where timing-based attacks can be used to leech info
about a private key. But this ain't that.
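
For anyone who wants to see the storage difference Random832 describes, here
is a rough sketch. It assumes CPython 3.3 or later (the flexible string
representation from PEP 393); bytes_per_char is a name made up for this
example, and sys.getsizeof may behave differently or be unavailable on other
implementations. Differencing two sizes cancels out the fixed per-object
overhead, which itself varies by string kind and version.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for a string made only of `ch`.

    Hypothetical helper for illustration. Subtracting the sizes of two
    strings of different lengths cancels the fixed object header, leaving
    only the per-character cost.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n

samples = [
    ("ASCII", "$"),
    ("Latin-1", "\xe9"),        # e with acute accent
    ("BMP", "\u20ac"),          # euro sign
    ("astral", "\U0001F600"),   # a character outside the BMP
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On CPython this should show the Latin-1 range costing one byte per character,
the rest of the BMP two, and anything beyond the BMP four. That is the
"unequal treatment" in question, and it is a memory trade-off rather than
anything security-sensitive.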

ChrisA
