Extended ASCII and code pages [was Re: for / while else doesn't make sense]

Random832 random832 at fastmail.com
Fri May 27 12:09:00 EDT 2016


On Fri, May 27, 2016, at 11:53, Rustom Mody wrote:
> And coding systems are VERY political.
> Sure, what characters are put in (and left out) is political,
> but less visible and equally political is the collating order.
> 
> E.g. no one understands what jmf's gripes are... My guess is that a Euro
> costs 3 times a Dollar.
> 
> >>> "€".encode("UTF-8")
> b'\xe2\x82\xac'
> >>> "$".encode("UTF-8")
> b'$'
> 
> [It's another matter that this is not the evil deed of python but of
> UTF-8!]

AIUI jmf's issue is that Python's string type (nothing to do with UTF-8)
doesn't treat all strings equally. Strings whose characters all fall in
Latin-1 (including your dollar example) take one byte per character,
whereas strings containing any other BMP character (such as your euro
sign) take two bytes per character, and strings containing characters
outside the BMP take four. (He also has some harder-to-understand
objections to the large fixed overhead and the cached UTF-8 version
[which ASCII strings don't have].)
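
The per-character widths described above can be observed with a small
sketch. It assumes CPython 3.3 or later (the PEP 393 "flexible string
representation") and relies on sys.getsizeof, whose exact numbers are an
implementation detail; the helper name storage_width is hypothetical.
Building two fresh strings of n and 2*n copies of one character makes the
fixed object header cancel out, leaving only the per-character cost.

```python
import sys

def storage_width(ch: str, n: int = 1000) -> int:
    """Hypothetical helper: estimate bytes stored per character by CPython.

    Both strings are built at runtime from the same character, so they share
    the same header size and neither has a cached UTF-8 copy; the size
    difference divided by n is the per-character width (1, 2 or 4 bytes).
    """
    one = ch * n
    two = ch * (2 * n)
    return (sys.getsizeof(two) - sys.getsizeof(one)) // n

# ASCII, Latin-1, other BMP, and non-BMP characters respectively.
for ch in ["$", "\u00e9", "\u20ac", "\U0001F600"]:
    print(repr(ch), storage_width(ch))
```

On CPython this should report 1 byte for "$" and "é", 2 for "€", and 4
for the emoji. A single "€" in an otherwise all-"$" string is enough to
make the whole string use two bytes per character.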
