What extended ASCII character set uses 0x9D?

Chris Angelico rosuav at gmail.com
Fri Aug 18 02:49:26 EDT 2017


On Fri, Aug 18, 2017 at 4:38 PM, Paul Rubin <no.email at nospam.invalid> wrote:
> John Nagle <nagle at animats.com> writes:
>> Since, as someone pointed out, there was UTF-8 which had been
>> run through an ASCII-type lower casing algorithm
>
> I spent a few minutes figuring out if some of the mysterious 0x81's
> could be from ASCII-lower-casing some Unicode combining characters, but
> the numbers didn't seem to work out.  Might still be worth looking for
> in some other cases.

They can't be from anything like that. Lower-casing in ASCII consists
of adding 32 (that is, setting bit 5, value 0x20) on the byte values
for 'A' through 'Z' (0x41 to 0x5A). Subtracting 32 from 0x81 gives
0x61, which is lower-case letter 'a', not an upper-case letter, and
bit 5 isn't set in 0x81 anyway. So there's no way that UTF-8 + dumb
lowercasing could give you 0x81.
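
The arithmetic above can be checked with a rough sketch. The helper
name ascii_lower is made up for illustration, and it only assumes that
"dumb lowercasing" means setting bit 5 on the bytes for 'A' to 'Z';
whatever actually mangled the original data may have worked differently.

```python
def ascii_lower(data: bytes) -> bytes:
    """Hypothetical 'dumb' lower-casing: set bit 5 (0x20) on A-Z only."""
    return bytes((b | 0x20) if 0x41 <= b <= 0x5A else b for b in data)


# Map each byte value that the lower-casing actually changes to its result.
changed = {}
for b in range(256):
    out = ascii_lower(bytes([b]))[0]
    if out != b:
        changed[b] = out

# Is 0x81 ever produced from some other byte value?
produced_0x81 = 0x81 in changed.values()

# Does 0x81 have bit 5 set? If not, even OR-ing 0x20 into every byte
# (not just A-Z) could never yield it.
bit5_set = bool(0x81 & 0x20)

print(hex(0x81 - 0x20), produced_0x81, bit5_set)
```

Python's built-in bytes.lower() also touches only ASCII letters, so it
should agree with this sketch on every byte value.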

ChrisA


