What extended ASCII character set uses 0x9D?

Ian Kelly ian.g.kelly at gmail.com
Thu Aug 17 20:54:29 EDT 2017


On Thu, Aug 17, 2017 at 6:52 PM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> On Thu, Aug 17, 2017 at 6:30 PM, John Nagle <nagle at animats.com> wrote:
>> A few more cases:
>>
>> bytearray(b'miguel \xe3\x81ngel santos')
>
> If that were b'\xc3\x81' it would be Á in UTF-8 which would fit the
> rest of the name.
>
>> bytearray(b'\xe5\x81ukasz zmywaczyk')
>
> If that were b'\xc5\x81' it would be Ł in UTF-8 which would fit the
> rest of the name.
>
> I suspect the others contain similar errors. I don't know if it's the
> result of some form of Mojibake or maybe just transcription errors.

Oh shit, I think I know what happened. In ASCII you can lower-case
letters by just adding 32 (0x20) to them. Somebody tried to do that
here and fucked up the encoding: 0xC3 + 0x20 = 0xE3 and 0xC5 + 0x20 =
0xE5. That's why all the ASCII letters in the strings are lower-case
while these ones aren't.
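
A minimal sketch of that hypothesis, not the actual code that produced
the data. It assumes the lower-casing ran over the raw UTF-8 bytes and
treated them as a single-byte Latin-1-style charset, where the
upper-case letters sit at 0xC0-0xDE (except 0xD7) and also lower-case
by adding 0x20. The function name naive_lower and the capitalized
original names are made up for illustration.

```python
def naive_lower(data: bytes) -> bytes:
    """Hypothetical byte-wise lower-casing that assumes Latin-1 input."""
    out = bytearray()
    for b in data:
        is_ascii_upper = 0x41 <= b <= 0x5A                   # 'A'..'Z'
        is_latin1_upper = 0xC0 <= b <= 0xDE and b != 0xD7    # skip the multiplication sign
        out.append(b + 0x20 if is_ascii_upper or is_latin1_upper else b)
    return bytes(out)


# Assumed original spellings, encoded correctly as UTF-8 first.
for name in ("Miguel Ángel Santos", "Łukasz Zmywaczyk"):
    utf8 = name.encode("utf-8")
    print(utf8, "->", naive_lower(utf8))
```

Under those assumptions the lead bytes 0xC3 and 0xC5 get bumped to 0xE3
and 0xE5 while the continuation byte 0x81 is left alone, which gives
the same broken byte strings as the two examples quoted above.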


