unicode strings and strings mix

Martin v. Löwis loewis at informatik.hu-berlin.de
Tue Jun 18 03:07:44 EDT 2002


Gerhard Häring <gerhard at bigfoot.de> writes:

> 'x' and 'A' are in the ASCII range, so this shouldn't produce an
> exception. I also cannot reproduce it with sys.getdefaultencoding() ==
> "ascii".

These were not 'x' and 'A', but '\xd7\xc1\xd7\xc1\xd7'. Since the
article was posted in KOI8-R, Roman probably meant those bytes to
denote CYRILLIC SMALL LETTER VE and CYRILLIC SMALL LETTER A,
respectively.

Of course, when Python adds strings, it can't possibly know that this
is how the byte string was meant to be interpreted, so you need to
decode it explicitly and write

unichr(0x3345) + unicode('\xd7\xc1\xd7\xc1\xd7', 'koi8-r')

The resulting string cannot be represented in KOI8-R, though, since it
contains SQUARE MAHHA (U+3345), which has no KOI8-R encoding.
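
As a rough illustration of both points, here is a minimal sketch. It is
written for Python 3, which is an assumption on my part: the expression
above uses Python 2's unichr() and unicode(), whose Python 3 counterparts
are chr() and bytes.decode(). The variable names are only illustrative.

```python
# Python 3 sketch of the expression above (the post itself is Python 2).
raw = b'\xd7\xc1\xd7\xc1\xd7'      # the byte string from the post

# Decode explicitly: Python cannot guess that these bytes are KOI8-R.
text = raw.decode('koi8-r')        # VE, A, VE, A, VE

# Both operands are now Unicode strings, so they can be concatenated.
combined = chr(0x3345) + text

# Going back to KOI8-R fails, because U+3345 has no KOI8-R encoding.
try:
    combined.encode('koi8-r')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(repr(combined), encodable)
```

The Cyrillic part alone still round-trips to the original bytes; only the
combined string cannot be encoded.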

Regards,
Martin


