problem with unicode

John Machin sjmachin at lexicon.net
Fri Apr 25 08:38:22 EDT 2008


On Apr 25, 9:15 pm, "andreas.prof... at googlemail.com"
<andreas.prof... at googlemail.com> wrote:
> Hi everybody,
>
> I'm using the win32 console and have the following short program
> excerpt
>
> # media is a binary string (mysql escaped zipped file)
>
> >> print media
>
> xワユロ[ヨ ...
> (works)
>
> >> print unicode(media)
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position
> 1: ordinal not in range(128)
> (ok i guess print assumes you want to print to ascii)

Guessing is no substitute for reading the manual.

print has nothing to do with your problem; the problem is
unicode(media) -- as you specified no encoding, it uses the default
encoding, which is ascii [unless you have been mucking about, which is
not recommended]. As the 2nd byte is 0x9c, which is outside the ascii
range (0x00 to 0x7f), the decode fails.
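
To make that concrete, here is a minimal sketch of the failing step. The
bytes assigned to media are a made-up stand-in (only the leading 'x' and
0x9c come from the traceback), and the decode is spelled out explicitly
so that the snippet behaves the same on Python 2 and Python 3; with the
default settings, unicode(media) under Python 2 amounts to
media.decode('ascii').

```python
# Hypothetical stand-in for `media`: only the first two bytes ('x', 0x9c)
# are taken from the traceback; the rest is invented for illustration.
media = b"x\x9c\xdc\xd5"

try:
    media.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc
    # Report which codec failed and at which byte offset.
    print("%s codec failed at position %d" % (exc.encoding, exc.start))
```

The reported position is the offset of the first byte that is not valid
ascii, which is why the traceback says "position 1".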


>
> >> print unicode(media).encode('utf-8')
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position
> 1: ordinal not in range(128)
> (why does this not work?)

unicode(media) already "doesn't work", so naturally(?)
unicode(media).whatever() won't be better -- the exception is raised
before whatever() is ever called.

>
> # mapString is a unicode string (i think at least)
>
> >> print "'" + mapString + "'"
>
> ' yu_200703_hello\ 831 v1234.9874 '
>
> >>    mystr = "%s %s" % (mapString, media)
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position
> 1: ordinal not in range(128)
>
> >> mystr = "%s %s" % (mapString.encode('utf-8'), media.encode('utf-8'))
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position
> 1: ordinal not in range(128)

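This is merely repeating the original problem. In both attempts media
ends up being decoded implicitly with the default ascii codec: in the
first because a unicode argument (if mapString really is unicode) makes
the % operator produce a unicode result, in the second because calling
.encode() on a byte string has to decode it to unicode first.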
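
A rough sketch of why .encode() on a byte string raises a *decode*
error, and of one possible way to join the two values without any
implicit decode. Everything here is an assumption for illustration:
map_string and media hold invented sample values, py2_style_encode is a
hypothetical helper that only mimics what Python 2 does behind the
scenes, and whether a plain byte-level join is what you actually want
depends on the answers to the questions further down.

```python
# Hypothetical values standing in for the poster's variables.
map_string = u"yu_200703_hello"   # text (unicode)
media = b"x\x9c\xdc\xd5"          # raw binary data, not text


def py2_style_encode(byte_string, encoding):
    """Mimic Python 2's str.encode: implicit ascii decode, then encode."""
    return byte_string.decode("ascii").encode(encoding)


try:
    py2_style_encode(media, "utf-8")
    failed = False
except UnicodeDecodeError:
    # The implicit decode step fails before any encoding happens.
    failed = True

# Encode only the text and leave the binary data untouched, so that
# nothing ever needs to be decoded.
combined = map_string.encode("utf-8") + b" " + media
```

The result is a byte string; the binary part is carried through
unchanged instead of being treated as text.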

>
> I don't know what to do. I just want to concatenate two string where
> apparently one is a binary string, the other one is a unicode string
> and I always seem to get this error.
>
> Any help is appreciated :)

We need a clue or two; do this and let us know what it says:

print type(media), repr(media)
print type(mapString), repr(mapString)
import sys; print sys.stdout.encoding
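
If it is more convenient, the same diagnostics can be written so that
they run unchanged on Python 2 and Python 3. This is only a sketch:
describe is a hypothetical helper name, and the two sample values are
placeholders for the real media and mapString.

```python
import sys


def describe(name, value):
    """Return one line giving the name, type and repr of a value."""
    return "%s: %s %r" % (name, type(value), value)


# Hypothetical sample values; substitute the real media and mapString.
print(describe("media", b"x\x9c"))
print(describe("mapString", u"yu_200703_hello"))
# Can be None on Python 2 when output is redirected to a file or pipe.
print(sys.stdout.encoding)
```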

Also you say that "print media" works. Do you mean that it produces
some meaningful text that you understand? What I see on the screen in
Google Groups is the following 6 characters:
LATIN SMALL LETTER X
KATAKANA LETTER WA
KATAKANA LETTER YU
KATAKANA LETTER RO
LEFT SQUARE BRACKET
KATAKANA LETTER YO
Is that what you see?
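
For reference, a listing like the one above can be produced with the
standard unicodedata module. A small sketch, assuming the six characters
are exactly the ones that show up in the quoted post (they are written
as escapes here so that the snippet does not depend on the encoding of
the source file):

```python
import unicodedata

# The six characters as they appear in the quoted post.
displayed = u"x\u30ef\u30e6\u30ed[\u30e8"

# Look up the official Unicode name of each character.
names = [unicodedata.name(ch) for ch in displayed]
for name in names:
    print(name)
```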

What is it that you call "win32 console"?


