Unicode equality from raw_input

Chris Rebert clp at rebertia.com
Sat Oct 11 22:36:00 EDT 2008


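In order to convert a byte sequence to Unicode, Python needs to know
the encoding being used. When you don't specify an encoding, it tries
ASCII, which fails with a UnicodeDecodeError as soon as it hits a byte
outside the ASCII range (0-127), like the 0xe3 in your case. The same
implicit ASCII decode happens when you compare a str with a unicode
object, which is why a == b gives the UnicodeWarning and returns False.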
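Here is a minimal sketch of that failure and the fix, runnable without a terminal. It assumes the terminal is UTF-8, so the twelve bytes below (E3 81 8A E3 81 AF E3 82 88 E3 81 86) stand in for what raw_input() returns when you type おはよう. The helper try_decode is hypothetical. The literals are written as escapes so the file stays ASCII and runs on Python 2.6+ as well as Python 3.3+.

```python
# The four hiragana characters from the question, spelled with escapes.
expected = u"\u304a\u306f\u3088\u3046"

# Their UTF-8 encoding: the byte string a UTF-8 terminal hands to raw_input().
raw = b"\xe3\x81\x8a\xe3\x81\xaf\xe3\x82\x88\xe3\x81\x86"


def try_decode(data, encoding):
    """Decode `data` with `encoding`; return None if that codec rejects it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII only covers bytes 0-127, so the first byte (0xe3) is rejected,
# which is the same error unicode(a) raised in the question.
ascii_result = try_decode(raw, "ascii")

# Naming the encoding that actually produced the bytes succeeds.
utf8_result = try_decode(raw, "utf-8")

print(ascii_result is None)     # True
print(utf8_result == expected)  # True
```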

Figure out what encoding your terminal/system is set to, then use the
.decode() method to change the bytes to a unicode object. E.g.:

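# -*- coding: utf-8 -*-
# (the line above is only needed when running this from a .py file)
bytestring = raw_input("text: ")
as_unicode = bytestring.decode('utf8')  # assuming the terminal encoding is UTF-8
print as_unicode == u"おはよう"  # ==> True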
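The snippet above hard-codes UTF-8. Below is a minimal sketch of guessing the encoding instead. It assumes the stream's own encoding attribute (sys.stdin.encoding), and failing that locale.getpreferredencoding(), are reasonable guesses; neither is guaranteed to match what the terminal really sends. The helper names input_encoding and decode_input are hypothetical. The code avoids Python-2-only syntax so it also runs under Python 3, where input() already returns text and only raw bytes need decoding.

```python
import locale
import sys


def input_encoding(stream=None):
    """Best guess at the encoding of bytes read from `stream`."""
    if stream is None:
        stream = sys.stdin
    # File objects attached to a terminal usually carry an `encoding`
    # attribute; it can be None when input is piped or redirected.
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        # Fall back to the encoding implied by the user's locale settings.
        encoding = locale.getpreferredencoding()
    return encoding


def decode_input(data, stream=None):
    """Turn the byte string returned by raw_input() into a unicode object."""
    return data.decode(input_encoding(stream))


# Python 2 usage, in place of the hard-coded 'utf8' above:
#   as_unicode = decode_input(raw_input("text: "))
```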

Cheers,
Chris
-- 
Follow the path of the Iguana...
http://rebertia.com


2008/10/11 Damian Johnson <atagar1 at gmail.com>:
> Hi, when getting text via the raw_input method it's always a string (even if
> it contains non-ASCII characters). The problem is that whenever I try
> to check equality against a Unicode string, it fails. I've tried using the
> unicode method to 'cast' the string to the Unicode type, but this throws an
> exception:
>
>>>> a = raw_input("text: ")
> text: おはよう
>>>> b = u"おはよう"
>>>> a == b
> __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both
> arguments to Unicode - interpreting them as being unequal
> False
>>>> type(a)
> <type 'str'>
>>>> type(b)
> <type 'unicode'>
>>>> unicode(a)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0:
> ordinal not in range(128)
>>>> str(b)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
> ordinal not in range(128)
>
>
> After a couple hours of hair pulling I think it's about time to admit
> defeat. Any help would be appreciated! -Damian

