Unicode equality from raw_input

Karen Tracey kmtracey at gmail.com
Sat Oct 11 22:50:42 EDT 2008


2008/10/11 Damian Johnson <atagar1 at gmail.com>

> Hi, when getting text via the raw_input method it's always a string (even
> if it contains non-ASCII characters). The problem lies in that whenever I
> try to check equality against a Unicode string it fails. I've tried using
> the unicode method to 'cast' the string to the Unicode type but this throws
> an exception:
>

Python needs to know the encoding of the bytestring in order to convert it
to unicode. If you don't specify an encoding, the default (ASCII) is assumed,
and decoding fails for any bytestring that actually contains non-ASCII data.
Since you are reading the string from standard input, try using the encoding
associated with stdin:

>>> a = raw_input("text: ")
text: おはよう
>>> b = u"おはよう"
>>> import sys
>>> unicode(a,sys.stdin.encoding) == b
True
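
For anyone who wants to see the same idea outside an interactive session,
here is a rough sketch. It is not from the original exchange: the helper
name to_text and the UTF-8 fallback are my own assumptions, and the input
bytes are simulated with encode() rather than read from a real terminal.
It shows the ASCII decode failing and an explicit encoding succeeding.

```python
# -*- coding: utf-8 -*-
import sys


def to_text(raw, encoding=None):
    """Decode a bytestring to text (hypothetical helper, not a stdlib API)."""
    # sys.stdin.encoding can be None, for example when input is piped,
    # so fall back to UTF-8 here. That fallback is an assumption.
    enc = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
    return raw.decode(enc)


expected = u"おはよう"

# Stand-in for the bytes raw_input() would hand back on a UTF-8 terminal.
raw = expected.encode("utf-8")

# Decoding as ASCII fails because the bytes are outside the ASCII range.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = to_text(raw, "utf-8")

print(ascii_ok)
print(decoded == expected)
```

In a real program you would call to_text on the value read from standard
input and leave the encoding argument off, so that the stdin encoding is
picked up when one is available.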

Karen


More information about the Python-list mailing list