How to pass Chinese characters as command-line arguments?

Diez B. Roggisch deets at nospam.web.de
Sun Jan 31 11:58:25 EST 2010


On 31.01.10 16:52, kj wrote:
> I want to pass Chinese characters as command-line arguments to a
> Python script.  My terminal has no problem displaying these
> characters, and passing them to the script, but I can't get Python
> to understand them properly.
>
> E.g. if I pass one such character to the simple script
>
> import sys
> print sys.argv[1]
> print type(sys.argv[1])
>
> the first line of the output looks fine (identical to the input),
> but the second line says "<type 'str'>".  If I add the line
>
> arg = unicode(sys.argv[1])
>
> I get the error
>
> Traceback (most recent call last):
>    File "kgrep.py", line 4, in <module>
>      arg = unicode(sys.argv[1])
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128)
>
> What must I do to get Python to recognize command-line arguments
> as utf-8 Unicode?

The last sentence reveals your problem: utf-8 is *not* unicode. It's an 
encoding of unicode, which is a crucial difference.

From the outside you get byte-streams, and if these happen to be
encoded in utf-8, you can simply decode them:

arg = unicode(sys.argv[1], "utf-8")
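To see why the one-argument call fails, note that unicode() with no
encoding falls back to Python 2's default 'ascii' codec, which rejects
any byte above 0x7f. Below is a minimal sketch of the difference. The
helper name decode_arg and the sample bytes are made up for
illustration, and it assumes the terminal really sends utf-8:

```python
def decode_arg(raw, encoding="utf-8"):
    """Hypothetical helper: decode a byte-string argument into unicode text."""
    if isinstance(raw, bytes):
        # Python 2: sys.argv entries are byte strings (str is bytes there).
        return raw.decode(encoding)
    # Already text (e.g. Python 3, where sys.argv is decoded for you).
    return raw


# Sample input (an assumption): the utf-8 bytes of one Chinese character,
# U+8BED, standing in for what sys.argv[1] would hold in Python 2.
raw = b"\xe8\xaf\xad"

text = decode_arg(raw)  # three bytes in, one character out

# The implicit 'ascii' codec cannot handle the same bytes.
ascii_failed = False
try:
    raw.decode("ascii")
except UnicodeDecodeError:
    ascii_failed = True
```

In the script you would call decode_arg(sys.argv[1]). If the terminal
uses some other encoding, locale.getpreferredencoding() is one place to
look up its name instead of hard-coding "utf-8".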

Diez


