How to pass Chinese characters as command-line arguments?

kj no.email at please.post
Sun Jan 31 14:35:51 EST 2010


In <7slr5iFe66U1 at mid.uni-berlin.de> "Diez B. Roggisch" <deets at nospam.web.de> writes:

>On 31.01.10 16:52, kj wrote:
>> I want to pass Chinese characters as command-line arguments to a
>> Python script.  My terminal has no problem displaying these
>> characters, and passing them to the script, but I can't get Python
>> to understand them properly.
>>
>> E.g. if I pass one such character to the simple script
>>
>> import sys
>> print sys.argv[1]
>> print type(sys.argv[1])
>>
>> the first line of the output looks fine (identical to the input),
>> but the second line says "<type 'str'>".  If I add the line
>>
>> arg = unicode(sys.argv[1])
>>
>> I get the error
>>
>> Traceback (most recent call last):
>>    File "kgrep.py", line 4, in <module>
>>      arg = unicode(sys.argv[1])
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128)
>>
>> What must I do to get Python to recognize command-line arguments
>> as utf-8 Unicode?

>The last sentence reveals your problem: utf-8 is *not* unicode. It's an 
>encoding of unicode, which is a crucial difference.

>From the outside you get byte-streams, and if these happen to be
>encoded in utf-8, you can simply decode them:

>arg = unicode(sys.argv[1], "utf-8")

Thanks!
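
For the archives, here is a minimal sketch of that advice. It assumes
the terminal really does send UTF-8 (that default is an assumption, not
something Python detects here), and decode_arg is just a hypothetical
helper name, not anything from the standard library:

```python
# -*- coding: utf-8 -*-
import sys


def decode_arg(raw, encoding="utf-8"):
    """Return raw as text, decoding it first if it is a byte string.

    The "utf-8" default is an assumption about the terminal's encoding.
    """
    if isinstance(raw, bytes):
        # Python 2: command-line arguments arrive as byte strings.
        return raw.decode(encoding)
    # Already text (e.g. Python 3, where sys.argv holds str objects).
    return raw


if __name__ == "__main__":
    for raw in sys.argv[1:]:
        arg = decode_arg(raw)
        print(type(arg))
```

Naming the encoding explicitly is what avoids the 'ascii' codec error
quoted above: unicode(sys.argv[1]) with no second argument falls back
to the default ASCII codec, which rejects any byte above 0x7f.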

kynn


