Unicode list

"Martin v. Löwis" martin at v.loewis.de
Sun Apr 1 02:09:39 EDT 2007


> Like this, the program prints some characters as strange escape 
> sequences, which is due to the input file being encoded in utf-8: When I 
> convert "re.findall..." to a string and wrap an "unicode()" around it, 
> the matches get printed correctly. Is it possible to make "matches" 
> unicode without saving it as a single string first? The function "unicode
> ()" seems only to work for strings. Or is there a general way of telling 
> Python to abandon the ancient and evil land of iso-8859 for good and use 
> utf-8 only?

Python does not live in the ancient and evil land of iso-8859; it lives
in the ancient and evil land of ASCII.

When printing a list, the individual elements are converted with repr(),
not with str(). For a string object, repr() adds escape codes for all
bytes that are not printable ASCII characters. To avoid this call to
repr(), you need to iterate over the list yourself and print each element:

    if matches:
        for m in matches:
            print m,
        print

HTH,
Martin


