Unicode list
Rehceb Rotkiv
rehceb at no.spam.plz
Sat Mar 31 20:36:20 EDT 2007
Hello,
I have this little grep-like program:
++++++++++snip++++++++++
#!/usr/bin/python
import sys
import re

pattern = sys.argv[1]
inputfile = file(sys.argv[2], 'r')
for line in inputfile:
    matches = re.findall(pattern, line)
    if matches:
        print matches
++++++++++snip++++++++++
As written, the program prints some characters as strange escape
sequences. I think this is because the input file is encoded in UTF-8:
when I convert the result of "re.findall(...)" to a string and wrap
"unicode()" around it, the matches are printed correctly. Is it possible
to make "matches" unicode without first saving it as a single string? The
function "unicode()" seems to work only on strings, not on lists. Or is
there a general way of telling Python to abandon the ancient and evil
land of iso-8859 for good and use UTF-8 only?
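
For illustration, here is a minimal sketch of one way the matches could be obtained as unicode directly, by decoding the file while reading it instead of converting afterwards. It assumes Python 2.6 or later (for "io.open"), an input file that really is UTF-8, and a pattern that is already a unicode string; the function name "find_unicode_matches" is made up for this example and is not part of the original program.

```python
import io
import re


def find_unicode_matches(pattern, path, encoding='utf-8'):
    """Collect the non-empty re.findall results for each line of a file.

    Hypothetical helper: `pattern` is assumed to be a unicode string and
    the file at `path` is assumed to be encoded in `encoding`.
    """
    results = []
    # io.open decodes every line to unicode as it is read, so the
    # matches returned by re.findall are unicode as well.
    with io.open(path, 'r', encoding=encoding) as inputfile:
        for line in inputfile:
            matches = re.findall(pattern, line, re.UNICODE)
            if matches:
                results.append(matches)
    return results
```

Printing each match on its own (for example by looping over the list) rather than printing the whole list should also avoid the escape sequences, because printing a list shows the repr of every element.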
Regards,
Rehceb