Unicode list

Georg Brandl g.brandl at gmx.net
Sun Apr 1 06:19:13 EDT 2007


Rehceb Rotkiv wrote:
> Hello,
> 
> I have this little grep-like program:
> 
> ++++++++++snip++++++++++
> #!/usr/bin/python
> 
> import sys
> import re
> 
> pattern = sys.argv[1]
> inputfile = file(sys.argv[2], 'r')
> 
> for line in inputfile:
>     matches = re.findall(pattern, line)
>     if matches:
>         print matches
> ++++++++++snip++++++++++
> 
> Like this, the program prints some characters as strange escape 
> sequences, which is due to the input file being encoded in utf-8

As Paul said, your terminal is likely set to an ISO-8859 encoding, which
is why it doesn't display UTF-8 correctly. The above program produces
correct UTF-8 output.

What you could do is:
1. read the file in as unicode (i.e. decode it from UTF-8)
2. print the unicode to the terminal (will use the terminal encoding) or
    encode the unicode to byte strings with an explicit encoding before
    printing

codecs.open() is very helpful for step 1, BTW.
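
A minimal sketch of the two steps, not a definitive version. The helper
names find_matches() and format_matches() are made up for illustration,
and it assumes the file is UTF-8 and the pattern has at most one group
(so re.findall() returns plain strings). It joins the matches instead of
printing the list itself, since printing a list shows the repr() of each
element, which can be another source of escape sequences.

```python
import codecs
import re


def find_matches(pattern, path, encoding="utf-8"):
    """Step 1: read the file as unicode and yield the matches per line."""
    # codecs.open() decodes the bytes while reading, so every line is
    # unicode text instead of a raw byte string.
    with codecs.open(path, "r", encoding=encoding) as inputfile:
        for line in inputfile:
            matches = re.findall(pattern, line, re.UNICODE)
            if matches:
                yield matches


def format_matches(matches, encoding=None):
    """Step 2: join the matches; optionally encode them explicitly."""
    # Assumes each match is a string (pattern with at most one group).
    text = u" ".join(matches)
    # With encoding=None the unicode text is returned and print will use
    # the terminal encoding; otherwise the caller gets encoded bytes.
    return text if encoding is None else text.encode(encoding)
```

Usage would be something like print(format_matches(m)) for each m in
find_matches(pattern, filename), or format_matches(m, "utf-8") if you
want to control the output encoding yourself.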

Georg



