Problem reading file with umlauts

Claus Hausberger CHausberger at gmx.de
Tue Jul 7 09:59:49 EDT 2009


Hello

I have a text file that is encoded in Latin-1 (ISO-8859-1). I can't change that, as I do not create those files myself.

I have to read those files and convert umlauts such as ö into HTML entities such as &ouml;, since the text files are to become HTML files.
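For reference, here is a minimal sketch of the conversion described above. It assumes Python 2.6 or later (it also runs unchanged on Python 3) and that the text has already been decoded to unicode. It uses the standard library's entity table (htmlentitydefs.codepoint2name in Python 2, html.entities.codepoint2name in Python 3). The function name to_html_entities is made up for illustration.

```python
from __future__ import unicode_literals  # Python 2.6+: "..." literals are unicode

try:
    from html.entities import codepoint2name   # Python 3
except ImportError:
    from htmlentitydefs import codepoint2name  # Python 2


def to_html_entities(text):
    """Return text with special and non-ASCII characters as HTML references.

    Characters that have a named HTML 4 entity (o-umlaut U+00F6 -> &ouml;,
    & -> &amp;) get the name; other non-ASCII characters get &#NNN;.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in codepoint2name:
            # Named entity, e.g. 246 -> "ouml", 38 -> "amp"
            out.append("&%s;" % codepoint2name[cp])
        elif cp < 128:
            # Plain ASCII passes through unchanged
            out.append(ch)
        else:
            # No named entity: fall back to a numeric character reference
            out.append("&#%d;" % cp)
    return "".join(out)
```

If numeric references such as &#246; are acceptable, text.encode('ascii', 'xmlcharrefreplace') handles the non-ASCII part in one call. However, it leaves &, < and > unescaped.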

I have this code:


#!/usr/bin/python
# -*- coding: latin1 -*-

import codecs

f = codecs.open('abc.txt', encoding='latin1')

for line in f:
    print line
    for c in line: 
        if c == "ö":
            print "oe"
        else:
            print c


And I get this error message:

$ ./read.py
Abc

./read.py:11: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if c == "ö":
A
b
c



Traceback (most recent call last):
  File "./read.py", line 9, in <module>
    print line
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
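The two messages have separate causes, assuming standard Python 2 behaviour:

- The UnicodeWarning comes from the comparison. codecs.open already decodes the file, so c is a unicode character, while "ö" in a source file declared as latin1 is a one-byte str ('\xf6'). To compare them, Python 2 tries to decode that byte with the default ASCII codec. That fails, so Python issues the warning and treats the two as unequal.
- The UnicodeEncodeError comes from print, which has to encode the unicode line for the terminal. Here it evidently used ASCII, which is typical when the locale is C/POSIX or the output is redirected.

Here is a minimal sketch of the loop that avoids both problems. It assumes Python 2.6 or later (it also runs on Python 3) and UTF-8 output. The function names are hypothetical.

```python
from __future__ import unicode_literals  # Python 2.6+: "..." literals are unicode

import io
import sys


def read_latin1_lines(path):
    """Yield the lines of a Latin-1 file as unicode strings."""
    # io.open decodes the bytes while reading, just like codecs.open
    with io.open(path, encoding="latin-1") as f:
        for line in f:
            yield line


def replace_oe(line):
    """Replace o-umlaut with "oe", comparing unicode to unicode."""
    # "\u00f6" is a unicode literal, so no implicit ASCII decode is attempted
    return "".join("oe" if c == "\u00f6" else c for c in line)


def write_utf8(text, stream=None):
    """Encode text explicitly instead of relying on the terminal's encoding."""
    if stream is None:
        # Python 3 needs the binary buffer for bytes; Python 2's stdout accepts them
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical usage: python read.py abc.txt
    for path in sys.argv[1:]:
        for line in read_latin1_lines(path):
            write_utf8(replace_oe(line))
```

Python 2.5 lacks both the io module and unicode_literals. On 2.5, codecs.open (as in the original code) together with u"\xf6" literals serves the same purpose.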




I searched the web and tried several approaches, but they also produce strange encoding errors.
Has anyone done this before?
I am currently using Python 2.5 and may be able to use 2.6, but I cannot move to 3.1 yet, as many libraries we use don't work with Python 3 yet.

Any help is more than welcome. This has been driving me crazy for two days now.

best wishes

Claus



More information about the Python-list mailing list