Using codecs.EncodedFile() with Python 2.5

David Hughes dfh at forestfield.co.uk
Wed Jan 3 06:45:08 EST 2007


I used codecs.EncodedFile() successfully with Python 2.4 to convert a
set of database records from latin-1 to utf-8, but the same program
raises an exception under Python 2.5. This small example shows the
problem:

import codecs
fo = open('test.dat', 'w')
fo.write('G\xe2teaux')
fo.close()

fi = open("test.dat",'r')
fx = codecs.EncodedFile(fi, 'utf-8', 'latin-1')
astring = fx.readline()
print astring
ustring = unicode(astring, 'utf-8' )
print repr(ustring)
print ustring.encode('latin-1')
print ustring.encode('utf-8')

Python 2.4 gives:

Gâteaux
u'G\xe2teaux'
Gâteaux
Gâteaux

which I believe is correct, while 2.5 produces:

Traceback (most recent call last):
  File "test_codec.py", line 8, in <module>
    astring = fx.readline()
  File "C:\Python25\lib\codecs.py", line 709, in readline
    data = self.reader.readline()
  File "C:\Python25\lib\codecs.py", line 471, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Python25\lib\codecs.py", line 418, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

Is there a genuine problem here, or have I been misusing this function?
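
For comparison, the same latin-1 to utf-8 conversion can be sketched
without EncodedFile(), by decoding with codecs.open() and encoding the
result explicitly. This is only a minimal sketch: the helper name
recode_first_line is hypothetical, and it is an assumption (not
verified against 2.5) that this route avoids the exception above. It
does not explain the difference between 2.4 and 2.5.

```python
import codecs

def recode_first_line(path, file_encoding='latin-1', data_encoding='utf-8'):
    # Hypothetical helper: read one line, decoding the file's bytes
    # with file_encoding, then re-encode the text with data_encoding.
    fi = codecs.open(path, 'r', encoding=file_encoding)
    try:
        ustring = fi.readline()   # unicode text decoded from latin-1
    finally:
        fi.close()
    return ustring.encode(data_encoding)

# Write the same test data as in the example above, as raw latin-1 bytes.
fo = open('test.dat', 'wb')
fo.write(u'G\xe2teaux'.encode('latin-1'))
fo.close()

astring = recode_first_line('test.dat')
print(repr(astring))
```

Here the decode and encode steps are spelled out separately, so it is
visible which encoding is applied to the file and which to the data
handed back to the caller.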
--
Regards
David Hughes



