Using codecs.EncodedFile() with Python 2.5

Peter Otten __peter__ at web.de
Wed Jan 3 07:14:49 EST 2007


David Hughes wrote:

> I used this function successfully with Python 2.4 to alter the encoding
> of a set of database records from latin-1 to utf-8, but the same
> program raises an exception using Python 2.5. This small example shows
> the problem:
> 
> import codecs
> fo = open('test.dat', 'w')
> fo.write('G\xe2teaux')
> fo.close()
> 
> fi = open("test.dat",'r')
> fx = codecs.EncodedFile(fi, 'utf-8', 'latin-1')
> astring = fx.readline()
> print astring
> ustring = unicode(astring, 'utf-8' )
> print repr(ustring)
> print ustring.encode('latin-1')
> print ustring.encode('utf-8')
> 
> Python 2.4 gives:
> 
> Gâteaux
> u'G\xe2teaux'
> Gâteaux
> Gâteaux
> 
> which I believe is correct, while 2.5 produces
> 
> Traceback (most recent call last):
>   File "test_codec.py", line 8, in <module>
>     astring = fx.readline()
>   File "C:\Python25\lib\codecs.py", line 709, in readline
>     data = self.reader.readline()
>   File "C:\Python25\lib\codecs.py", line 471, in readline
>     data = self.read(readsize, firstline=True)
>   File "C:\Python25\lib\codecs.py", line 418, in read
>     newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
> 
> Is there a genuine problem here, or have I been misusing this function?

This is indeed a bug in Python 2.5, not a misuse on your part. It has been
fixed in Subversion:

http://svn.python.org/view/python/trunk/Lib/codecs.py?rev=52517&view=log
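
Until a fixed release is available, one possible workaround is to skip
codecs.EncodedFile() and do the two steps by hand: read the raw bytes,
decode them as latin-1, and re-encode as utf-8. The following is only a
minimal sketch under that assumption, not the fix that went into codecs.py.
The helper name read_line_as_utf8 and the temporary file location are made up
for illustration.

```python
import os
import tempfile


def read_line_as_utf8(path, source_encoding='latin-1'):
    """Read the first line of `path` and return it re-encoded as UTF-8.

    Hypothetical helper: it decodes and encodes explicitly instead of
    relying on codecs.EncodedFile() to recode the stream.
    """
    f = open(path, 'rb')  # binary mode, so we get the raw bytes
    try:
        raw = f.readline()
    finally:
        f.close()
    return raw.decode(source_encoding).encode('utf-8')


# Create the same sample data as in the example above (latin-1 bytes).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'test.dat')
fo = open(sample_path, 'wb')
try:
    fo.write(u'G\xe2teaux'.encode('latin-1'))
finally:
    fo.close()

utf8_line = read_line_as_utf8(sample_path)

# Clean up the temporary file and directory.
os.remove(sample_path)
os.rmdir(tmp_dir)
```

For a whole set of records the same decode/encode pair can be applied line by
line inside a loop over the input file.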

Peter



