codecs UTF-8 StreamReader ignores errors arg

wade at lightlink.com
Tue Dec 5 07:20:12 EST 2000


I've run into a problem using the codecs module to decode data that is
mostly UTF-8, but with some bogus characters thrown in. The
StreamReader class seems to ignore an 'errors' argument passed to its
constructor, so it uses the default, which is 'strict'.

A short session illustrating the problem is shown below. Any advice
appreciated.

Wade Leftwich
Ithaca, NY

------------------------------------

>>> import codecs
>>> from StringIO import StringIO
>>> encode, decode, reader, writer = codecs.lookup('UTF-8')
>>> s = 'ab\346c'
>>> decode(s, 'replace')
(u'ab\uFFFDc', 4)
>>> fh = StringIO(s)
>>> sr = reader(fh, 'replace')
>>> sr.read()
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
  File "c:\python20\lib\codecs.py", line 208, in read
    return self.decode(self.stream.read())[0]
UnicodeError: UTF-8 decoding error: unexpected end of data
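
One possible workaround, sketched under assumptions: since the stateless decode function in the session above does honor 'replace', the stream's contents can be read as raw bytes and passed to that function directly, bypassing the StreamReader. The helper name read_lenient is hypothetical, and the sketch is written for current Python (io.BytesIO, bytes literals, codecs.getdecoder) rather than the Python 2.0 used in the session.

```python
import codecs
from io import BytesIO


def read_lenient(stream, encoding='utf-8', errors='replace'):
    """Read all bytes from stream and decode them with the given error mode.

    Hypothetical helper: it calls the codec's stateless decode function,
    which accepts the errors argument directly.
    """
    decode = codecs.getdecoder(encoding)
    # decode() returns a (text, bytes_consumed) pair; only the text is needed.
    text, _consumed = decode(stream.read(), errors)
    return text


# Same data as in the session: 'ab', one invalid byte, then 'c'.
fh = BytesIO(b'ab\xe6c')
print(repr(read_lenient(fh)))
```

This reads the whole stream into memory at once, so it only suits inputs small enough to decode in a single call.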




