Handling text lines from files with a few strange chars

Paulo da Silva psdasilva.nospam at netcabonospam.pt
Sat Jun 5 19:03:24 EDT 2010


I need to read text files and process each line using string
comparisons and regexps.

I have a python2 program that uses <file object>.readline to read each
line as a string. From there, processing it was a trivial job.

With python3 I got error messages like:
  File "./pp1.py", line 93, in RL
    line=inf.readline()
  File "/usr/lib64/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 4963-4965: invalid data
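One way to keep the python2-style line loop is to stay in text mode and tell Python what to do with undecodable bytes through the `errors` argument of `open()`. A minimal sketch, assuming the files are mostly UTF-8; the helper `read_lines` and the temporary sample file are hypothetical:

```python
import os
import tempfile


def read_lines(path, errors="replace"):
    """Yield the lines of a mostly-UTF-8 file as str, without trailing newlines.

    errors="replace" turns each undecodable byte sequence into U+FFFD;
    errors="surrogateescape" keeps the original bytes recoverable instead.
    """
    with open(path, encoding="utf-8", errors=errors) as inf:
        for line in inf:  # same job as repeated readline(), one str per line
            yield line.rstrip("\n")


# Hypothetical sample file: valid UTF-8 plus one stray 0xE9 byte (Latin-1 'é').
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"plain line\ncaf\xe9 line\nlast line\n")

lines = list(read_lines(path))
escaped = list(read_lines(path, errors="surrogateescape"))
os.remove(path)

# ascii() escapes non-ASCII characters, so printing cannot fail on any console.
print(ascii(lines))
# With surrogateescape, encoding back with the same handler restores the bytes.
print(escaped[1].encode("utf-8", "surrogateescape") == b"caf\xe9 line")
```

If the files are really Latin-1 (or another single-byte encoding), passing `encoding="latin-1"` instead decodes every byte without errors, though genuine UTF-8 text will then be mis-decoded silently.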

How do I handle this?

If I open the file in binary mode and use <file object>.read, I get a <bytes>
object. Then how do I handle it? Regexps, comparisons with strings, ...?
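Staying in binary mode also works: `re` accepts bytes patterns, and bytes objects have many of the same methods as str, provided the patterns and literals are bytes too. A minimal sketch; the sample data, the `key=value` pattern, and the choice to decode only the matched groups are assumptions for illustration:

```python
import io
import re

# Hypothetical stand-in for open(path, "rb"); any binary file object behaves alike.
raw = io.BytesIO(b"key=alpha\ncaf\xe9=beta\n# comment\n")

# A pattern used on bytes must itself be bytes (note the b prefix);
# mixing a str pattern with bytes data raises TypeError.
pattern = re.compile(rb"^([^=]+)=(\w+)$")

matches = []
for line in raw:                 # iterating a binary file yields bytes lines
    line = line.rstrip(b"\n")
    if line.startswith(b"#"):    # compare with bytes literals, not str
        continue
    m = pattern.match(line)
    if m:
        # Decode only the captured pieces, choosing how to treat bad bytes.
        key = m.group(1).decode("utf-8", errors="replace")
        value = m.group(2).decode("ascii")
        matches.append((key, value))

print(ascii(matches))
# Comparing bytes with str normally does not raise; it is simply never equal.
print(b"alpha" == "alpha")  # False
```

Note that in a bytes pattern `\w` matches only ASCII letters, digits, and underscore, which is why the key group above uses `[^=]+` to accept the stray 0xE9 byte.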

Thanks for any help.


