read a file and remove Mojibake chars

Ben Finney ben+python at benfinney.id.au
Thu Apr 7 05:40:46 EDT 2016


Daiyue Weng <daiyueweng at gmail.com> writes:

> Hi, when I read a file, the file string contains Mojibake chars at the
> beginning

You are explicitly setting an encoding to read the file; that is good,
since Python should not guess the input encoding.

It is good because the question of which text encoding is correct gets
dealt with immediately. My guess is that the file's actual encoding is
not the one you expect.
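As an illustration of how a mismatched encoding shows up, here is a
minimal sketch. The sample string and the choice of “cp1252” as the
wrong codec are my assumptions for the demonstration; they are not taken
from your file.

```python
# Hypothetical sample text; not taken from the original poster's file.
text = "café"
raw = text.encode("utf-8")      # the bytes as they would sit on disk

# Decoding with a codec other than the one used for writing can succeed
# without any error and still yield Mojibake.
wrong = raw.decode("cp1252")

# Decoding with the codec that was used for writing restores the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

The point is that a wrong codec does not necessarily raise an exception;
it can silently produce the wrong characters.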

Are you certain the text encoding is “utf-8”? Can you verify that with
whatever created the file — what text encoding does it use to write that
file?
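If you cannot ask whatever created the file, one thing you might try is
looking at the raw bytes at the start of it. The sketch below is only a
hint, not proof: it assumes the stray characters could come from a byte
order mark (BOM), which I have not confirmed for your file. The function
name “sniff_bom” is hypothetical; the BOM constants are from the standard
“codecs” module.

```python
import codecs


def sniff_bom(path):
    """Return a codec name suggested by a leading BOM, or None."""
    with open(path, "rb") as f:     # binary mode: no decoding happens
        head = f.read(4)
    # UTF-32 is checked before UTF-16 because the UTF-32-LE BOM begins
    # with the same two bytes as the UTF-16-LE BOM.
    candidates = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in candidates:
        if head.startswith(bom):
            return name
    return None                     # no BOM found; encoding still unknown
```

If this returns a name, passing it as the “encoding” argument to “open”
makes Python consume the BOM instead of handing it to you as text. A
result of None tells you nothing either way, so the question of what
wrote the file still stands.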

-- 
 \      “Advertising is the price companies pay for being unoriginal.” |
  `\                —Yves Béhar, _New York Times_ interview 2010-12-30 |
_o__)                                                                  |
Ben Finney



