read a file and remove Mojibake chars

Daiyue Weng daiyueweng at gmail.com
Thu Apr 7 04:47:34 EDT 2016


Hi, when I read a file, the resulting string contains Mojibake characters at
the beginning. The code is:

file_str = open(file_path, 'r', encoding='utf-8').read()
print(repr(open(file_path, 'r', encoding='utf-8').read()))

Part of the printed string, including the Mojibake characters, looks like this:

  '锘縶\n "name": "__NAME__"'

I tried to remove the non-UTF-8 characters with this code,

def read_config_file(fname):
    with open(fname, "r", encoding='utf-8') as fp:
        for line in fp:
            line = line.strip()
            line = line.decode('utf-8','ignore').encode("utf-8")

    return fp.read()

but it doesn't work. How can I remove the Mojibake in this case?

Many thanks
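
One possible reading of the output above, offered as a guess rather than a confirmed diagnosis: the leading characters look like what a UTF-8 byte order mark (bytes EF BB BF) followed by '{' becomes when those bytes are displayed through a GBK decoder. If that is the case, Python's 'utf-8-sig' codec, which skips a leading BOM, may be enough. The sketch below assumes this; the helper name read_text_without_bom and the temporary sample file are hypothetical and only for illustration.

```python
import codecs
import json
import os
import tempfile


def read_text_without_bom(path):
    """Read a UTF-8 text file, dropping a leading BOM if one is present."""
    # 'utf-8-sig' decodes like 'utf-8' but skips an initial EF BB BF signature.
    with open(path, "r", encoding="utf-8-sig") as fp:
        return fp.read()


# Hypothetical sample file: a UTF-8 BOM followed by JSON like that in the post.
raw = codecs.BOM_UTF8 + b'{\n "name": "__NAME__"}'
with tempfile.NamedTemporaryFile(delete=False, suffix=".json") as tmp:
    tmp.write(raw)
    sample_path = tmp.name

# Decoding the BOM plus '{' as GBK shows how a BOM can surface as Mojibake.
print(repr(raw[:4].decode("gbk", errors="replace")))

text = read_text_without_bom(sample_path)
print(repr(text))

# With the BOM gone, the text parses as ordinary JSON.
config = json.loads(text)
os.remove(sample_path)
```

Separately, in Python 3 the lines yielded by a file opened in text mode are already str objects, which have no decode() method, and fp.read() cannot be called once the with block has closed the file, so the function in the post would raise an error regardless of the file's contents.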



More information about the Python-list mailing list