Convert unicode escape sequences to unicode in a file

Jeremy jlconlin at gmail.com
Tue Jan 11 15:53:02 EST 2011


I have a file that contains Unicode escape sequences, e.g.,

J\u00e9r\u00f4me

and I want to replace all of them with the characters they represent and write the result to a new file.  The simple script I've created is copied below.  However, I am getting the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 947: ordinal not in range(128)

It appears that the data isn't being converted when writing to the file.  Can someone please help?

Thanks,
Jeremy


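import codecs
import re

if __name__ == "__main__":
    # filename holds the path of the input file containing the \uXXXX escapes.
    f = codecs.open(filename, 'r', 'unicode-escape')
    lines = f.readlines()
    line = ''.join(lines)
    f.close()

    # Strip the STRINGDECODE(...) wrapper, keeping only its argument.
    utFound = re.sub(r'STRINGDECODE\((.+?)\)', r'\1', line)
    print(utFound[:1000])

    o = open('newDice.sql', 'w')
    o.write(utFound.decode('utf-8'))
    o.close()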
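
One possible direction, offered as a sketch rather than a confirmed fix: the text read through the 'unicode-escape' codec is already a unicode string, so calling .decode('utf-8') on it makes Python 2 first encode it implicitly as ASCII, which is a likely source of the UnicodeEncodeError. Opening the output file with an explicit encoding and writing the string directly avoids that step. The function name convert_escapes and its two path parameters are hypothetical, chosen only for this illustration.

```python
import io
import re


def convert_escapes(src_path, dst_path):
    """Decode \\uXXXX escapes in src_path and write the result to dst_path as UTF-8.

    Hypothetical helper for illustration only; the name and parameters are
    assumptions, not part of the original script.
    """
    # Read raw bytes so the escape sequences can be decoded explicitly.
    with io.open(src_path, 'rb') as f:
        raw = f.read()

    # 'unicode-escape' turns the six characters \u00e9 into the single
    # character U+00E9. Caveat: it also interprets other backslash escapes
    # such as \n, and it treats non-ASCII input bytes as Latin-1.
    text = raw.decode('unicode-escape')

    # Same substitution as in the script above: keep only the argument
    # of STRINGDECODE(...).
    text = re.sub(r'STRINGDECODE\((.+?)\)', r'\1', text)

    # With an explicit encoding on the output file, write() accepts the
    # unicode string directly; no .decode() or .encode() call is needed.
    with io.open(dst_path, 'w', encoding='utf-8') as out:
        out.write(text)
    return text
```

It could be called as convert_escapes(filename, 'newDice.sql'). Note that print(utFound[:1000]) can raise the same error on its own if the terminal's encoding is ASCII, so the print call may need separate attention.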


