[Tutor] unicode to plain text conversion

Kent Johnson kent37 at tds.net
Tue Apr 7 02:51:24 CEST 2009


On Mon, Apr 6, 2009 at 6:48 PM, Pirritano, Matthew <MPirritano at ochca.com> wrote:
> Hello python people,
>
> I am a total newbie. I have a very large file > 4GB that I need to
> convert from Unicode to plain text. I used to just use dos when the file
> was < 4GB but it no longer seems to work. Can anyone point me to some
> python code that might perform this function?

What is the encoding of the Unicode file?

Assuming that the file has lines that will each fit in memory, you can
use the codecs module to decode the unicode. Something like this:

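import codecs

# Decode the input as UTF-16LE; iterating over inp yields one line at a time.
inp = codecs.open('Unicode_file.txt', 'r', 'utf-16le')
# Open the output for writing ('w'); open() defaults to read mode.
outp = open('new_text_file.txt', 'w')
outp.writelines(inp)
inp.close()
outp.close()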

The above code assumes UTF-16LE encoding; change it to the correct one
if that is not right. A list of supported encodings is here:
http://docs.python.org/library/codecs.html#id3
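
If the lines might not fit in memory, or the output needs an explicit
encoding, the same idea can be sketched with fixed-size chunks instead
of lines. This is only a rough sketch, not a tested recipe for your
file: the function name convert_encoding, the UTF-8 output encoding and
the chunk size are all assumptions made for illustration, and it needs
io.open (Python 2.6 or later).

```python
import io

def convert_encoding(src_path, dst_path, src_encoding='utf-16-le',
                     dst_encoding='utf-8', chunk_chars=1024 * 1024):
    """Copy src_path to dst_path, re-encoding the text chunk by chunk.

    Hypothetical helper: the name and the default encodings are
    assumptions, so substitute whatever matches the real file.
    """
    # newline='' turns off newline translation on both sides, so the
    # line endings are copied through unchanged.
    with io.open(src_path, 'r', encoding=src_encoding, newline='') as inp:
        with io.open(dst_path, 'w', encoding=dst_encoding, newline='') as outp:
            while True:
                # In text mode, read(n) returns at most n characters.
                chunk = inp.read(chunk_chars)
                if not chunk:
                    break
                outp.write(chunk)
```

Usage would be something like
convert_encoding('Unicode_file.txt', 'new_text_file.txt'). If the file
starts with a byte order mark, decoding with 'utf-16' rather than
'utf-16-le' reads the mark to pick the byte order and drops it from the
output.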

Kent

