[Tutor] bogus characters in a windows file

Peter Otten __peter__ at web.de
Thu Feb 9 10:15:22 CET 2012


Garry Willgoose wrote:

> I input the data with the lines
> 
> infile = open('c:\cpu.txt','r')
> infile.readline()
> infile.readline()
> infile.readline()
> 
> the readline()s yield the following output
> 
> '\xff\xfeP\x00r\x00o\x00c\x00e\x00s\x00s\x00I\x00d\x00 \x00 \x00\r\x00\n'
> '\x000\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
> '\x004\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'

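You were already told that you are trying to read a UTF-16-encoded file.
The leading '\xff\xfe' is the byte order mark of little-endian UTF-16, and
the '\x00' after every character is the high byte of each 16-bit code unit.
Here's how to deal with that: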

>>> import codecs
>>> with codecs.open("cpu.txt", "rU", encoding="UTF-16") as f:
...     for line in f:
...         print line.rstrip("\n")
...
ProcessId
0
4
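
If you want the same thing as a script rather than an interactive session,
something along these lines should work. It is only a sketch: it uses
io.open(), which also accepts an encoding argument and is the same function
as the built-in open() in Python 3. The helper name read_utf16_lines and the
sample data are made up for illustration, and the temporary file stands in
for your cpu.txt.

```python
import io
import os
import tempfile


def read_utf16_lines(path):
    """Return the lines of a UTF-16 text file without line endings.

    Note that rstrip() with no argument also drops the trailing spaces
    that pad each line, not just the newline.
    """
    # The "utf-16" codec reads the byte order mark at the start of the
    # file and uses it to choose the byte order, so the mark itself
    # never shows up in the decoded text.
    with io.open(path, "r", encoding="utf-16") as f:
        return [line.rstrip() for line in f]


# Hypothetical sample data shaped like the file in the question:
# space-padded values with Windows line endings.
sample = u"ProcessId  \r\n0          \r\n4          \r\n"

# Write the sample to a temporary file; newline="" keeps the "\r\n"
# endings exactly as given instead of translating them.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-16", newline="") as f:
    f.write(sample)

lines = read_utf16_lines(path)
os.remove(path)

for line in lines:
    print(line)
```

Opening the file in text mode with an explicit encoding means the rest of
your code only ever sees decoded text, so the '\x00' bytes never reach it.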



