[Tutor] bogus characters in a windows file

Thu Feb 9 03:09:25 CET 2012

On Wed, Feb 8, 2012 at 5:46 PM, Garry Willgoose <
garry.willgoose at newcastle.edu.au> wrote:

> I'm reading a file output by the system utility WMIC in windows (so I can
> track CPU usage by process ID) and the text file WMIC outputs seems to have
> extra characters in it that I've not seen before.
>
> I use os.system(r'WMIC /OUTPUT:c:\cpu.txt PROCESS GET ProcessId') to output
> the file and parse file c:\cpu.txt
>
> The first few lines of the file look like this in notepad
>
> ProcessId
> 0
> 4
> 568
> 624
> 648
>
>
> I input the data with the lines
>
> infile = open(r'c:\cpu.txt','r')
> infile.readline()
> infile.readline()
> infile.readline()
>
> the readline()s yield the following output
>
> '\xff\xfeP\x00r\x00o\x00c\x00e\x00s\x00s\x00I\x00d\x00 \x00 \x00\r\x00\n'
> '\x000\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
> '\x004\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
>
> Now for the first line the title 'ProcessId' is in this string but the
> individual characters are separated by '\x00' and at least for the first
> line of the file there is an extra '\xff\xfe'. For subsequent lines it's just
> '\x00'. Now I can just replace the '\x**' with '' but that seems a bit
> inelegant. I've tried various modes on the open ('rU' and 'rb') but to no
> effect.
>
> Does anybody know what the rubbish characters are and what has caused them?
> I'm using the latest Enthought python if that matters.
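>

You're trying to read a Unicode text file byte-by-byte.  It'll end in
tears...
The "\xff\xfe" at the beginning is the byte order mark, or BOM. This
particular BOM says the file is UTF-16, little-endian, which is also why
every ASCII character is followed by a '\x00'.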

Here's a quick primer on Unicode:
http://www.joelonsoftware.com/articles/Unicode.html
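
A minimal sketch of reading such a file as text rather than bytes, assuming
the WMIC output really is UTF-16 with a BOM, as the readline() output above
suggests. The temporary file here is only a stand-in for c:\cpu.txt so the
example runs anywhere, and the names sample, path, header and pids are
illustrative. io.open is used because it takes an encoding argument on both
Python 2.6+ and Python 3.

```python
import io
import os
import tempfile

# Recreate the bytes from the post: a UTF-16 little-endian BOM ('\xff\xfe')
# followed by UTF-16-LE text with Windows line endings.
sample = b"\xff\xfe" + u"ProcessId  \r\n0          \r\n4          \r\n".encode("utf-16-le")

# Stand-in for c:\cpu.txt (assumption: any writable location will do).
path = os.path.join(tempfile.mkdtemp(), "cpu.txt")
with open(path, "wb") as f:
    f.write(sample)

# The 'utf-16' codec reads the BOM, picks the byte order from it,
# and does not return the BOM as part of the text.
with io.open(path, "r", encoding="utf-16") as infile:
    lines = [line.strip() for line in infile]

header = lines[0]                          # column title
pids = [int(s) for s in lines[1:] if s]    # skip blank lines
```

Opening with 'rU' or 'rb' cannot help here, because those modes only change
newline handling; the bytes still have to be decoded from UTF-16 before they
look like ordinary text.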