[Tutor] bogus characters in a windows file

Marc Tompkins marc.tompkins at gmail.com
Thu Feb 9 03:54:11 CET 2012


On Wed, Feb 8, 2012 at 6:09 PM, Marc Tompkins <marc.tompkins at gmail.com> wrote:

> On Wed, Feb 8, 2012 at 5:46 PM, Garry Willgoose <
> garry.willgoose at newcastle.edu.au> wrote:
>
>> I'm reading a file output by the system utility WMIC in Windows (so I can
>> track CPU usage by process ID) and the text file WMIC outputs seems to
>> contain extra characters I've not seen before.
>>
>> I use os.system('WMIC /OUTPUT:c:\cpu.txt PROCESS GET ProcessId') to
>> output the file and parse file c:\cpu.txt
>>
>> The first few lines of the file look like this in notepad
>>
>> ProcessId
>> 0
>> 4
>> 568
>> 624
>> 648
>>
>>
>> I input the data with the lines
>>
>> infile = open('c:\cpu.txt','r')
>> infile.readline()
>> infile.readline()
>> infile.readline()
>>
>> the readline()s yield the following output
>>
>> '\xff\xfeP\x00r\x00o\x00c\x00e\x00s\x00s\x00I\x00d\x00 \x00 \x00\r\x00\n'
>> '\x000\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
>> '\x004\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
>>
>> Now for the first line the title 'ProcessId' is in this string but the
>> individual characters are separated by '\x00' and, at least for the first
>> line of the file, there is an extra '\xff\xfe'. For subsequent lines it's
>> just '\x00'. Now I can just replace the '\x**' with '' but that seems a bit
>> inelegant. I've tried various mode options on the open, such as 'rU' and
>> 'rb', but to no effect.
>>
>> Does anybody know what the rubbish characters are and what has caused
>> them? I'm using the latest Enthought python if that matters.
>>
> You're trying to read a Unicode text file byte-by-byte.  It'll end in
> tears...
> The "\xff\xfe" at the beginning is the Byte Order Mark or BOM; this
> particular one indicates little-endian UTF-16.
>
> Here's a quick primer on Unicode:
> http://www.joelonsoftware.com/articles/Unicode.html
>
> In particular, this phrase:

> "we decided to do everything internally in UCS-2 (two byte) Unicode, which
> is what Visual Basic, COM, and Windows NT/2000/XP use as their native
> string type."
>
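
In other words, WMIC wrote the file as two-byte little-endian Unicode
(UTF-16), which is why every other byte in the readline() output is '\x00'.
Rather than stripping those bytes by hand, let Python decode the file. Here
is a minimal sketch, assuming the file really is UTF-16 with a BOM as the
bytes above suggest. The helper name read_wmic_pids and the temporary file
standing in for c:\cpu.txt are my own inventions for illustration; io.open
is used so the same code runs on Python 2.6+ and Python 3.

```python
import codecs
import io
import os
import tempfile

# Stand-in for c:\cpu.txt (an assumption for this sketch): a BOM followed by
# UTF-16 little-endian text, like the bytes shown in the question, with the
# trailing padding spaces shortened.
sample = codecs.BOM_UTF16_LE + u'ProcessId  \r\n0  \r\n4  \r\n'.encode('utf-16-le')

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(sample)


def read_wmic_pids(filename):
    """Hypothetical helper: return (header, list of process IDs).

    The 'utf-16' codec reads the BOM, picks the matching byte order,
    and drops the BOM from the decoded text.
    """
    with io.open(filename, 'r', encoding='utf-16') as infile:
        lines = [line.strip() for line in infile]
    lines = [line for line in lines if line]  # skip blank lines
    return lines[0], [int(line) for line in lines[1:]]


header, pids = read_wmic_pids(path)
os.remove(path)
print(header, pids)
```

For the real file you would call read_wmic_pids(r'c:\cpu.txt'). The raw
string keeps the backslash from ever being read as an escape sequence.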