[Tutor] extra characters in XML
Kent Johnson
kent37 at tds.net
Wed Jan 11 17:08:22 CET 2006
Ben Vinger wrote:
> Found the following (which solved the problem, though
> not on the console) at
> http://www.jorendorff.com/articles/unicode/python.html
>
> import codecs
> # Open a UTF-8 file in read mode
> infile = codecs.open("infile.txt", "r", "utf-8")
> # Read its contents as one large Unicode string.
> text = infile.read()
> # Close the file.
> infile.close()
>
> The same function is used to open a file for writing;
> just use "w" (write) or "a" (append) as the second
> argument.
Ben,
Most likely your XML file is 16-bit Unicode (UTF-16), not utf-8. When
ascii text is represented as UTF-16, every other byte is a null byte.
That is the extra character that shows up as a space or box depending on
what is displaying it. The utf-8 codec must be swallowing the null bytes.
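A minimal sketch of the null-byte effect, assuming Python 3 syntax and
using '<end_time>' from the original question as sample text:

```python
# Encode plain ASCII text as little-endian UTF-16 and inspect the bytes.
text = "<end_time>"
raw = text.encode("utf_16_le")

# Each ASCII character becomes two bytes: the character, then a null byte.
print(repr(raw))

# Every second byte (positions 1, 3, 5, ...) is a null.
null_bytes = raw[1::2]
print(len(text), len(raw), null_bytes.count(b"\x00"))
```

With utf_16_be the null byte comes first in each pair instead.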
The first line of your XML (the XML declaration) should show what
encoding it is if it is different from utf-8. What is in that line?
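If that line is hard to read because of the extra bytes, looking at the
first few raw bytes can also help. The sketch below assumes Python 3 and
assumes the file starts with a byte order mark (BOM), which many but not
all UTF-16 files do. sniff_bom is a hypothetical helper name, not part of
the standard library, and it ignores UTF-32.

```python
import codecs

def sniff_bom(raw):
    """Return a codec name suggested by a leading BOM, or None if there is none."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf_8_sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The generic utf_16 codec reads the BOM and picks the byte order.
        return "utf_16"
    return None

# Sample bytes standing in for the start of a UTF-16 XML file.
sample = codecs.BOM_UTF16_LE + "<?xml version='1.0'?>".encode("utf_16_le")
print(sniff_bom(sample))
```

If it returns None, there is no BOM and the byte order has to be found
some other way, for example by trying both codecs.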
In your code above, instead of utf-8 try utf_16_be and utf_16_le; one of
them should work.
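A rough sketch of trying both, assuming Python 3. The function name
find_lines, the sample XML content and the temporary file are made up for
illustration; io.open takes the encoding the same way codecs.open does.

```python
import io
import os
import tempfile

def find_lines(path, needle, encodings=("utf_16_be", "utf_16_le")):
    """Try each encoding in turn; return (encoding, matching lines)."""
    for encoding in encodings:
        try:
            with io.open(path, "r", encoding=encoding) as infile:
                lines = infile.readlines()
        except UnicodeError:
            # The bytes are not valid in this encoding; try the next one.
            continue
        hits = [line for line in lines if needle in line]
        if hits:
            return encoding, hits
    return None, []

# Build a small little-endian UTF-16 file to stand in for the real XML.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as out:
    out.write("<root>\n  <end_time>17:08</end_time>\n</root>\n".encode("utf_16_le"))

try:
    encoding, hits = find_lines(path, "<end_time>")
finally:
    os.remove(path)

print(encoding, hits)
```

The wrong byte order usually decodes without an error but produces
unrelated characters, so the sketch only accepts an encoding when the
search text is actually found. It uses "in" for the test, which also
matches when the tag is at the very start of a line.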
Kent
>
> --- Ben Vinger <benvinger at yahoo.co.uk> wrote:
>
>>Hello all
>>
>>I want to do the following in an XML file:
>>
>>XFile = open(XmlFile,'r')
>>for line in XFile.readlines():
>>    if line.find('<end_time>') > 0:
>>        print line
>>
>>However, it does not work due to extra characters that
>>appear in the XML file. For example if I use the
>>previous code without the if condition, on a console
>>it looks like:
>>< e n d _ t i m e >
>>And if you output that to a text file and open that in
>>a text editor, the text editor shows a square instead
>>of a space in between every character. What is going
>>on?
>>
>>Thanks
>>Ben
>>_______________________________________________
>>Tutor maillist - Tutor at python.org
>>http://mail.python.org/mailman/listinfo/tutor