[Tutor] extra characters in XML

Kent Johnson kent37 at tds.net
Wed Jan 11 17:08:22 CET 2006


Ben Vinger wrote:
> Found the following (which solved the problem, though
> not on the console) at
> http://www.jorendorff.com/articles/unicode/python.html
> 
>   import codecs
>   # Open a UTF-8 file in read mode
>   infile = codecs.open("infile.txt", "r", "utf-8")
>   # Read its contents as one large Unicode string.
>   text = infile.read()
>   # Close the file.
>   infile.close()
> 
> The same function is used to open a file for writing;
> just use "w" (write) or "a" (append) as the second
> argument.

Ben,

Most likely your XML file is UTF-16 (16-bit Unicode), not utf-8. When
ASCII text is encoded as UTF-16, every other byte is a null byte. That is
the extra character that shows up as a space or box depending on who is 
interpreting it. The utf-8 codec does not actually remove the null bytes;
it decodes each one to the character U+0000, so they are probably still
in your text.
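
To see the null bytes for yourself, here is a small sketch (written for
Python 3, and assuming little-endian UTF-16 without a byte order mark; your
file may differ). It encodes the tag from your example and shows why a
plain byte search misses it:

```python
# Encode an ASCII-only string as little-endian UTF-16 (no byte order mark).
text = "<end_time>"
raw = text.encode("utf_16_le")
print(raw)

# Every second byte is the high byte of a 16-bit code unit; for ASCII it is 0.
high_bytes = raw[1::2]
all_null = all(b == 0 for b in high_bytes)

# Searching the undecoded bytes for the ASCII tag fails because of the nulls...
found_in_raw = text.encode("ascii") in raw

# ...but searching after decoding with the matching codec succeeds.
found_after_decode = text in raw.decode("utf_16_le")

print(all_null, found_in_raw, found_after_decode)
```

The interleaved nulls are what a console or editor draws as spaces or
boxes.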

The XML declaration on the first line of your file should name the
encoding if it is different from utf-8. What is in that line?
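
If you would rather check from Python than by eye, something like the
following could work. This is only a rough heuristic of my own, not a full
implementation of the XML rules: the function name sniff_xml_encoding is
made up, it looks only at a byte order mark, the null-byte pattern, and the
encoding attribute, and it ignores UTF-32 and other cases.

```python
import codecs
import re

def sniff_xml_encoding(head):
    """Guess an XML file's encoding from its first bytes (simplified)."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        # The plain "utf_16" codec reads the byte order mark to pick the order.
        return "utf_16"
    # No byte order mark: look at where the null byte sits around the first "<".
    if head.startswith(b"<\x00"):
        return "utf_16_le"
    if head.startswith(b"\x00<"):
        return "utf_16_be"
    # Otherwise look for encoding="..." in the XML declaration.
    match = re.match(rb"""<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']""", head)
    if match:
        return match.group(1).decode("ascii")
    # XML without a declaration or byte order mark defaults to utf-8.
    return "utf-8"

print(sniff_xml_encoding("<end_time>".encode("utf_16_le")))
```

You would call it on the first hundred or so bytes of the file, read in
binary mode.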

In your code above, instead of utf-8 try utf_16_be and utf_16_le; one of
them should work.
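
As a sketch of what I mean (Python 3 syntax, using the built-in open()
with an encoding argument rather than codecs.open; the helper name
lines_containing and the sample data are made up for illustration), this
writes a small little-endian UTF-16 file and then searches it with each
codec:

```python
import os
import tempfile

def lines_containing(path, needle, encoding):
    """Return the lines of the file at path that contain needle."""
    with open(path, "r", encoding=encoding) as infile:
        return [line.rstrip("\n") for line in infile if needle in line]

# Made-up sample data, saved as little-endian UTF-16 without a byte order mark.
sample = "<job>\n<end_time>17:08</end_time>\n</job>\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
    tmp.write(sample.encode("utf_16_le"))
    path = tmp.name

try:
    # The matching codec finds the tag; the wrong byte order finds nothing.
    matches_le = lines_containing(path, "<end_time>", "utf_16_le")
    matches_be = lines_containing(path, "<end_time>", "utf_16_be")
finally:
    os.remove(path)

print(matches_le)
print(matches_be)
```

Note that the wrong byte order does not necessarily raise an error; it can
simply decode to unrelated characters, so check that the text looks right
rather than relying on an exception.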

Kent

> --- Ben Vinger <benvinger at yahoo.co.uk> wrote:
> 
>>Hello all
>>
>>I want to do the following in an XML file:
>>
>>XFile = open(XmlFile,'r')
>>for line in XFile.readlines():
>>    if line.find('<end_time>') >= 0:
>>        print line
>>
>>However, it does not work due to extra characters that
>>appear in the XML file.  For example, if I use the
>>previous code without the if condition, on a console
>>it looks like:
>>< e n d _ t i m e >
>>And if you output that to a text file and open that in
>>a text editor, the text editor shows a square instead
>>of a space in between every character.  What is going
>>on?
>>
>>Thanks
>>Ben