elementtree XML() unicode

Kee Nethery kee at kagi.com
Tue Nov 3 20:14:28 EST 2009


On Nov 3, 2009, at 4:44 PM, Gabriel Genellina wrote:

> On Tue, 03 Nov 2009 21:01:46 -0300, Kee Nethery <kee at kagi.com>  
> wrote:
>
>> I've removed all the stuff in my code and tried to distill it down  
>> to just what is failing. Hopefully I have not removed something  
>> essential.

Sounds like I did remove something essential.

>
> et expects bytes as input, not unicode. You're decoding too early  
> (decoding early is good, but not in this case, because et does the  
> work for you). Either feed et.XML with the bytes before decoding, or  
> reencode the received xml text in UTF-8 (since this is the declared  
> encoding).

Here is the code that hits the URL:
         getResponse1 = urllib2.urlopen(theUrl)
         getResponse2 = getResponse1.read()
         getResponse3 = unicode(getResponse2,'UTF-8')
         theResponseXml = et.XML(getResponse3)

So are you saying I want to do:
         getResponse1 = urllib2.urlopen(theUrl)
         getResponse4 = getResponse1.read()
         theResponseXml = et.XML(getResponse4)
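
For reference, a minimal sketch of the two options Gabriel describes. The sample document and the name sample_bytes are made up here and stand in for whatever getResponse1.read() returns; the import path for et is also an assumption, since the code above does not show it.

```python
import xml.etree.ElementTree as et  # assumed import; the post only shows "et"

# Hypothetical stand-in for getResponse1.read(): UTF-8 encoded bytes.
sample_bytes = (u'<?xml version="1.0" encoding="UTF-8"?>'
                u'<customer><name>Jos\u00e9</name></customer>').encode('utf-8')

# Option 1: hand the undecoded bytes straight to the parser.
tree_from_bytes = et.XML(sample_bytes)

# Option 2: if the text was already decoded, re-encode it to the declared
# encoding (UTF-8) before parsing.
decoded_text = sample_bytes.decode('utf-8')
tree_from_reencoded = et.XML(decoded_text.encode('utf-8'))

# Both routes should yield the same element text.
name_1 = tree_from_bytes.find('name').text
name_2 = tree_from_reencoded.find('name').text
```

Option 1 is the shorter route when nothing else needs the decoded text before parsing.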

The reason I am confused is that getResponse2 is classified as a  
"str" in the Komodo IDE. I want to make sure I don't lose the  
non-ASCII characters coming from the URL. If I use the second version  
of the code, does elementtree automatically convert the str into  
unicode? How do I deal with the XML as unicode when I pass it to  
elementtree as a string?
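
On the non-ASCII worry, a small sketch under stated assumptions (made-up sample data, assumed import path for et): the parser reads the encoding from the XML declaration, so the element text should come back decoded with the non-ASCII characters intact, and the tree can be serialized back to UTF-8 bytes afterwards.

```python
import xml.etree.ElementTree as et  # assumed import path

# Hypothetical response body: UTF-8 bytes, as read() would return them.
raw = (u'<?xml version="1.0" encoding="UTF-8"?>'
       u'<city>Z\u00fcrich</city>').encode('utf-8')

# Parse the bytes; the parser decodes them using the declared encoding.
root = et.XML(raw)
city = root.text  # decoded text, non-ASCII character preserved

# Serialize back to bytes when the XML has to leave the program again.
round_trip = et.tostring(root, encoding='utf-8')
```

So the bytes go in undecoded, the text comes out of the tree already decoded, and encoding only needs to happen again at output time.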

Very confusing. Thanks for the help.

Kee

