[Tutor] Extracting xml text

Karim karim.liateni at free.fr
Sun Jun 20 10:24:56 CEST 2010


Hello Stefan,

I know you are promoting Etree, and I am very interested in it.
Is there any chance of it being integrated into a future standard Python version?

Regards
Karim

On 06/20/2010 10:14 AM, Stefan Behnel wrote:
> T.R. D., 20.06.2010 08:03:
>> I'm trying to parse a list of xml strings and so far it looks like the
>> xml.parsers.expat is the way to go but I'm not quite sure how it works.
>>
>> I'm trying to parse something similar to the following.  I'd like to
>> collect all headings and bodies and associate them in a variable
>> (dictionary for example). How would I use the expat class to do this?
>
> Well, you *could* use it, but I *would* not recommend it. :)
>
>
>> <note>
>> <to>Tove</to>
>> <from>Jani</from>
>> <heading>Reminder</heading>
>> <body>Don't forget me this weekend!</body>
>> </note>
>>
>> <note>
>> <to>Jani</to>
>> <from>Tovi</from>
>> <heading>Reminder 2</heading>
>> <body>Don't forget to bring snacks!</body>
>> </note>
>
> Use ElementTree's iterparse:
>
>     from xml.etree.ElementTree import iterparse
>
>     for _, element in iterparse("the_file.xml"):
>         if element.tag == 'note':
>             # find text in interesting child elements
>             print(element.findtext('heading'), element.findtext('body'))
>
>             # save some memory by removing the handled content
>             element.clear()
>
> iterparse() iterates over parser events, but it builds an in-memory 
> XML tree while doing so. That makes it trivial to find things in the 
> stream. The above code receives an event whenever a tag closes, and 
> starts working when the closing tag is a 'note' element, i.e. when the 
> complete subtree of the note element has been parsed into memory.
>
> Stefan
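
For reference, here is a minimal sketch of the iterparse approach described
above, collecting each heading and body into a dictionary as the original
question asked. It is only an illustration under a few assumptions: the
function name collect_notes and the <notes> wrapper element are made up for
this example (a well-formed XML document needs a single root element, so the
two notes cannot stand side by side at the top level), the input is read
from an in-memory buffer rather than a file, and headings are assumed to be
unique, since they are used as dictionary keys.

```python
import io
from xml.etree.ElementTree import iterparse

# Hypothetical sample: the two notes from the question, wrapped in a
# single <notes> root element so that the document is well-formed.
SAMPLE = b"""<notes>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note>
<to>Jani</to>
<from>Tovi</from>
<heading>Reminder 2</heading>
<body>Don't forget to bring snacks!</body>
</note>
</notes>"""


def collect_notes(source):
    """Map each note's heading text to its body text.

    'source' may be a file name or a file object opened in binary mode.
    """
    notes = {}
    # Without an 'events' argument, iterparse reports only "end" events,
    # so each element arrives with its subtree fully parsed.
    for _, element in iterparse(source):
        if element.tag == 'note':
            notes[element.findtext('heading')] = element.findtext('body')
            # drop the handled children to save memory
            element.clear()
    return notes


notes = collect_notes(io.BytesIO(SAMPLE))
print(notes)
```

Passing a file name instead of the BytesIO object should work the same way,
e.g. collect_notes("the_file.xml"), provided the file has a single root.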