[Tutor] Extracting xml text
Karim
karim.liateni at free.fr
Sun Jun 20 10:24:56 CEST 2010
Hello Stefan,
I know you are promoting Etree, and I am very interested in it.
Is there any chance of having it integrated into a future standard Python version?
Regards
Karim
On 06/20/2010 10:14 AM, Stefan Behnel wrote:
> T.R. D., 20.06.2010 08:03:
>> I'm trying to parse a list of XML strings, and so far it looks like
>> xml.parsers.expat is the way to go, but I'm not quite sure how it works.
>>
>> I'm trying to parse something similar to the following. I'd like to
>> collect all headings and bodies and associate them in a variable (a
>> dictionary, for example). How would I use the expat module to do this?
>
> Well, you *could* use it, but I *would* not recommend it. :)
>
>
>> <note>
>> <to>Tove</to>
>> <from>Jani</from>
>> <heading>Reminder</heading>
>> <body>Don't forget me this weekend!</body>
>> </note>
>>
>> <note>
>> <to>Jani</to>
>> <from>Tovi</from>
>> <heading>Reminder 2</heading>
>> <body>Don't forget to bring snacks!</body>
>> </note>
>
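For comparison, here is a minimal Python 3 sketch of the expat route the question asks about, applied to the sample above. It is an illustration under stated assumptions, not the approach recommended in this thread: the names `collect_notes` and `SAMPLE` are hypothetical, the two `<note>` fragments are wrapped in an assumed `<notes>` root element because an XML parser accepts only one root element per document, and notes are keyed by heading, so a repeated heading would overwrite the earlier body.

```python
import xml.parsers.expat

# The two notes from the question, wrapped in a hypothetical <notes> root
# element so that they form a single well-formed document.
SAMPLE = b"""<notes>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note>
<to>Jani</to>
<from>Tovi</from>
<heading>Reminder 2</heading>
<body>Don't forget to bring snacks!</body>
</note>
</notes>"""


def collect_notes(xml_data):
    """Map each note's heading to its body using expat callbacks."""
    notes = {}
    text = []        # character data of the element currently being read
    heading = None   # heading of the note currently being read

    def start_element(name, attrs):
        # A new element starts: discard any text collected so far.
        text.clear()

    def char_data(data):
        # expat may split one element's text over several calls.
        text.append(data)

    def end_element(name):
        nonlocal heading
        value = "".join(text).strip()
        text.clear()
        if name == "heading":
            heading = value
        elif name == "body":
            notes[heading] = value

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(xml_data, True)  # True: this is the final chunk of input
    return notes


if __name__ == "__main__":
    print(collect_notes(SAMPLE))
```

The handlers have to track the current text and heading by hand; that bookkeeping is what a tree-building parser such as ElementTree does for you.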
> Use ElementTree's iterparse:
>
> from xml.etree.cElementTree import iterparse
>
> for _, element in iterparse("the_file.xml"):
>     if element.tag == 'note':
>         # find text in interesting child elements
>         print element.findtext('heading'), element.findtext('body')
>
>         # save some memory by removing the handled content
>         element.clear()
>
> iterparse() iterates over parser events, but it builds an in-memory
> XML tree while doing so. That makes it trivial to find things in the
> stream. The above code receives an event whenever a tag closes
> (iterparse() reports only 'end' events by default) and acts only when
> the closing tag belongs to a 'note' element, i.e. when the complete
> subtree of that note element has been parsed into memory.
>
> Stefan
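A Python 3 adaptation of the iterparse() loop above, offered as a minimal sketch rather than a definitive recipe. `xml.etree.cElementTree` was removed in Python 3.9; in Python 3 the plain `xml.etree.ElementTree` module uses its C accelerator automatically. The `<notes>` root element, `SAMPLE`, and `collect_notes` are hypothetical, added so the two quoted notes form one well-formed document and the headings and bodies end up in a dictionary, as the question asked.

```python
import io
from xml.etree.ElementTree import iterparse

# The two notes from the question, wrapped in a hypothetical <notes> root
# element so that they form a single well-formed document.
SAMPLE = b"""<notes>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note>
<to>Jani</to>
<from>Tovi</from>
<heading>Reminder 2</heading>
<body>Don't forget to bring snacks!</body>
</note>
</notes>"""


def collect_notes(source):
    """Map each note's heading to its body, reading 'end' events only."""
    notes = {}
    # With no events argument, iterparse() yields ("end", element) pairs,
    # each one after the element's closing tag and whole subtree are parsed.
    for _, element in iterparse(source):
        if element.tag == "note":
            notes[element.findtext("heading")] = element.findtext("body")
            # Free the handled note's children and text to save memory.
            element.clear()
    return notes


if __name__ == "__main__":
    # iterparse() accepts a file name or a binary file object.
    print(collect_notes(io.BytesIO(SAMPLE)))
```

Note that element.clear() empties each handled note but leaves the empty element attached to the root, so memory use still grows slightly with the number of notes.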
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor