[Tutor] Python XML for newbie
Stefan Behnel
stefan_ml at behnel.de
Mon Jul 2 18:49:59 CEST 2012
Peter Otten, 02.07.2012 09:57:
> Sean Carolan wrote:
>>> Thank you, this is helpful. Minidom is confusing, even the
>>> documentation confirms this:
>>> "The name of the functions are perhaps misleading...."
Yes, I personally think that (Mini)DOM should be locked away from beginners
as far as possible.
>> Ok, so I read through these tutorials and am at least able to print
>> the XML output now. I did this:
>>
>> doc = etree.parse('computer_books.xml')
>>
>> and then this:
>>
>> for elem in doc.iter():
>>     print elem.tag, elem.text
>>
>> Here's the data I'm interested in:
>>
>> index 1
>> field 11
>> value 9780596526740
>> datum
>>
>> How do you say, "If the field is 11, then print the next value"? The
>> raw XML looks like this:
>>
>> <datum>
>>   <index>1</index>
>>   <field>11</field>
>>   <value>9780470286975</value>
>> </datum>
>>
>> Basically I just want to pull all these ISBN numbers from the file.
>
> With http://lxml.de/ you can use xpath:
>
> $ cat computer_books.xml
> <foo>
>   <bar>
>     <datum>
>       <index>1</index>
>       <field>11</field>
>       <value>9780470286975</value>
>     </datum>
>   </bar>
> </foo>
> $ cat read_isbn.py
> from lxml import etree
>
> root = etree.parse("computer_books.xml")
> print root.xpath("//datum[field=11]/value/text()")
> $ python read_isbn.py
> ['9780470286975']
> $
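Where lxml is not installed, the standard library's xml.etree.ElementTree understands a small XPath subset that should be enough for this case. A minimal sketch, assuming the same document layout as above: the XML is inlined as a string so the example is self-contained, the second <datum> and the name isbns are made up for illustration, and the predicate compares text, so the field number has to be quoted.

```python
import xml.etree.ElementTree as ET

# Same layout as computer_books.xml above, inlined for a self-contained sketch.
# The second <datum> is a hypothetical non-ISBN entry added for illustration.
XML = """<foo>
  <bar>
    <datum>
      <index>1</index>
      <field>11</field>
      <value>9780470286975</value>
    </datum>
    <datum>
      <index>2</index>
      <field>2</field>
      <value>Essential System Administration</value>
    </datum>
  </bar>
</foo>"""

root = ET.fromstring(XML)

# [field='11'] keeps only <datum> elements that have a <field> child
# whose text is exactly "11"; /value then selects their <value> child.
isbns = [value.text for value in root.findall(".//datum[field='11']/value")]
print(isbns)
```

For a real file, ET.parse('computer_books.xml').getroot() would take the place of ET.fromstring(XML). Full XPath expressions such as text() still need lxml.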
And lxml.objectify is also a nice tool for this:
$ cat example.xml
<items>
  <item>
    <id>108</id>
    <data>
      <datum>
        <index>1</index>
        <field>2</field>
        <value>Essential System Administration</value>
      </datum>
    </data>
  </item>
</items>
$ python
Python 2.7.3
>>> from lxml import objectify
>>> t = objectify.parse('example.xml')
>>> for datum in t.iter('datum'):
...     if datum.field == 2:
...         print(datum.value)
...
Essential System Administration
>>>
This might even be faster than the XPath version, but that depends a lot
on the data.
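The same "if the field is N, take the value" loop can also be sketched with plain xml.etree.ElementTree from the standard library, in case lxml is not available. This is only an illustration under stated assumptions: values_for_field is a hypothetical helper name, the XML is inlined rather than read from example.xml, and the second <datum> (field 11) is added to show the ISBN case from the original question. Unlike objectify, ElementTree hands back text, so the field number is compared as a string.

```python
import xml.etree.ElementTree as ET

# Layout of example.xml from above, plus one extra <datum> (field 11)
# that is assumed here purely for illustration.
XML = """<items>
  <item>
    <id>108</id>
    <data>
      <datum>
        <index>1</index>
        <field>2</field>
        <value>Essential System Administration</value>
      </datum>
      <datum>
        <index>2</index>
        <field>11</field>
        <value>9780470286975</value>
      </datum>
    </data>
  </item>
</items>"""


def values_for_field(root, field_id):
    """Return the <value> text of every <datum> whose <field> equals field_id."""
    wanted = str(field_id)  # element text is always a string in ElementTree
    return [
        datum.findtext("value")
        for datum in root.iter("datum")
        if datum.findtext("field") == wanted
    ]


root = ET.fromstring(XML)
print(values_for_field(root, 2))
print(values_for_field(root, 11))
```

A <datum> without a <field> child is simply skipped, because findtext() returns None for it.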
Stefan