[Tutor] ElementTree: finding a tag with specific attribute

Kent Johnson kent37 at tds.net
Sat Sep 17 00:48:57 CEST 2005


Kent Johnson wrote:
> Bernard Lebel wrote:
> 
>>Hello,
>>
>>With ElementTree, can you search a tag under an Element by not only
>>specifying the tag label, but also some tag attribute values? That
>>would be in the case where I have several tags with the same label but
>>with various attribute values.
> 
> FLASH: I just found PDIS XPath which evaluates XPath expressions against ElementTree trees!
> http://pdis.hiit.fi/pdis/download/

Here is a complete program that reads your file and uses pdis.xpath to dig out the value of a single parameter:

from elementtree import ElementTree
from pdis.xpath import compile

doc = ElementTree.parse('Camera_Root_bernard.xml')

path = compile('/root/sceneobject[@type="CameraRoot"]/localproperties/property[@name="Visibility"]/parameters/parameter[@scriptname="shdw"]')

node = path.evaluate(doc.getroot())[0]
print node
print node.text
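
For readers on a newer Python: ElementTree 1.3 and later (bundled as xml.etree.ElementTree since Python 2.7) understands attribute predicates such as [@name='value'] directly in find() and findall(), so a similar lookup may be possible without an add-on. Here is a minimal sketch. The inline XML is a made-up stand-in whose structure is inferred only from the path above, and the parameter values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real Camera_Root_bernard.xml is not shown in this thread.
SAMPLE = """
<root>
  <sceneobject type="CameraRoot">
    <localproperties>
      <property name="Visibility">
        <parameters>
          <parameter scriptname="viewvis">True</parameter>
          <parameter scriptname="shdw">False</parameter>
        </parameters>
      </property>
    </localproperties>
  </sceneobject>
</root>
"""

root = ET.fromstring(SAMPLE)

# The path is relative to the <root> element, so it starts at <sceneobject>.
path = ("sceneobject[@type='CameraRoot']/localproperties/"
        "property[@name='Visibility']/parameters/"
        "parameter[@scriptname='shdw']")

node = root.find(path)  # first matching element, or None if nothing matches
print(node.tag, node.attrib, node.text)
```

find() returns None when nothing matches, so real code should check for that before touching node.text.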

>  
> 
>>I'm looking for something a bit like BeautifulSoup, like:
>>
>>oTag = oElement.find( 'taglabel', { 'value' : 'xx' } )
>>
>>
>>Btw in case you wonder, I don't use BeautifulSoup because somehow it
>>takes 20-30 seconds to parse a 2000-line xml file, and I don't know
>>why. ElementTree is performing very well.
> 
> 
> Would you send me privately a copy of your file and your code that reads it with BS? I'm curious why this takes so long.

I took a bit of a look at this using the Python profiler. If anyone is interested, here is the main program to generate the profile results:

import BeautifulSoup, profile

sFile = r'Camera_Root_bernard.xml'

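def reader():
    # Read the whole file and parse it with BeautifulStoneSoup (the XML parser class).
    oFile = open( sFile, 'r' )
    try:
        oSoup = BeautifulSoup.BeautifulStoneSoup( oFile.read() )
    finally:
        oFile.close()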

profile.run('reader()', 'profile.out')


This creates a file called profile.out that can be analyzed with pstats.Stats:
 >>> from pstats import Stats
 >>> s=Stats('profile.out')
 >>> s.sort_stats('cum')
<pstats.Stats instance at 0x009BF918>
 >>> s.print_stats()

An excerpt from the output is at the end of this message; unfortunately it doesn't format very well in email. The most notable thing is the staggering number of times some functions are called. The first column (ncalls) is the total number of calls to a function. The second column (tottime) is the total time spent in the function itself, not counting time spent in the functions it calls.
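
To make the ncalls column concrete, here is a small self-contained sketch using cProfile and pstats from the standard library of current Python versions. The functions helper() and work() are hypothetical stand-ins and have nothing to do with BeautifulSoup; the point is only to show where the call count appears in the report.

```python
import cProfile
import io
import pstats


def helper(x):
    # Trivial function; we only care how often it is called.
    return x * 2


def work():
    # Calls helper() exactly 1000 times.
    return sum(helper(i) for i in range(1000))


profiler = cProfile.Profile()
profiler.runcall(work)

# Send the report to a string buffer instead of stdout so it can be inspected.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
# Sort by cumulative time and restrict the listing to entries matching "helper".
stats.sort_stats("cumulative").print_stats("helper")

report = buf.getvalue()
print(report)
```

In the printed report, the row for helper should show 1000 in the ncalls column, and its tottime covers only the time spent inside helper itself.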

Going down the list, the first several functions are each called 777 times, which is probably the number of start tags in the document. But recursiveChildGenerator() is suddenly called 898655 times, over 1000 times for each call to _fetch()! That works out to 8 calls for every character in the file.
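
A quick arithmetic check on the counts in the profile listing may help here. It is only an observation about the numbers, not an explanation taken from the BeautifulSoup source. The number of _matches() calls, 301476, is exactly 777 * 776 / 2, the sum 0 + 1 + ... + 776. That would be consistent with each of the 777 _fetch() calls testing every tag parsed so far, i.e. quadratic growth in the number of start tags, though the profile alone does not prove it.

```python
# Call counts copied from the profile listing in this message.
n_fetch = 777          # calls to _fetch (one per start tag, presumably)
n_generator = 898655   # calls to recursiveChildGenerator
n_matches = 301476     # calls to _matches

# Average number of recursiveChildGenerator calls per _fetch call.
per_fetch = n_generator / float(n_fetch)
print(round(per_fetch))

# Sum of 0 + 1 + ... + (n_fetch - 1): what you would get if the k-th
# _fetch call examined the k tags created before it.
pairs = n_fetch * (n_fetch - 1) // 2
print(pairs, pairs == n_matches)
```

If that reading is right, doubling the number of tags in the file would roughly quadruple the number of _matches() calls.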

I gave up trying to understand why this is happening; I would need to spend more time with the code...

Kent

Fri Sep 16 17:12:22 2005    profile.out

         9095825 function calls (9095048 primitive calls) in 80.402 CPU seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   80.402   80.402 profile:0(reader())
        1    0.000    0.000   80.398   80.398 <string>:1(?)
        1    0.002    0.002   80.397   80.397 F:\Tutor\bsTest\reader.py:5(reader)
        1    0.000    0.000   80.395   80.395 C:\Python24\lib\site-packages\BeautifulSoup.py:633(__init__)
        1    0.000    0.000   80.395   80.395 C:\Python24\lib\site-packages\BeautifulSoup.py:687(feed)
        1    0.000    0.000   80.381   80.381 C:\Python24\lib\sgmllib.py:86(feed)
        1    0.041    0.041   80.381   80.381 C:\Python24\lib\sgmllib.py:107(goahead)
      777    0.123    0.000   80.093    0.103 C:\Python24\lib\sgmllib.py:229(parse_starttag)
2331/1554    0.057    0.000   79.989    0.051 :0(getattr)
      777    0.013    0.000   79.844    0.103 C:\Python24\lib\sgmllib.py:304(finish_starttag)
      777    0.024    0.000   79.763    0.103 C:\Python24\lib\site-packages\BeautifulSoup.py:817(unknown_starttag)
      777    0.079    0.000   79.646    0.103 C:\Python24\lib\site-packages\BeautifulSoup.py:769(_smartPop)
     3108    0.051    0.000   79.575    0.026 C:\Python24\lib\site-packages\BeautifulSoup.py:676(__getattr__)
      777    0.019    0.000   79.496    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:348(__getattr__)
      777    0.010    0.000   79.467    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:467(first)
      777    0.014    0.000   79.456    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:477(fetch)
      777   10.923    0.014   79.443    0.102 C:\Python24\lib\site-packages\BeautifulSoup.py:168(_fetch)
   898655   21.556    0.000   38.801    0.000 C:\Python24\lib\site-packages\BeautifulSoup.py:525(recursiveChildGenerator)
   301476   10.316    0.000   23.791    0.000 C:\Python24\lib\site-packages\BeautifulSoup.py:233(_matches)
  2998523   14.816    0.000   14.816    0.000 :0(isinstance)
   602953    4.356    0.000    6.985    0.000 C:\Python24\lib\site-packages\BeautifulSoup.py:541(isList)
  1206683    5.852    0.000    5.852    0.000 :0(hasattr)
   905237    3.231    0.000    3.231    0.000 :0(len)
   601431    3.022    0.000    3.022    0.000 :0(range)
   599875    2.201    0.000    2.201    0.000 :0(pop)
   605655    2.153    0.000    2.153    0.000 :0(append)
   301476    1.080    0.000    1.080    0.000 :0(callable)


