[XML-SIG] PyXML XPath woes

Matt Patterson list-matt at reprocessed.org
Sat Feb 7 14:33:37 EST 2004


Hello,

I've got an XML file in which I want to locate all elements with the 
attribute boundary set to 'true'. I use the following XPath with 4DOM:

//*[@boundary='true']

like so:
from xml.xpath import Compile
from xml.xpath.Context import Context

boundaryFinder = Compile("//*[@boundary='true']")
context = Context(self.document)
# evaluate the expression and get a nodeList
boundaryNodes = boundaryFinder.evaluate(context)
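To tell whether the missing matches come from the XPath engine or from the parse itself, it can help to count the boundary="true" elements another way and compare. Here is a minimal sketch that assumes Python's standard xml.dom.minidom rather than 4DOM. The helper name boundary_nodes and the sample document are made up for illustration. The sample deliberately has no <?xml ...?> declaration, which XML 1.0 treats as optional.

```python
from xml.dom import minidom

def boundary_nodes(xml_text):
    """Return elements whose boundary attribute is exactly 'true'.

    getElementsByTagName("*") on the Document walks every element,
    including the document element, in document order, which covers
    the same nodes as //* in XPath.
    """
    doc = minidom.parseString(xml_text)
    return [el for el in doc.getElementsByTagName("*")
            if el.getAttribute("boundary") == "true"]

# No XML declaration on purpose: the parser accepts the document without one.
sample = """<chapter boundary="true">
  <H1 boundary="true">Intro</H1>
  <p>text</p>
  <section><H2 boundary="true">Part</H2><H3 boundary="false">x</H3></section>
</chapter>"""

matches = boundary_nodes(sample)
print(len(matches), [el.tagName for el in matches])  # 3 ['chapter', 'H1', 'H2']
```

If this count is larger than the number of nodes 4DOM returns for the same file, the discrepancy lies in the XPath evaluation rather than in the document.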

But evaluating the XPath does not return all the nodes that match!

The file I'm parsing is an external entity, which I load by parsing it
as stand-alone XML rather than by following an entity reference from
another XML document. It's well-formed, but lacks an
<?xml version="1.0"?> declaration (it's a FrameMaker-generated entity:
Frame outputs multi-document projects ('books') as a single XML file
with all the component documents referenced as entities).

The project I'm working on involves paginating large XML files in an 
arbitrary way using DOM Range. To figure out where the page boundaries 
lie I'm using XPath to locate the nodes which cause a new page to 
start. The boundary="true" attributes are added in a pre-processing
step.
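In outline, the pagination step walks the document in order and opens a new page at each boundary node. Below is a simplified sketch using Python's xml.etree.ElementTree. split_into_pages is a hypothetical name, and the sketch records only element tags. It does not attempt the partial-containment handling that DOM Range provides.

```python
import xml.etree.ElementTree as ET

def split_into_pages(xml_text):
    """Group elements, in document order, into pages.

    A new page starts at every element marked boundary="true"; anything
    before the first boundary node lands on an opening page of its own.
    """
    root = ET.fromstring(xml_text)
    pages = []
    for elem in root.iter():  # root first, then descendants in document order
        if not pages or elem.get("boundary") == "true":
            pages.append([])  # a boundary node opens a fresh page
        pages[-1].append(elem.tag)
    return pages

print(split_into_pages(
    "<book><H1 boundary='true'/><p/><p/><H1 boundary='true'/><p/></book>"))
# [['book'], ['H1', 'p', 'p'], ['H1', 'p']]
```

A real implementation would turn each pair of consecutive boundary nodes into a range over the DOM instead of a flat list of tags.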

Can anyone shed any light onto why only some of the boundary="true" 
nodes are being found?

The files in question are here:

http://www.emdash.co.uk/opf/12_%20Chldrns_edction.e12
http://www.emdash.co.uk/opf/tmp603_12_%20Chldrns_edction.e12

Thanks,

Matt

As an aside, I originally did the whole thing in PyXML, but the XPaths
were too complex for it (example below) and it returned no results! I
now pre-process the file with libxslt, using the complex XPath to add
the boundary attributes, and then search for those attributes with
PyXML. This is the XPath that finds the boundary nodes without help:

//H1[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
| //H2[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [preceding-sibling::*[1][name() != 'H1']]
| //H2[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [count(preceding-sibling::*) = 0]
| //H3[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [preceding-sibling::*[1][name() != 'H2'][name() != 'H1']]
| //H3[ancestor::boxtexttable = false()][ancestor::casestudy = false()]
    [ancestor::casetexttable = false()][ancestor::checklist = false()]
    [count(preceding-sibling::*) = 0]
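For comparison, the same boundary rule can be written as ordinary code instead of one large XPath, which avoids depending on any XPath engine. This is a sketch in plain Python with xml.etree.ElementTree, not what PyXML or libxslt do internally. mark_boundaries is a hypothetical helper, and it assumes the element names carry no namespace (so ElementTree's tag equals XPath's name()).

```python
import xml.etree.ElementTree as ET

# Ancestors that disqualify a heading; "ancestor::x = false()" in the
# XPath is true exactly when the element has no x ancestor.
EXCLUDED_ANCESTORS = {"boxtexttable", "casestudy", "casetexttable", "checklist"}

def mark_boundaries(root):
    """Set boundary="true" on H1/H2/H3 elements matching the XPath rule."""
    # ElementTree has no parent axis, so record each element's parent first.
    parent = {child: elem for elem in root.iter() for child in elem}

    def has_excluded_ancestor(elem):
        node = parent.get(elem)
        while node is not None:
            if node.tag in EXCLUDED_ANCESTORS:
                return True
            node = parent.get(node)
        return False

    def previous_sibling_tag(elem):
        # Equivalent of name(preceding-sibling::*[1]); None when there is
        # no preceding sibling element (count(preceding-sibling::*) = 0).
        p = parent.get(elem)
        if p is None:
            return None
        siblings = list(p)
        i = siblings.index(elem)
        return siblings[i - 1].tag if i > 0 else None

    marked = []
    for elem in root.iter():
        if elem.tag not in ("H1", "H2", "H3") or has_excluded_ancestor(elem):
            continue
        prev = previous_sibling_tag(elem)
        if elem.tag == "H1":
            hit = True
        elif elem.tag == "H2":
            hit = prev is None or prev != "H1"
        else:  # H3
            hit = prev is None or prev not in ("H1", "H2")
        if hit:
            elem.set("boundary", "true")
            marked.append(elem)
    return marked

doc = ET.fromstring("<book><H1>A</H1><H2>B</H2><p/><H2>C</H2></book>")
for elem in mark_boundaries(doc):
    print(elem.tag, elem.text)  # prints "H1 A", then "H2 C"
```

The H2 directly after the H1 is skipped, matching the [name() != 'H1'] test on the preceding sibling.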

-- 
    Matt Patterson | Typographer
    <matt at emdash.co.uk> | http://www.emdash.co.uk/
    <matt at reprocessed.org> | http://reprocessed.org/