[XML-SIG] PyXML XPath woes
Matt Patterson
list-matt at reprocessed.org
Sat Feb 7 14:33:37 EST 2004
Hello,
I've got an XML file in which I want to locate all elements with the
attribute boundary set to 'true'. I use the following XPath with 4DOM:
//*[@boundary='true']
like so:
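# imports assumed: PyXML's xml.xpath package
from xml.xpath import Compile
from xml.xpath.Context import Context

boundaryFinder = Compile("//*[@boundary='true']")
context = Context(self.document)
# evaluate the expression and get a nodeList
boundaryNodes = boundaryFinder.evaluate(context)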
But the XPath does not return all of the nodes that match!
The file I'm parsing is an external entity which I've reached by
parsing it as if it were stand-alone XML and not by following an entity
reference from a different XML doc. It's well-formed, but missing its
<?xml version="1.0"?> declaration (it's a FrameMaker generated entity:
Frame outputs multi-document projects ('books') as a single XML file
with all the component documents referenced as entities).
The project I'm working on involves paginating large XML files in an
arbitrary way using DOM Range. To figure out where the page boundaries
lie I'm using XPath to locate the nodes which cause a new page to
start. The boundary="true" attributes are added in a pre-processing
step.
Can anyone shed any light onto why only some of the boundary="true"
nodes are being found?
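One way to narrow this down is to count the matching elements without XPath at all and compare that count with the length of the XPath result. The following is only a sketch: it uses the standard library's minidom rather than 4DOM, the helper name find_boundary_elements is made up for illustration, and the sample document is a stand-in for the real entity file.

```python
from xml.dom import minidom


def find_boundary_elements(node):
    """Return every element below node, in document order, with boundary="true"."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            if child.getAttribute("boundary") == "true":
                found.append(child)
            # Recurse so that nested elements are checked as well.
            found.extend(find_boundary_elements(child))
    return found


# Hypothetical sample input; the real input would be the FrameMaker entity file.
SAMPLE = """<chapter>
  <H1 boundary="true">One</H1>
  <section><H2 boundary="true">Two</H2><H3>Three</H3></section>
  <H2 boundary="false">Four</H2>
</chapter>"""

document = minidom.parseString(SAMPLE)
boundary_nodes = find_boundary_elements(document)
print([n.tagName for n in boundary_nodes])
```

If the walk finds more elements than the XPath does, the difference points at the XPath evaluation rather than at the pre-processing step.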
The files in question are here:
http://www.emdash.co.uk/opf/12_%20Chldrns_edction.e12
http://www.emdash.co.uk/opf/tmp603_12_%20Chldrns_edction.e12
Thanks,
Matt
As an aside, I originally did the whole thing using PyXML, but the XPaths
were too complex for it (example below) and it would return no results!
I now pre-process the file by running it through libxslt, which uses the
complex XPath to add the boundary attributes, and then search for the
attributes with PyXML. This is the XPath to find the boundary nodes
without help:
//H1[ancestor::boxtexttable = false()][ancestor::casestudy =
false()][ancestor::casetexttable = false()][ancestor::checklist =
false()]|//H2[ancestor::boxtexttable = false()][ancestor::casestudy =
false()][ancestor::casetexttable = false()][ancestor::checklist =
false()][preceding-sibling::*[1][name() !=
'H1']]|//H2[ancestor::boxtexttable = false()][ancestor::casestudy =
false()][ancestor::casetexttable = false()][ancestor::checklist =
false()][count(preceding-sibling::*) = 0]|//H3[ancestor::boxtexttable =
false()][ancestor::casestudy = false()][ancestor::casetexttable =
false()][ancestor::checklist = false()][preceding-sibling::*[1][name()
!= 'H2'][name() != 'H1']]|//H3[ancestor::boxtexttable =
false()][ancestor::casestudy = false()][ancestor::casetexttable =
false()][ancestor::checklist = false()][count(preceding-sibling::*) =
0]
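The expression above repeats the same four ancestor tests in every branch, so it may be easier to read when assembled from its parts. The sketch below is purely illustrative: the names EXCLUDED_ANCESTORS, ancestor_filter and heading_paths are invented here, and the code only builds the string; it does not evaluate it against any XPath engine.

```python
# Element names taken from the expression above.
EXCLUDED_ANCESTORS = ["boxtexttable", "casestudy", "casetexttable", "checklist"]


def ancestor_filter():
    """Predicates that reject headings nested inside any excluded element."""
    return "".join("[ancestor::%s = false()]" % name for name in EXCLUDED_ANCESTORS)


def heading_paths(heading, higher_headings):
    """Build the union branches for one heading level.

    A heading with no higher levels (H1) always qualifies. Otherwise it
    qualifies when its immediately preceding sibling is not a higher-level
    heading, or when it has no preceding sibling at all.
    """
    base = "//%s%s" % (heading, ancestor_filter())
    if not higher_headings:
        return [base]
    not_after = "".join("[name() != '%s']" % h for h in higher_headings)
    return [
        base + "[preceding-sibling::*[1]%s]" % not_after,
        base + "[count(preceding-sibling::*) = 0]",
    ]


paths = (
    heading_paths("H1", [])
    + heading_paths("H2", ["H1"])
    + heading_paths("H3", ["H2", "H1"])
)
boundary_xpath = "|".join(paths)
print(boundary_xpath)
```

Written this way, each of the five branches can also be compiled and evaluated separately, which may help to show which part of the expression returns nothing.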
--
Matt Patterson | Typographer
<matt at emdash.co.uk> | http://www.emdash.co.uk/
<matt at reprocessed.org> | http://reprocessed.org/