[XML-SIG] C14N Performance
Joseph Reagle
reagle@w3.org
Tue, 28 May 2002 16:31:21 -0400
XPath selection followed by C14N performs *terribly* with PyXML. For a
100K XML file it's not even worth trying: I can walk away from the computer,
come back later, and it will still be working. The big problem is the
XPath evaluation. When no nodeset is given, the default is to "blow up"
the input document into a nodeset, akin to "pattern = '(//. | //@* |
//namespace::*)'" [1]. I know this expression is very slow to evaluate (this
is where I time out), so can anyone suggest an optimization or alternative I
can use in its place?
test_c14n.py
...
pattern = '(//. | //@* | //namespace::*)'
...
r = PYE()
dom = r.fromStream(IN)
context = Context(dom, processorNss=nsdict)
nodelist = xpath.Evaluate(query, context=context)
if exclusive:
    Canonicalize(dom, OUT, subset=nodelist, comments=comments,
                 unsuppressedPrefixes=pfxlist)
else:
    Canonicalize(dom, OUT, subset=nodelist, comments=comments)  # nsdict=nsdict
OUT.close()
[1] http://www.w3.org/TR/2001/REC-xml-c14n-20010315#DefaultExpression
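One possible alternative to evaluating that expression is to skip XPath and
gather the nodes with a single walk over the tree. The following is only a
sketch under assumptions: it uses the standard library's xml.dom.minidom
rather than PyXML, the helper name collect_all_nodes is made up for
illustration, and I have not checked whether the resulting list is usable as
the subset argument to Canonicalize. Note also that a DOM exposes namespace
declarations as xmlns attributes, not as the per-element namespace nodes of
the XPath data model.

```python
from xml.dom import minidom, Node


def collect_all_nodes(doc):
    """Return every node under `doc` in document order, attributes included.

    Hypothetical helper: a rough DOM counterpart of
    '(//. | //@* | //namespace::*)', built in one pass over the tree.
    """
    nodes = []
    stack = [doc]
    while stack:
        node = stack.pop()
        nodes.append(node)
        if node.nodeType == Node.ELEMENT_NODE:
            attrs = node.attributes
            # Attributes (including xmlns declarations) follow their element.
            nodes.extend(attrs.item(i) for i in range(attrs.length))
        # Push children in reverse so they are popped in document order.
        stack.extend(reversed(list(node.childNodes)))
    return nodes


if __name__ == "__main__":
    sample = minidom.parseString('<a x="1"><b/>t<c y="2"/></a>')
    for n in collect_all_nodes(sample):
        print(n.nodeType, n.nodeName)
```

Each node is visited once, so the cost grows linearly with the size of the
document.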
---------- Forwarded Message ----------
Subject: Re: Exclusive C14n in Apache
Date: Tue, 28 May 2002 16:24:03 -0400
From: Joseph Reagle <reagle@w3.org>
To: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>,
jboyer@PureEdge.com
Cc: w3c-ietf-xmldsig@w3.org
On Monday 27 May 2002 11:29 am, Christian Geuer-Pollmann wrote:
> Well, the main difference between inclusive and exclusive c14n is that in
> inclusive c14n, I can simply output the changes to the inscope namespace
> decls. In exclusive c14n, I also have to check whether a namespace is
> visibly utilized. That's the additional overhead. But I have to reduce
> this 1.3-1.6 to a lower level of 1.1-1.2.
Yes, so that should mean that for every element one tests whether
node.prefix or any of the attributes' prefixes match a namespace
prefix available on its namespace axis. I haven't dug deeply, but in the
small tests I can do this is a negligible difference in performance (and,
for other reasons, exc-c14n tends to be slightly faster as the size of the
document grows). However, I'll also say I'm not an expert at this...
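The per-element test described above can be sketched roughly as follows.
This is a minimal illustration on the standard library's xml.dom.minidom,
not the PyXML or Apache code; the function names and the in_scope mapping
(prefix to namespace URI) are assumptions made for the example, and the rule
about declarations already output by an ancestor is left out.

```python
from xml.dom import minidom


def visibly_utilized_prefixes(element):
    """Prefixes used by the element's own name or by its attribute names."""
    # An unprefixed element uses the default namespace, written here as "".
    used = {element.prefix or ""}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        # xmlns declarations are declarations, not uses of a prefix.
        if attr.name == "xmlns" or attr.name.startswith("xmlns:"):
            continue
        # Unprefixed attributes are in no namespace, so they add nothing.
        if attr.prefix:
            used.add(attr.prefix)
    return used


def declarations_to_emit(element, in_scope):
    """Keep only the in-scope declarations the element visibly utilizes."""
    used = visibly_utilized_prefixes(element)
    return {p: uri for p, uri in in_scope.items() if p in used}


if __name__ == "__main__":
    doc = minidom.parseString(
        '<a:r xmlns:a="urn:a" xmlns:b="urn:b" xmlns:c="urn:c" b:x="1" y="2"/>'
    )
    scope = {"a": "urn:a", "b": "urn:b", "c": "urn:c"}
    print(declarations_to_emit(doc.documentElement, scope))
```

The extra work per element is one pass over its attributes, which fits with
the difference being small.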
-------------------------------------------------------
--
Joseph Reagle Jr. http://www.w3.org/People/Reagle/
W3C Policy Analyst mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair http://www.w3.org/Signature/
W3C XML Encryption Chair http://www.w3.org/Encryption/2001/