[XML-SIG] C14N Performance

Joseph Reagle reagle@w3.org
Tue, 28 May 2002 16:31:21 -0400


Doing XPath selection and C14N has *terrible* performance with PyXML. For a 
100K XML file it's not even worth it: I'll walk away from the computer, 
come back later, and it'll still be working at it. The big problem is the 
XPath evaluation. By default, the node-set to be canonicalized is obtained 
by "blowing up" the input document/node-set with an expression akin to 
"pattern = '(//. | //@* | //namespace::*)'" [1]. I know this is a very bad 
(slow) evaluation (this is where I time out), so can anyone suggest an 
optimization/alternative I can use in its place?



test_c14n.py
    ...
        pattern = '(//. | //@* | //namespace::*)'
    ...
    r = PYE()
    dom = r.fromStream(IN)
    context = Context(dom, processorNss=nsdict)
    nodelist = xpath.Evaluate(query, context=context)
    if exclusive:
        Canonicalize(dom, OUT, subset=nodelist, comments=comments,
                     unsuppressedPrefixes=pfxlist)
    else:
        Canonicalize(dom, OUT, subset=nodelist, comments=comments)  # nsdict=nsdict
    OUT.close()

[1] http://www.w3.org/TR/2001/REC-xml-c14n-20010315#DefaultExpression
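
To make concrete what that default expression selects, here is a rough 
sketch that gathers the same kind of node list in a single tree walk rather 
than through XPath evaluation. It uses the standard library's 
xml.dom.minidom, not PyXML's reader, and collect_nodes is a made-up name. 
It is only an approximation: minidom has no namespace nodes, so the xmlns 
declarations show up as ordinary attributes instead. Whether PyXML's 
Canonicalize would accept such a list as its subset is untested.

```python
from xml.dom import minidom, Node

def collect_nodes(node, out=None):
    """Return node, its attributes and all descendants in document order."""
    if out is None:
        out = []
    out.append(node)
    if node.nodeType == Node.ELEMENT_NODE:
        attrs = node.attributes
        for i in range(attrs.length):
            # minidom exposes xmlns declarations as ordinary attributes
            out.append(attrs.item(i))
    for child in node.childNodes:
        collect_nodes(child, out)
    return out

doc = minidom.parseString(
    '<a xmlns:p="urn:p" x="1"><p:b>text</p:b><!-- c --></a>')
nodes = collect_nodes(doc)
```

Each node is visited once, so the cost of the walk grows linearly with the 
size of the document.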

----------  Forwarded Message  ----------

Subject: Re: Exclusive C14n in Apache
Date: Tue, 28 May 2002 16:24:03 -0400
From: Joseph Reagle <reagle@w3.org>
To: Christian Geuer-Pollmann <geuer-pollmann@nue.et-inf.uni-siegen.de>, 
jboyer@PureEdge.com
Cc: w3c-ietf-xmldsig@w3.org

On Monday 27 May 2002 11:29 am, Christian Geuer-Pollmann wrote:
> Well, the main difference between inclusive and exclusive c14n is that in
> inclusive c14n, I can simply output the changes to the inscope namespace
> decls. In exclusive c14n, I also have to check whether a namespace is
> visibly utilized. That's the additional overhead. But I have to reduce
> this 1.3-1.6 to a lower level of 1.1-1.2.

Yes, so that should mean that for every element one tests whether the
node.prefix or any of its attributes' prefixes are the same as a namespace
prefix available on its namespace axis. I haven't dug deeply, but in the
small tests I can do, this is a negligible difference in performance (and
for other reasons, exc-c14n tends to be slightly faster as the size of the
document grows). However, I'll also say I'm no expert at this...
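
The per-element test described above might look roughly like the following. 
This is a sketch only, written against the standard library's 
xml.dom.minidom rather than PyXML or the Apache code, and 
visibly_used_prefixes is a made-up name. It only finds the prefixes an 
element visibly utilizes; it does not handle an InclusiveNamespaces 
PrefixList or decide which declarations have already been output by an 
ancestor.

```python
from xml.dom import minidom

def visibly_used_prefixes(element):
    """Prefixes used by the element's own name or by its attribute names."""
    # '' stands for the default namespace, which only the element name
    # itself can use (unprefixed attributes are in no namespace).
    used = {element.prefix or ''}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name == 'xmlns' or attr.prefix == 'xmlns':
            continue  # a namespace declaration, not a use of a prefix
        if attr.prefix:
            used.add(attr.prefix)
    return used

doc = minidom.parseString(
    '<a xmlns="urn:d" xmlns:p="urn:p" xmlns:q="urn:q">'
    '<p:b q:x="1" y="2"/></a>')
root = doc.documentElement
child = root.firstChild
root_used = visibly_used_prefixes(root)
child_used = visibly_used_prefixes(child)
```

Here root_used contains only the default namespace, even though the root 
declares p and q, while child_used contains p and q. The work per element is 
one pass over its attributes, which fits with the overhead being small.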

-------------------------------------------------------

-- 

Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature/
W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/