[XML-SIG] serializing with xslt with SAX

Paul Tremblay phthenry at earthlink.net
Sun Feb 15 03:20:01 EST 2004


On Sat, Feb 14, 2004 at 07:07:43PM -0700, Mike Brown wrote:

> 
> In the Python world, SAX is not necessarily the most efficient. For example,
> 4Suite uses Expat to do parsing of serialized XML, and it builds Domlette
> documents from Expat's native callbacks (which are somewhat SAX-like, but
> different). It's more efficient to supply a Domlette to the processor than it
> is to supply an unparsed document or even Expat callbacks. The processor does
> support SAX, and Domlette (as Result Tree Fragment) output, though, so we
> could perhaps write a SAX-to-Expat layer for use in conjunction with the SAX
> XSLT output writer, or we could write an Expat XSLT output writer, but we're
> better off just using our Result Tree Fragment writer, which generates
> Domlette nodes that can be fed directly to the next transformation instance.

I had suspected that the advice from Java guru Michael Kay was biased
toward Java.
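
Python's standard library has no XSLT processor, so the sketch below is only an analogy, not 4Suite code. Two hypothetical functions (`add_attr`, `wrap`) stand in for stylesheets, and `xml.dom.minidom` documents stand in for Domlette nodes. It contrasts handing each stage the tree built by the previous one with serializing to a string and re-parsing between stages, which is the extra work a Result Tree Fragment / Domlette hand-off is meant to avoid.

```python
from xml.dom import minidom

# Hypothetical stand-ins for stylesheets: each takes a DOM Document
# and returns a DOM Document (possibly a new one).
def add_attr(doc):
    # Mutates the tree in place and hands the same document on.
    doc.documentElement.setAttribute("seen", "yes")
    return doc

def wrap(doc):
    # Builds a new document whose root element wraps a copy of the old root.
    out = minidom.getDOMImplementation().createDocument(None, "wrapper", None)
    out.documentElement.appendChild(out.importNode(doc.documentElement, True))
    return out

def chain_in_memory(xml_text, stages):
    """Parse once, then pass the tree straight from stage to stage."""
    doc = minidom.parseString(xml_text)
    for stage in stages:
        doc = stage(doc)
    return doc.documentElement.toxml()

def chain_via_strings(xml_text, stages):
    """Serialize after every stage and re-parse before the next one."""
    text = xml_text
    for stage in stages:
        doc = stage(minidom.parseString(text))
        text = doc.documentElement.toxml()
    return text

src = "<quote>to be or not to be</quote>"
print(chain_in_memory(src, [add_attr, wrap]))
# <wrapper><quote seen="yes">to be or not to be</quote></wrapper>
```

Both functions give the same output; the difference is that `chain_via_strings` pays for a full serialize/parse round trip at every step.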

> 
> We don't yet have a good chaining API or recipe for 4Suite in general, and in
> researching our capabilities in order to answer this question, Jeremy & I
> found some bugs that have since been fixed in CVS. The code sample below is an
> example that should work with a current CVS snapshot, and is pretty fast,
> although Jeremy points out that Processor re-use is not thoroughly tested and
> the overhead of creating a new Processor instance is minimal in comparison to
> going through all the things that happen when the Processor.reset() is called.


So if the overhead of creating a new Processor is minimal, can I use the
code below?


from Ft.Xml import InputSource
from Ft.Xml.Xslt.Processor import Processor

# first run
document = InputSource.DefaultFactory.fromUri(xmlfile)  
stylesheet = InputSource.DefaultFactory.fromUri(xsltfile)     
processor = Processor()
processor.appendStylesheet(stylesheet)
result = processor.run(document)

# second run. And so on.

document = InputSource.DefaultFactory.fromString(result)  
stylesheet = InputSource.DefaultFactory.fromUri(xsltfile)     
processor = Processor()
processor.appendStylesheet(stylesheet)
result2 = processor.run(document)

I'll have to download a CVS snapshot to test the code below. But I think
I need something more standard, since the scripts I'm working with will
be published. 

I'm coming to the realization that the way you call XSLT from a program
isn't really standardized. TrAX was supposed to provide a universal
interface, but as of now it only works with two processors: Saxon and
Xalan.

That means if you write an application that processes XML with XSLT
stylesheets, you will be using either Java or Perl/Python (etc.) with
C++ libraries.

By the way, do you know how to read from and write to a string using
libxslt? I couldn't find anything on the web about that.

Okay, I have a lot of questions on this example.

> from Ft.Xml import InputSource, Domlette
> from Ft.Xml.Xslt import Processor, RtfWriter

I actually don't know what Rtf is, though I keep hearing this term.
> 
> class Test:
>     # we're going to try to reuse the processor
>     p = Processor.Processor()
> 
>     def run(self, src_isrc, chain):
>         i = 0
>         w = None  # stays None when the chain has only one stylesheet
>         if not chain:
>             return ''
>         for (sty, uri) in chain:
>             sty_isrc = InputSource.DefaultFactory.fromString(sty, uri)
>             self.p.appendStylesheet(sty_isrc)
>             # not on last stylesheet in chain?
>             if i < len(chain) - 1:
>                 # use an RtfWriter
>                 w = RtfWriter.RtfWriter(None, 'urn:temp.xml')

You are setting up an RtfWriter. What is that?
And why the "urn:" prefix?

>                 # not on first stylesheet in chain?
>                 if i:
>                     # use last RtfWriter's buffer as source doc
>                     self.p.execute(result, src_isrc, writer=w)

But here you use p.execute.

>                 else:
>                     # use original source doc
>                     self.p.run(src_isrc, writer=w)


Okay, so the first time through you use p.run. Why is that?

>                 # save result to use as source doc next time
>                 result = w.getResult()

Save to a string

>             # last stylesheet in chain
>             else:
>                 if w:

Why wouldn't the Rtf writer be defined?

>                     result = self.p.execute(result, src_isrc)
>                 else:
>                     result = self.p.run(src_isrc)
>             self.p.reset()
>             i += 1
>         return result
> 
> 
> xml_isrc = InputSource.DefaultFactory.fromString(src_xml, 'urn:hamlet.xml')
> 
> # four 6-letter rotations + a 2-letter rotation and uppercasing
> # should result in a full rotation and uppercasing...
> # expected output is an uppercase version of the Hamlet quotation
> #
> chain = [(xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt2, 'urn:lc-rot2-uc.xsl'),
>         ]

Sorry to be dense here, but what does each tuple represent? Is the
first item a name or a path? Is the second item some kind of URI?
> 
> t = Test()
> print t.run(xml_isrc, chain)
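
The comment above the chain claims that four rot-6 passes plus one rot-2-and-uppercase pass add up to a full rotation plus uppercasing, since 4 × 6 + 2 = 26. The stylesheets themselves (`xslt1`, `xslt2`) aren't shown, so this plain-Python check assumes, going only by their URIs, that they rotate lowercase letters by 6 and by 2 and that the second one then uppercases. The function names and the sample quotation are illustrative, not from the original code.

```python
import string

LOWER = string.ascii_lowercase

def lc_rot(text, n):
    # Rotate lowercase ASCII letters by n places; leave everything else alone.
    shifted = LOWER[n:] + LOWER[:n]
    return text.translate(str.maketrans(LOWER, shifted))

def lc_rot2_uc(text):
    # Assumed behaviour of 'urn:lc-rot2-uc.xsl': rotate by 2, then uppercase.
    return lc_rot(text, 2).upper()

# Same (stage, name) shape as the chain above: four rot-6 stages, then rot-2 + uppercase.
chain = [(lambda t: lc_rot(t, 6), "lc-rot6")] * 4 + [(lc_rot2_uc, "lc-rot2-uc")]

quote = "To be, or not to be: that is the question"
text = quote
for stage, name in chain:
    text = stage(text)

print(text)  # TO BE, OR NOT TO BE: THAT IS THE QUESTION
```

Because the lowercase rotations total 26 places, they cancel out, and only the final uppercasing is visible in the result.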

Thanks for all your help.

Paul

-- 

************************
*Paul Tremblay         *
*phthenry at earthlink.net*
************************


