[XML-SIG] lxml iterparse and comments

Stefan Behnel stefan_ml at behnel.de
Tue Mar 25 22:04:02 CET 2008


Hi,

Stuart McGraw wrote:
>> Stuart McGraw wrote:
>> > I am probably missing something elementary (I am new
>> > to both xml and lxml), but I am having problems figuring
>> > out how to get comments when using lxml's iterparse().
>> > When I parse xml with parse() and iterate through the
>> > result, I get the comments.  But when I try to do the
>> > same thing (approximately I think) with iterparse,
>> > I don't see any comments.
>>
>> While the comments end up in the tree that iterparse generates, they
>> do not show up in the events. Now that you mention it, I
>> actually think that should change. There should be "comment"
>> and "pi" events that yield them if requested.
> 
> That would be ideal, from my perspective.  It also seems
> more consistent with the other interfaces (parse, parser target,
> etc.)

Implemented on the trunk, will be in lxml 2.1.
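
A minimal sketch of how the new events can be requested, written for
Python 3 rather than the Python 2 of the test code quoted below. The
sample document and the names xml_bytes and seen are made up for
illustration. The fallback import assumes that the standard library's
ElementTree (Python 3.8 or later) accepts the same "comment" and "pi"
event names, which lets the sketch run without lxml installed.

```python
from io import BytesIO

try:
    import lxml.etree as ET             # "comment"/"pi" events need lxml >= 2.1
except ImportError:
    import xml.etree.ElementTree as ET  # assumption: Python >= 3.8

# Illustrative document: comments inside the root element, no PIs.
xml_bytes = b"""<Test>
<!-- Listing -->
<entry>text 1</entry>
<!-- Deleted: A1500477 -->
<entry>text 2</entry>
</Test>"""

seen = []
# Comments and PIs are only reported when their events are requested.
for event, elem in ET.iterparse(BytesIO(xml_bytes),
                                events=("end", "comment", "pi")):
    if event == "end":
        seen.append((event, elem.tag))
    else:
        # For comments, the comment body is available as elem.text.
        seen.append((event, elem.text))

for item in seen:
    print(item)
```

If "comment" is left out of the events tuple, the loop only sees the
"end" events, which matches the behaviour described in the original
question.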


>> Have you tried the parser target interface?
> I am having trouble getting it to work.  Specifically, the test
> code below produces the output I expected when run with
> cElementTree, but with lxml, it is missing "end" callbacks,
> the second "start(entry) " callback, and the resolved entity
> text.  Am I doing something wrong?
> 
> Test code:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> #import xml.etree.cElementTree as ET
> import lxml.etree as ET
> from cStringIO import StringIO
> 
> # XML data...
> #=============================================
> xmltxt = \
> '''<?xml version="1.0" encoding="UTF-8"?>
> <!-- Rev 1.06
> -->
> <!DOCTYPE Test [
> <!ELEMENT Test (entry*)>
> <!ELEMENT entry (#PCDATA)>
>     <!-- Description of <entry> element.
>     -->
> <!ENTITY ex "an existential entity">
> ]>
> <!-- File created: 2008-02-27 -->
> <Test>
> <!--  Chronosynclastic Infindibulum Listing -->
> <entry>text 1 is &ex;</entry>
> <!-- Deleted:  A1500477 -->
> <entry>text 2</entry>
> </Test>'''
> #=============================================
> 
> print '\nTargetParser:\n-------------'
> 
> try:                   XMLParser = ET.XMLParser
> except AttributeError: XMLParser = ET.XMLTreeBuilder
> 
> class EchoTarget:
>    def comment(self, tag):
>        print "comment", tag
>    def start(self, tag, attrib):
>        print "start", tag, attrib
>    def end(self, tag):
>        print "end", tag
>    def data(self, data):
>        print "data", repr(data)
>    def close(self):
>        print "close"
>        return "closed!"
> 
> parser = XMLParser( target = EchoTarget())
> result = ET.parse( StringIO (xmltxt), parser)

I can reproduce that. It only seems to happen when the data contains an
entity reference, though. I'll look into it.
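
For anyone who wants to repeat the check on Python 3, here is a cut-down
sketch of the test above. It is not the original script: the class
RecordingTarget and the names xml_bytes, calls, starts, ends and text are
made up for illustration, the DTD comments are dropped, and the callbacks
are recorded in a list instead of printed. It makes no claim about what a
particular lxml release outputs; it only collects what the parser reports
so that the result can be compared between libraries.

```python
try:
    import lxml.etree as ET
except ImportError:
    import xml.etree.ElementTree as ET  # fallback so the sketch runs without lxml

# Shortened version of the test document, including the entity reference.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Test [
<!ELEMENT Test (entry*)>
<!ELEMENT entry (#PCDATA)>
<!ENTITY ex "an existential entity">
]>
<Test>
<entry>text 1 is &ex;</entry>
<entry>text 2</entry>
</Test>"""


class RecordingTarget:
    """Parser target that records every callback it receives."""

    def __init__(self):
        self.calls = []

    def start(self, tag, attrib):
        self.calls.append(("start", tag))

    def end(self, tag):
        self.calls.append(("end", tag))

    def data(self, data):
        self.calls.append(("data", data))

    def comment(self, text):
        self.calls.append(("comment", text))

    def close(self):
        # Whatever close() returns becomes the result of the parse call.
        return self.calls


parser = ET.XMLParser(target=RecordingTarget())
calls = ET.fromstring(xml_bytes, parser)

starts = [name for kind, name in calls if kind == "start"]
ends = [name for kind, name in calls if kind == "end"]
# Text may arrive in several chunks, so join it before inspecting it.
text = "".join(chunk for kind, chunk in calls if kind == "data")

print(starts)
print(ends)
print(repr(text))
```

A correct run reports a start and an end callback for Test and for both
entry elements, and the joined text contains the replacement text of &ex;.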

Stefan

