[XML-SIG] [ pyxml-Bugs-1165107 ] sgmlop drops trailing partial tokens

Thu Mar 17 11:25:12 CET 2005

Bugs item #1165107, was opened at 2005-03-17 11:25
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=1165107&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Magnus Lie Hetland (mlh)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmlop drops trailing partial tokens

Initial Comment:
Partial entities in the middle of the text are
(appropriately) reported as text by sgmlop. However, if
the partial entity is placed at the end of the text, it
isn't reported. This behavior would be understandable
when using the feed method alone, but it also occurs
with the parse method (which closes the parser after
the feed), and that is unfortunate. It means (as far as
I can see) that the tail of the input is simply
ignored. One especially bad example is if the input
contains -- or even begins with -- a stray '<'
character, without later containing a '>' character.
Then everything from that point on is ignored.

The following snippet demonstrates the problem:

from xml.parsers.sgmlop import SGMLParser, XMLParser, XMLUnicodeParser

class Handler:

    def handle_data(self, data):
        # Show each chunk of character data the parser reports.
        print 'Data:', repr(data)

for text in ['&lt', '&#123', '<foo bar < " ', '</foo',
             '&lt ', '&#123 ', 'frozz <foo bar < " ',
             'bar </foo']:
    for parser in [SGMLParser(), XMLParser(),
                   XMLUnicodeParser()]:
        parser.register(Handler())
        print '%s with %s:' % (repr(parser), repr(text))
        parser.parse(text)
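
To make the expected behavior concrete, here is a minimal sketch of a
parser that holds back a possibly incomplete tail during feed but
reports it as text on close. This is not sgmlop's code: the names
TailFlushingParser and Collector are hypothetical, the sketch does no
real markup parsing (everything is reported as data), and it simply
assumes that an unterminated '<' or '&' marks the start of a partial
token.

```python
class Collector(object):
    """Hypothetical handler that records every chunk of reported text."""

    def __init__(self):
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


class TailFlushingParser(object):
    """Toy model of feed/close buffering; NOT sgmlop's implementation."""

    def __init__(self, handler):
        self.handler = handler
        self.pending = ''  # tail that might still become a full token

    def _partial_start(self, data, opener, closer):
        # Index of the first opener with no closer anywhere after it,
        # or len(data) if there is no such opener.
        pos = data.find(opener, data.rfind(closer) + 1)
        if pos == -1:
            return len(data)
        return pos

    def feed(self, data):
        data = self.pending + data
        # Assumption: only an unterminated '<' or '&' can start a
        # partial token; everything before it is safe to report now.
        cut = min(self._partial_start(data, '<', '>'),
                  self._partial_start(data, '&', ';'))
        if cut:
            self.handler.handle_data(data[:cut])
        self.pending = data[cut:]

    def close(self):
        # No more input can arrive, so the held-back tail cannot be
        # completed: report it as plain text instead of dropping it.
        if self.pending:
            self.handler.handle_data(self.pending)
            self.pending = ''

    def parse(self, data):
        self.feed(data)
        self.close()


for text in ['&lt', 'bar </foo', 'frozz <foo bar < " ']:
    handler = Collector()
    TailFlushingParser(handler).parse(text)
    print('%r -> %r' % (text, handler.chunks))
```

With feed alone the tail stays buffered, which is the understandable
case; the point is that close (and therefore parse) flushes it, so the
concatenated chunks always equal the original input.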
