[XML-SIG] Processing instructions in sgmlop

"Walter Dörwald" walter@livinglogic.de
Mon, 28 Aug 2000 15:33:18 +0200


Hello all!

I think I have discovered another bug in sgmlop. If I understand the
XML standard (http://www.w3.org/TR/2000/WD-xml-2e-20000814#sec-pi)
correctly, a processing instruction terminates at the next
occurrence of '?>':

[16] PI       ::=    '<?' PITarget (S ( Char* - (Char* '?>' Char*)))? '?>' 
[17] PITarget ::=     Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')) 
[2]  Char     ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]  

This would mean that the PI data may contain literal '>' characters.
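
To make that reading of production [16] concrete, here is a minimal sketch of how a scanner could split a single processing instruction. It assumes Python 3; split_pi is a hypothetical helper, not part of sgmlop, and it does not check that the target is a valid Name or differs from "xml".

```python
# Whitespace characters allowed by the XML production S.
XML_WHITESPACE = " \t\r\n"

def split_pi(text, start=0):
    """Split the PI beginning at text[start] into (target, data, end).

    Hypothetical helper for illustration only. 'end' is the index just
    past the closing '?>'.
    """
    if not text.startswith("<?", start):
        raise ValueError("not a processing instruction")
    # Per production [16], only the first '?>' ends the PI;
    # a lone '>' is ordinary PI data.
    close = text.find("?>", start + 2)
    if close == -1:
        raise ValueError("unterminated processing instruction")
    body = text[start + 2:close]
    # The target runs up to the first whitespace character; the
    # separating whitespace itself is not part of the data.
    for i, ch in enumerate(body):
        if ch in XML_WHITESPACE:
            return body[:i], body[i:].lstrip(XML_WHITESPACE), close + 2
    return body, "", close + 2

print(split_pi("<?echo $foo->bar?>"))
```

Under this reading, the '>' in '$foo->bar' stays inside the data and the PI only ends at the final '?>'.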

But sgmlop seems to end the PI at the next occurrence of '>':

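#!/usr/bin/env python

from xml.parsers import sgmlop

class Handler:
   # called with the target and data of each processing instruction
   def handle_proc(self,target,data):
      print "pi", target, data
   # called with any character data
   def handle_data(self,data):
      print "data", data

parser = sgmlop.XMLParser()
parser.register(Handler())
# the PI data contains a literal '>' before the closing '?>'
parser.parse('<?echo $foo->bar?>')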

The output from this short test is:

> python test.py
pi echo $foo-
data bar?>
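
For comparison, the following sketch feeds the same processing instruction to expat. It assumes Python 3 and the standard library's xml.parsers.expat module, neither of which is part of the test above. Because expat insists on a well-formed document, an empty root element (the made-up name doc) is appended after the PI.

```python
from xml.parsers import expat

seen = []

def handle_pi(target, data):
    # expat passes the PI target and its data as two separate strings
    seen.append((target, data))

parser = expat.ParserCreate()
parser.ProcessingInstructionHandler = handle_pi
# The second argument marks this as the final chunk of input.
parser.Parse("<?echo $foo->bar?><doc/>", True)

print(seen)   # [('echo', '$foo->bar')]
```

Here the whole of '$foo->bar' arrives as PI data, which is the behaviour I would expect from sgmlop as well.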

Bye,
   Walter Dörwald

-- 
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de