[Python-bugs-list] [ python-Bugs-526390 ] Incomplete list of escape sequences

noreply@sourceforge.net
Mon, 17 Jun 2002 08:25:31 -0700


Bugs item #526390, was opened at 2002-03-06 08:55
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=526390&group_id=5470

Category: XML
>Group: Not a Bug
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Mark Carter (comcol)
Assigned to: Nobody/Anonymous (nobody)
Summary: Incomplete list of escape sequences

Initial Comment:
There exist some special character tags (i.e. 
beginning with &) that cause exceptions in minidom - 
and probably in other modules, too. Example below:

=== PYTHON CODE ===
import xml.dom.minidom 

def do(text):
	print "Processing: ", text
	dom = xml.dom.minidom.parseString(text)
	print "... ok"

do("<body> this is ok </body>") #ok
do("<body> &lt; &gt; &amp; &quot; </body>") #ok
do("<body> &pound;  </body>") # error


=== STDOUT ===
Processing:  <body> this is ok </body>
... ok
Processing:  <body> &lt; &gt; &amp; &quot; </body>
... ok
Processing:  <body> &pound;  </body>

=== STDERR ===
Traceback (most recent call last):
  File "err.py", line 10, in ?
    do("<body> &pound;  </body>") # exception
  File "err.py", line 5, in do
    dom = xml.dom.minidom.parseString(text)
  File "C:\PYTHON22\lib\xml\dom\minidom.py", line 965, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "C:\PYTHON22\lib\xml\dom\minidom.py", line 952, in _doparse
    toktype, rootNode = events.getEvent()
  File "C:\PYTHON22\lib\xml\dom\pulldom.py", line 255, in getEvent
    self.parser.feed(buf)
  File "C:\PYTHON22\lib\xml\sax\expatreader.py", line 111, in feed
    self._err_handler.fatalError(exc)
  File "C:\PYTHON22\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:7: undefined entity

=== COMMENTS ===
It is also my observation that translations of special character tags
(aka HTML escape sequences) are scattered "hither and thither" throughout
the modules in the XML subdirectory, and that it would be better if they
were all put in one place.

I might be persuaded to help with the maintenance work that this
would require!

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-06-17 11:25

Message:
Logged In: YES 
user_id=3066

"&pound;" is not defined in the XML spec., as Martin points out.

The comment about "special character tags" at the end of the
initial report is not clear; if you still feel that there is
a problem there, please open a new bug report and be
specific.  Examples would be helpful.
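
One possible workaround for input that uses HTML-style names such as
&pound; is to rewrite them as numeric character references before
parsing, since a character reference needs no declaration. The sketch
below is illustrative only: it assumes Python 3, where the HTML name
table is html.entities.name2codepoint (htmlentitydefs in the Python 2
series), and the helper name html_entities_to_charrefs is hypothetical,
not part of any xml module.

```python
import re
from html.entities import name2codepoint

# The five names XML itself predefines; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &pound; but not &#163; or &#xA3;.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_charrefs(text):
    """Rewrite HTML named entities as numeric character references.

    Hypothetical helper: names predefined by XML, and names missing
    from the HTML table, are returned unchanged.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(replace, text)
```

Passing "<body> &pound; </body>" through this helper yields
"<body> &#163; </body>", which an XML parser can read without any
entity declaration. A name the table does not know is left in place,
so the parser will still report it.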

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-06 16:58

Message:
Logged In: YES 
user_id=21627

I cannot understand the problem. The parser rightfully
complains about &pound; - this is not one of the predefined
entities of XML. Please refer to the XML spec; only amp, lt,
gt, apos, and quot are predefined in XML. Everything else
should and does produce an error unless it is declared.
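
The behaviour described here can be checked with a short script. This
is a minimal sketch, assuming Python 3, where minidom reports the
failure as xml.parsers.expat.ExpatError rather than the
SAXParseException seen in the Python 2.2 traceback; the helper names
body_text and parses are made up for illustration.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def body_text(xml_text):
    """Parse xml_text and return the text directly inside the root element."""
    doc = xml.dom.minidom.parseString(xml_text)
    root = doc.documentElement
    return "".join(
        node.data for node in root.childNodes
        if node.nodeType == node.TEXT_NODE
    )


def parses(xml_text):
    """Return True if minidom accepts xml_text, False on a parse error."""
    try:
        xml.dom.minidom.parseString(xml_text)
    except ExpatError:
        return False
    return True
```

With these helpers, parses("<body> &pound; </body>") returns False,
while a document using only the predefined names, or the numeric
reference &#163;, is accepted.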

----------------------------------------------------------------------
