[Python-bugs-list] [ python-Bugs-526390 ] Incomplete list of escape sequences
noreply@sourceforge.net
Mon, 17 Jun 2002 08:25:31 -0700
Bugs item #526390, was opened at 2002-03-06 08:55
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=526390&group_id=5470
Category: XML
>Group: Not a Bug
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Mark Carter (comcol)
Assigned to: Nobody/Anonymous (nobody)
Summary: Incomplete list of escape sequences
Initial Comment:
There exist some special character tags (i.e.
beginning with &) that cause exceptions in minidom -
and probably in other modules, too. Example below:
=== PYTHON CODE ===
import xml.dom.minidom

def do(text):
    print "Processing: ", text
    dom = xml.dom.minidom.parseString(text)
    print "... ok"

do("<body> this is ok </body>") #ok
do("<body> &lt; &gt; &amp; &quot; </body>") #ok
do("<body> &pound; </body>") # error
=== STDOUT ===
Processing: <body> this is ok </body>
... ok
Processing: <body> &lt; &gt; &amp; &quot; </body>
... ok
Processing: <body> &pound; </body>
=== STDERR ===
Traceback (most recent call last):
  File "err.py", line 10, in ?
    do("<body> &pound; </body>") # exception
  File "err.py", line 5, in do
    dom = xml.dom.minidom.parseString(text)
  File "C:\PYTHON22\lib\xml\dom\minidom.py", line 965, in parseString
    return _doparse(pulldom.parseString, args, kwargs)
  File "C:\PYTHON22\lib\xml\dom\minidom.py", line 952, in _doparse
    toktype, rootNode = events.getEvent()
  File "C:\PYTHON22\lib\xml\dom\pulldom.py", line 255, in getEvent
    self.parser.feed(buf)
  File "C:\PYTHON22\lib\xml\sax\expatreader.py", line 111, in feed
    self._err_handler.fatalError(exc)
  File "C:\PYTHON22\lib\xml\sax\handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:7: undefined entity
=== COMMENTS ===
It is also my observation that the translations for special
character tags (aka HTML escape sequences) are scattered "hither
and thither" throughout the modules in the XML subdirectory, and
that it would be better if they were all put in one place.
I might be persuaded to help with the maintenance work that this
would require!
----------------------------------------------------------------------
>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-06-17 11:25
Message:
Logged In: YES
user_id=3066
"&pound;" is not defined in the XML spec, as Martin points out.
The comment about "special character tags" at the end of the
initial report is not clear; if you still feel that there is
a problem there, please open a new bug report and be
specific. Examples would be helpful.
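
For readers who hit the same error with HTML-style input, one possible
workaround is sketched below for a current Python 3. It rewrites HTML named
entities as numeric character references before the text reaches the parser.
The helper name html_entities_to_char_refs and the regular expression are
illustrative assumptions, not part of the xml package; the lookup table
html.entities.name2codepoint is the standard library's.

```python
import re
import xml.dom.minidom
from html.entities import name2codepoint

# The five entity names that XML itself predefines; these are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Hypothetical pattern for a named entity reference such as &pound;
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_char_refs(text):
    """Replace HTML named entities with numeric character references.

    Names that XML predefines, or that the HTML table does not know,
    are returned unchanged.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(replace, text)


converted = html_entities_to_char_refs("<body> &pound; &amp; </body>")
dom = xml.dom.minidom.parseString(converted)

# Join all text children of <body> in case the text is split into nodes.
body_text = "".join(node.data for node in dom.documentElement.childNodes)
print(converted)
print(repr(body_text))
```

This is a plain text substitution, so it would also rewrite matches inside
CDATA sections or comments; it is only meant for simple input like the
example in the report.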
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-03-06 16:58
Message:
Logged In: YES
user_id=21627
I cannot understand the problem. The parser rightfully
complains about &pound; - this is not one of the predefined
entities of XML. Please refer to the XML spec; only amp, lt,
gt, apos, and quot are predefined in XML. Everything else
should and does produce an error.
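
The behaviour described above can be checked with a short script. This is
a sketch for a current Python 3, where minidom reports the problem as
xml.parsers.expat.ExpatError rather than the SAXParseException seen in the
Python 2.2 traceback; the helper name parses is an assumption made for
illustration.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parses(text):
    """Return True if minidom accepts text as well-formed XML."""
    try:
        xml.dom.minidom.parseString(text)
    except ExpatError:
        return False
    return True


# Entities predefined by XML need no declaration.
predefined_ok = parses("<body> &lt; &gt; &amp; &quot; &apos; </body>")

# &pound; comes from HTML, not XML, so an undeclared use is rejected.
pound_ok = parses("<body> &pound; </body>")

# A numeric character reference for the same character (U+00A3) is accepted.
numeric_ok = parses("<body> &#163; </body>")

print(predefined_ok, pound_ok, numeric_ok)
```

Declaring the entity in a DTD, or using the numeric reference as in the
last call, are the usual ways to get such a character into an XML document.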
----------------------------------------------------------------------