[Expat-bugs] [ expat-Bugs-1828723 ] parser cannot handle a comment containing a double dash

SourceForge.net noreply at sourceforge.net
Wed May 14 05:10:55 CEST 2008


Bugs item #1828723, was opened at 2007-11-08 21:40
Message generated for change (Comment added) made by fdrake
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1828723&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Ryan (rad_rb)
Assigned to: Nobody/Anonymous (nobody)
Summary: parser cannot handle a comment containing a double dash

Initial Comment:
When attempting to parse a comment containing a double dash, the parser raises an exception stating that the input is not well-formed:

import xml.parsers.expat
p = xml.parsers.expat.ParserCreate()
p.Parse('<!-- -- -->')

Traceback (most recent call last):
  File "test-expat.py", line 3, in ?
    p.Parse('<!-- -- -->')
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 7

I'm using version 1.1, which shipped with Python 2.4.

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2008-05-13 23:10

Message:
Logged In: YES 
user_id=3066
Originator: NO

Marking this closed since it has been rejected.

There is no way to encode "--" inside a comment such that "--" appears in
the comment content reported by Expat.  If you're relying on being able to
get "--" from a comment, there's something wrong with your data model
(unhappily; I realize dealing with legacy data can be a pain).  If
possible, consider using a processing instruction, which can contain just
about anything, though a pseudo-attribute syntax is often used by
convention.
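
A minimal sketch of the processing-instruction suggestion above, assuming
Python's xml.parsers.expat binding as in the original report. The target name
"legacy-note", the pseudo-attribute "value", and the helper collect_pis are
hypothetical, chosen only for illustration:

```python
import xml.parsers.expat


def collect_pis(document):
    """Parse `document` and return (target, data) for each processing instruction."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat calls this handler with the PI target and the rest of the PI as data.
    parser.ProcessingInstructionHandler = (
        lambda target, data: found.append((target, data))
    )
    parser.Parse(document, True)  # True marks this as the final chunk
    return found


# "legacy-note" is a hypothetical target; the data uses pseudo-attribute syntax.
pis = collect_pis('<root><?legacy-note value="a -- b"?></root>')
print(pis)
```

Unlike a comment, the processing instruction's data may contain "--"; the
only sequence it cannot contain is "?>", which ends the instruction.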

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2007-11-09 13:41

Message:
Logged In: NO 

Wow. OK, I guess Expat has the right behaviour.

I want to log a bug on the XML spec though. That's just so wrong... :P

Is there some alternate character sequence that would allow us to encode
the data we want, that Expat will return as "--"?

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2007-11-08 23:04

Message:
Logged In: YES 
user_id=290026
Originator: NO

A double dash inside a comment is not allowed by the XML specification, so
Expat is correct.
See http://www.w3.org/TR/2006/REC-xml-20060816/#sec-comments.
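
To see the rule in practice, here is a small sketch, again assuming Python's
xml.parsers.expat binding. The helper parse_comments is hypothetical; it
returns the comment text Expat reports, or None when the document is
rejected as not well-formed:

```python
import xml.parsers.expat


def parse_comments(document):
    """Return the list of comment strings in `document`, or None if Expat rejects it."""
    comments = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CommentHandler = comments.append  # called with the comment's content
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return None
    return comments


single_dash = parse_comments('<root><!-- a - b --></root>')
double_dash = parse_comments('<root><!-- a -- b --></root>')
print(single_dash)
print(double_dash)
```

A single dash inside the comment is accepted, while the second document is
rejected because "--" appears before the closing "-->".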

----------------------------------------------------------------------
