[Patches] [ python-Patches-1087808 ] sgmllib.SGMLParser does not
unescape attribute values; patch
SourceForge.net
noreply at sourceforge.net
Wed Dec 22 14:16:58 CET 2004
Patches item #1087808, was opened at 2004-12-19 06:42
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1087808&group_id=5470
Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 7
Submitted By: Titus Brown (titus)
Assigned to: Martin v. Löwis (loewis)
Summary: sgmllib.SGMLParser does not unescape attribute values; patch
Initial Comment:
For example, in a form parsed by htmllib.HTMLParser (based on
SGMLParser),
<option value="5&quot; big"> 5&quot; big
the attribute value will not be unescaped to 5" while the printed
option text will be.
Note that this behavior differs from that of
HTMLParser.HTMLParser, which does a fine job.
See attached test script/test input for an example.
See attached patch for a fix: essentially I copied the code directly
from HTMLParser.HTMLParser.
I don't think this patch should break anything; I can't imagine
people were relying on this behavior!
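
A minimal sketch of the behaviour being asked for, assuming Python 3,
where the HTMLParser module lives on as html.parser (sgmllib and htmllib
are not available there). The AttrCollector class and its attrs_seen
list are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every attribute the parser reports."""

    def __init__(self):
        super().__init__()
        self.attrs_seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; html.parser has already
        # unescaped entity references in the values at this point.
        self.attrs_seen.extend(attrs)


collector = AttrCollector()
collector.feed('<option value="5&quot; big"> 5&quot; big')
collector.close()
print(collector.attrs_seen)
```

The value reported to handle_starttag carries a literal double quote
rather than the entity reference, which is what the patch aims to get
from SGMLParser as well.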
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2004-12-22 14:16
Message:
Logged In: YES
user_id=21627
Not necessarily: self.entitydefs is already updated in
HTMLParser, so this should work for the base class.
One issue might be with encodings, e.g. if the document
encoding is not Latin-1. In this case, one might not want to
replace &auml; with its Latin-1 equivalent, so you might need
to provide a hook where a subclass can choose not to perform
entity expansion, or perform more of it (perhaps also with a
possibility to perform character reference expansion).
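
One possible shape for such a hook, as a rough sketch only. The class
names AttrUnescaper and MarkupOnlyUnescaper and the method
convert_attr_entityref are made up for illustration and are not part of
sgmllib; the entitydefs table here is a small stand-in.

```python
import re


class AttrUnescaper:
    """Hypothetical base class; it only mimics the idea, not the sgmllib API."""

    # Stand-in entity table (name -> replacement text).
    entitydefs = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "auml": "\xe4"}
    _entityref = re.compile(r"&([a-zA-Z][-.a-zA-Z0-9]*);?")

    def convert_attr_entityref(self, name):
        """Hook: return replacement text, or None to leave the reference alone."""
        return self.entitydefs.get(name)

    def unescape_attr(self, value):
        def replace(match):
            text = self.convert_attr_entityref(match.group(1))
            return match.group(0) if text is None else text

        return self._entityref.sub(replace, value)


class MarkupOnlyUnescaper(AttrUnescaper):
    """Subclass that expands only markup-significant entities,
    e.g. for a document whose encoding is not Latin-1."""

    def convert_attr_entityref(self, name):
        if name in ("lt", "gt", "amp", "quot"):
            return self.entitydefs[name]
        return None


print(AttrUnescaper().unescape_attr("M&auml;r &quot;x&quot;"))
print(MarkupOnlyUnescaper().unescape_attr("M&auml;r &quot;x&quot;"))
```

A subclass wanting character reference expansion could extend the same
hook with a second pattern; that part is left out here.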
----------------------------------------------------------------------
Comment By: Titus Brown (titus)
Date: 2004-12-22 09:32
Message:
Logged In: YES
user_id=23486
I'm happy to do so -- note that this will expand the patch to include
HTMLParser.py, as well.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2004-12-22 08:57
Message:
Logged In: YES
user_id=21627
Please use unified or context diffs when submitting patches.
The patch is incorrect: Instead of hard-coding the list of
known entities, sgmllib should use self.entitydefs to
determine the set of entity names that are supported. As a
result, the algorithm should also replace, say, &auml; if it
occurs in an HTML attribute.
Then the question is what should happen on unknown entity
references. One cannot really call unknown_entityref, since
implementations of that will expect that the entity
reference was in content, not in an attribute. So it would
probably be best to leave unknown entity references alone.
Notice that in SGML (and HTML) the semicolon after the
entity name is not mandatory, but can be omitted if the
entity name is not followed by a letter or digit. So you
probably should use the regular expression entityref to find
references.
Please also provide a documentation patch that explains
precisely how the attribute value is created from what is in
the input document (i.e. some entity references replaced, no
character references replaced, etc).
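
The points above could be sketched roughly as follows. This is an
illustration under assumptions, not the patch itself: ENTITYDEFS stands
in for self.entitydefs, ENTITYREF is a simplified pattern rather than
sgmllib's own entityref expression, and unescape_attr is a hypothetical
free function.

```python
import re

# Stand-in for a parser's entitydefs mapping (name -> replacement text).
ENTITYDEFS = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "auml": "\xe4"}

# An entity name with an optional trailing semicolon. Character
# references such as &#228; do not match, because '#' is not a letter.
ENTITYREF = re.compile(r"&([a-zA-Z][-.a-zA-Z0-9]*);?")


def unescape_attr(value, entitydefs=ENTITYDEFS):
    """Replace known entity references in value; leave everything else alone."""

    def replace(match):
        name = match.group(1)
        if name in entitydefs:
            return entitydefs[name]
        # Unknown entity reference: keep the original text unchanged.
        return match.group(0)

    # A single pass, so text produced by one replacement is never
    # scanned again ("&amp;quot;" becomes "&quot;", not a quote).
    return ENTITYREF.sub(replace, value)


print(unescape_attr("5&quot; big"))
print(unescape_attr("a &lt b"))
print(unescape_attr("&bogus; &#228;"))
```

Doing the replacement in one pass with a callback avoids the
double-unescaping that chained str.replace calls can produce.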
----------------------------------------------------------------------
Comment By: Titus Brown (titus)
Date: 2004-12-22 08:15
Message:
Logged In: YES
user_id=23486
oh, and here's the example.
----------------------------------------------------------------------
Comment By: Titus Brown (titus)
Date: 2004-12-22 08:13
Message:
Logged In: YES
user_id=23486
Whoops. Dangitall.
Also at
http://issola.caltech.edu/~t/transfer/sgmllib.diff
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2004-12-22 08:06
Message:
Logged In: YES
user_id=21627
There's no uploaded file! You have to check the
checkbox labeled "Check to Upload & Attach File"
when you upload a file. In addition, even if you
*did* check this checkbox, a bug in SourceForge
prevents attaching a file when *creating* an issue.
Please try again.
(This is a SourceForge annoyance that we can do
nothing about. :-( )
----------------------------------------------------------------------