[ python-Bugs-1504676 ] Make sgmllib char and entity references pluggable

SourceForge.net noreply at sourceforge.net
Wed Jun 14 15:02:26 CEST 2006


Bugs item #1504676, was opened at 2006-06-12 06:41
Message generated for change (Comment added) made by rubys
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1504676&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Sam Ruby (rubys)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Make sgmllib char and entity references pluggable

Initial Comment:
The changes being made to sgmllib in Python 2.5 may
break existing applications.  This patch makes it easy
for subclasses to revert to the old behavior. 
Additionally, it makes it easier to provide new
behaviors, like supporting unicode, hexadecimal
character references, and additional entities.
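
A minimal sketch of what "pluggable" could look like, assuming the
parser routes every reference through overridable methods. The class
names and the hook names convert_charref and convert_entityref are
illustrative assumptions here, not taken from the patch, and the code
does not import sgmllib:

```python
import re

class ReferenceExpander:
    """Hypothetical stand-in for a parser with overridable reference hooks."""

    entitydefs = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}
    _ref = re.compile(r"&(?:#([0-9A-Za-z]+)|([A-Za-z][A-Za-z0-9]*));")

    def convert_charref(self, name):
        # Default behavior: decimal references only.
        # Returning None means "leave the reference text unchanged".
        try:
            return chr(int(name))
        except (ValueError, OverflowError):
            return None

    def convert_entityref(self, name):
        # Look the name up in the (overridable) entity table.
        return self.entitydefs.get(name)

    def expand(self, text):
        def repl(match):
            if match.group(1) is not None:
                result = self.convert_charref(match.group(1))
            else:
                result = self.convert_entityref(match.group(2))
            return match.group(0) if result is None else result
        return self._ref.sub(repl, text)


class HexAwareExpander(ReferenceExpander):
    """Subclass adding hexadecimal references and one extra entity."""

    entitydefs = dict(ReferenceExpander.entitydefs, nbsp="\u00a0")

    def convert_charref(self, name):
        if name[0] in "xX":
            try:
                return chr(int(name[1:], 16))
            except (ValueError, OverflowError):
                return None
        return ReferenceExpander.convert_charref(self, name)
```

With this shape, a subclass that wants the old behavior back only has
to override the same two methods; nothing else in the parser changes.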

----------------------------------------------------------------------

>Comment By: Sam Ruby (rubys)
Date: 2006-06-14 09:02

Message:
Logged In: YES 
user_id=141556

Note that the pre-existing code transforms tag data twice.

Ideally, the handling of entities in attributes and data
would be unified.

----------------------------------------------------------------------

Comment By: Sam Ruby (rubys)
Date: 2006-06-14 08:59

Message:
Logged In: YES 
user_id=141556

Updated patch with test case.

Note that in the pre-existing code tag data values are
transformed twice -- this should be corrected and ideally
the code for handling references should be unified. 

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-06-14 01:14

Message:
Logged In: YES 
user_id=3066

This patch certainly makes the subclass interface nicer; I
like that.  It does break one case, though (foolishly not
covered by the existing tests, but evident from reading the
patch).  I've added the relevant test in this change:

http://mail.python.org/pipermail/python-checkins/2006-June/053975.html

The problem with the patch is that attribute values are
transformed twice (once for entity refs, once for character
refs), instead of just once, so entity ref expansions can
cause character refs to be located that aren't in the markup.
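
The failure mode can be shown with a small sketch.  This is not the
sgmllib code; the function names and the entity table below are
assumptions made for illustration.  The two-pass version expands
entity refs first and then scans the result for character refs; the
one-pass version matches both kinds in a single scan of the original
markup:

```python
import re

# Illustrative entity table, not sgmllib's.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

def expand_two_pass(value):
    # Pass 1: entity refs such as &amp;
    value = re.sub(r"&(\w+);",
                   lambda m: ENTITIES.get(m.group(1), m.group(0)),
                   value)
    # Pass 2: character refs such as &#65; -- this also sees text
    # that pass 1 just produced.
    value = re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), value)
    return value

_REF = re.compile(r"&(?:#(\d+)|(\w+));")

def expand_one_pass(value):
    # Each reference in the original markup is replaced exactly once;
    # replacement text is never rescanned.
    def repl(m):
        if m.group(1) is not None:
            return chr(int(m.group(1)))
        return ENTITIES.get(m.group(2), m.group(0))
    return _REF.sub(repl, value)

markup = "&amp;#65;"
two_pass_result = expand_two_pass(markup)   # "A"
one_pass_result = expand_one_pass(markup)   # "&#65;"
```

In the two-pass case, "&amp;" becomes "&", which joins the following
"#65;" to form a character ref that was never in the markup.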

I'm out of time tonight, but should be able to make this
patch work with the additional tests tomorrow night if rubys
doesn't beat me to it.

Documentation and tests for the subclass interface changes
are still needed as well.


----------------------------------------------------------------------
