[Patches] [ python-Patches-1486713 ] HTMLParser : A auto-tolerant parsing mode

SourceForge.net noreply at sourceforge.net
Tue May 23 17:11:31 CEST 2006


Patches item #1486713, was opened at 2006-05-11 19:19
Message generated for change (Comment added) made by kxroberto
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1486713&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
>Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: kxroberto (kxroberto)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser : A auto-tolerant parsing mode

Initial Comment:
Changes:

* Now allows missing spaces between attributes as its
often seen on the web like this :

<script type="text/javascript"language="JavaScript1.1">

That like broke the whole parsing before.


* A fully auto-tolerant mode (HTMLParser.tolerant=1)
was added. It should hopefully NEVER break HTML parsing
on the level of HTMLParser, but recover and continue
the parsing smartly. The mode was tested extensively
with complex pages. The tolerant mode is guaranted to
finish all HTML stuff only during HTMLParser.close() /
goahead(end=True)  - yet that was the same (stucking)
policy before.
Maybe steep: I have  switched ON the tolerant mode by
default, as this is, what in 99.9% of cases one wants
to have.
(I've maybe 20 applications for HTMLParser - None like
the unrecoverable breaks with Exceptions)
During tolerant mode the virtual .warning(message,i,k)
is called instead of error - by default this just
counts .warning_count up. This framework should even
enable to write po HTML checkers

* The patch was generated against py2.3 (still the
"good/base" Python for me) and also fixes a regexp-bug
(which already was fixed in py2.4.2). Yet the patch
works also against py2.4/2.5 - 2 locations where py24
trivially changed to %r/repr may grumble.


-robert


----------------------------------------------------------------------

>Comment By: kxroberto (kxroberto)
Date: 2006-05-23 17:11

Message:
Logged In: YES 
user_id=972995

Python 2.4 version of the patch added.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1486713&group_id=5470


More information about the Patches mailing list