[Patches] [Patch #103202] fix bug parsing nested tags with htmllib

noreply@sourceforge.net
Thu, 11 Jan 2001 13:44:16 -0800


Patch #103202 has been updated. 

Project: python
Category: library
Status: Rejected
Submitted by: fbremmer
Assigned to: gvanrossum
Summary: fix bug parsing nested tags with htmllib

Follow-Ups:

Date: 2001-Jan-11 13:44
By: gvanrossum

Comment:
Rejected.

The savedata mechanism is intended only for things like <title>; it
shouldn't be used for tags that may be nested.  Your example program is
invalid.  Your patch would cause all data to be saved all the time, even
when no tag is interested in saving data.
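
To make the failure mode concrete, here is a minimal standalone sketch of a
single-slot save mechanism of the kind described above. It is not the htmllib
source: the class SaveDataModel and the helper run_nested are hypothetical
names, and the behaviour is modelled on what this thread reports (None in
nofill mode, an exception from None.split() otherwise). It does not import
htmllib, so it runs on any current Python.

```python
class SaveDataModel:
    """Hypothetical stand-in for a save_bgn()/save_end() pair with one slot."""

    def __init__(self, nofill=False):
        self.savedata = None  # None means "no tag is saving data right now"
        self.nofill = nofill

    def save_bgn(self):
        # Start (or restart) saving; anything saved so far is discarded.
        self.savedata = ''

    def handle_data(self, data):
        # Text is collected only while a save is in progress.
        if self.savedata is not None:
            self.savedata += data

    def save_end(self):
        data = self.savedata
        self.savedata = None  # the single slot is cleared on every end tag
        if not self.nofill:
            # Collapse whitespace; fails if data is None.
            data = ' '.join(data.split())
        return data


def run_nested(nofill):
    """Replay the parser events for <tag>a<tag>b</tag>c</tag>."""
    p = SaveDataModel(nofill)
    p.save_bgn()
    p.handle_data('a')
    p.save_bgn()           # inner start tag wipes the outer tag's 'a'
    p.handle_data('b')
    inner = p.save_end()   # inner end tag clears the slot
    p.handle_data('c')     # no save in progress, so 'c' is dropped
    outer = p.save_end()   # outer end tag finds nothing left to return
    return inner, outer
```

In this model, run_nested(True) returns ('b', None), and run_nested(False)
raises AttributeError on the outer save_end(), which matches the two symptoms
reported for the example program.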

-------------------------------------------------------

Date: 2001-Jan-11 12:15
By: fbremmer

Comment:
<pre>
#! /usr/bin/env python
import htmllib, formatter
  
class MyParser(htmllib.HTMLParser):
    def __init__(self, formatterObject):
        htmllib.HTMLParser.__init__(self, formatterObject)
        self.text = ''
    def start_tag(self, attributes):
        self.save_bgn()
    def end_tag(self):
        self.text = self.save_end()

html = """<tag><tag></tag></tag>"""
parser=MyParser(formatter.NullFormatter())

parser.nofill = 1
parser.feed(html)
parser.close()
print parser.text  # prints None instead of an empty string

parser.nofill = 0
parser.feed(html)
parser.close()
print parser.text  # raises an exception: save_end() calls split() on None
</pre>
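
For comparison, one possible way to collect text for tags that nest is to keep
a stack of buffers rather than a single slot. This is only a sketch under
stated assumptions: it is neither the submitted patch nor part of htmllib, and
the class NestedSaveModel is a hypothetical name. Nothing is stored while the
stack is empty, so data is saved only when some tag has asked for it.

```python
class NestedSaveModel:
    """Hypothetical stack-based alternative to a single save slot."""

    def __init__(self):
        self.stack = []  # one buffer per open tag that asked to save

    def save_bgn(self):
        # Each start tag gets its own buffer.
        self.stack.append([])

    def handle_data(self, data):
        # Every open buffer receives the text, so an outer tag also sees
        # text that appears inside an inner tag. With an empty stack this
        # loop does nothing, so no data is saved.
        for buf in self.stack:
            buf.append(data)

    def save_end(self):
        # Return the text of the innermost open tag.
        # Raises IndexError if called without a matching save_bgn().
        return ''.join(self.stack.pop())


def run_nested():
    """Replay the parser events for <tag>a<tag>b</tag>c</tag>."""
    p = NestedSaveModel()
    p.save_bgn()
    p.handle_data('a')
    p.save_bgn()
    p.handle_data('b')
    inner = p.save_end()
    p.handle_data('c')
    outer = p.save_end()
    return inner, outer
```

Here run_nested() returns ('b', 'abc'). A parser subclass could keep such a
stack itself and leave the built-in save mechanism untouched.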
-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103202&group_id=5470