[New-bugs-announce] [issue8319] HTMLParser does not call handle_data when a tag contains no data.

Winfried Plappert report at bugs.python.org
Mon Apr 5 20:08:55 CEST 2010


New submission from Winfried Plappert <Winfried.Plappert at gmail.com>:

When parsing HTML that contains an empty element such as <td></td>, no call to handle_data is issued between handle_starttag and handle_endtag; the call is made afterwards instead. The problem is in HTMLParser.goahead, where the positions i and j are calculated. The code reads

    if i < j: self.handle_data(rawdata[i:j])

but it should be

    if i <= j: self.handle_data(rawdata[i:j])

If there is data between <td> and </td>, everything works fine.
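
A minimal sketch for observing the callback order described above. It is written against Python 3's html.parser module rather than the 2.6 Lib/HTMLParser.py the report refers to, so its behavior may not match the report exactly. EventRecorder and events_for are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser  # Python 3 module; the report concerns Lib/HTMLParser.py in 2.6


class EventRecorder(HTMLParser):
    """Hypothetical helper: records each callback in the order it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def events_for(markup):
    """Parse markup and return the recorded (kind, value) events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # Empty element: check whether a "data" event appears between start and end.
    print(events_for("<td></td>"))
    # Element with content: the data event sits between start and end.
    print(events_for("<td>x</td>"))
    # When i == j the slice is empty, so the proposed `i <= j` guard
    # would pass an empty string to handle_data.
    rawdata, i, j = "<td></td>", 4, 4
    print(repr(rawdata[i:j]))
```

Comparing the two printed event lists shows whether the parser in use reports data for an empty element at all. The last line shows what the proposed guard would hand to handle_data at that position.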

I just checked the 2.6 trunk: this occurs at line 142 of Lib/HTMLParser.py. That copy of HTMLParser.py is 13407 bytes and is dated 'Feb 26 19:25'.

----------
components: Library (Lib)
messages: 102392
nosy: wplappert
severity: normal
status: open
title: HTMLParser does not call handle_data when a tag contains no data.
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8319>
_______________________________________

