[New-bugs-announce] [issue23144] html.parser.HTMLParser: setting 'convert_charrefs = True' leads to dropped text
Ross
report at bugs.python.org
Thu Jan 1 19:47:08 CET 2015
New submission from Ross:
If convert_charrefs is set to True, the final data section is not returned by feed(). It is held until the next tag is encountered.
---
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self, convert_charrefs=True)
        self.fed = []

    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)

    def handle_data(self, data):
        print("Encountered some data :", data)

parser = MyHTMLParser()
parser.feed("foo <a>link</a> bar")
print("")
parser.feed("spam <a>link</a> eggs")
---
gives
Encountered some data : foo
Encountered a start tag: a
Encountered some data : link
Encountered an end tag : a
Encountered some data : barspam
Encountered a start tag: a
Encountered some data : link
Encountered an end tag : a
With 'convert_charrefs = False' it works as expected.
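
A possible workaround, sketched under the assumption that each call is meant to parse one complete document: call close() after feed(), which asks the parser to process any buffered data as if an end-of-file mark had been seen. The names CollectingParser and parse_document are hypothetical and used only for illustration; they are not part of html.parser.

```python
from html.parser import HTMLParser


class CollectingParser(HTMLParser):
    """Hypothetical helper: records parser events instead of printing them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_document(text):
    """Parse one complete document with a fresh parser and return its events."""
    parser = CollectingParser()
    parser.feed(text)
    # close() forces processing of any text still held in the buffer,
    # so trailing data after the last tag is not carried into a later feed().
    parser.close()
    return parser.events


events = parse_document("foo <a>link</a> bar")
text = "".join(value for kind, value in events if kind == "data")
print(text)
```

Using a fresh parser per document (or calling reset() between documents) also keeps text from one input from being concatenated with the next, as happens with "barspam" above. This does not help when the input really is one document fed incrementally; in that case the trailing text is only available once the next tag or close() arrives.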
----------
components: Library (Lib)
messages: 233291
nosy: xkjq
priority: normal
severity: normal
status: open
title: html.parser.HTMLParser: setting 'convert_charrefs = True' leads to dropped text
type: behavior
versions: Python 3.4
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23144>
_______________________________________