[New-bugs-announce] [issue30011] HTMLParser class is not thread safe
Alessandro Vesely
report at bugs.python.org
Fri Apr 7 04:26:37 EDT 2017
New submission from Alessandro Vesely:
SYMPTOM:
When used in a multithreaded program, instances of a class derived from HTMLParser may either convert an entity reference or leave it unconverted, apparently at random.
CAUSE:
The class has a class-level (static) attribute, entitydefs, which starts out as None and, on first use, is set to a dictionary of entity definitions. The dictionary is assigned to the attribute first and filled in afterwards, so initialization is not atomic. Instances running in concurrent threads see that entitydefs is no longer None, assume that initialization is complete, and catch a KeyError if the entity at hand has not been added yet. In that case, the entity is left alone as if it were invalid.
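A minimal sketch of the lazy, non-atomic initialization described above, assuming a simplified model of the parser. RacyParser, unescape_entity, the pause hook and the four-entry NAME2CODEPOINT table are hypothetical illustrations, not the real HTMLParser code. Because a real race is hard to reproduce on demand, the hook replays the unlucky interleaving deterministically: it runs a second lookup in the middle of the fill loop, where another thread could be scheduled.

```python
# Hypothetical model of the racy pattern; not the actual HTMLParser source.

# Tiny stand-in for htmlentitydefs.name2codepoint (assumed subset).
NAME2CODEPOINT = {"amp": 38, "gt": 62, "lt": 60, "quot": 34}


class RacyParser(object):
    entitydefs = None  # class-level cache shared by all instances and threads

    def unescape_entity(self, name, pause=None):
        if RacyParser.entitydefs is None:
            # The dict is published first and filled in afterwards, so other
            # threads can observe it while it is still incomplete.
            table = RacyParser.entitydefs = {"apos": "'"}
            for key in sorted(NAME2CODEPOINT):
                if pause is not None:
                    pause()  # stands in for a thread switch at this point
                table[key] = chr(NAME2CODEPOINT[key])
        try:
            return RacyParser.entitydefs[name]
        except KeyError:
            return "&" + name + ";"  # left alone, as if it were invalid


# Replay the bad interleaving: while the first call is still filling the
# table, a "second thread" looks up the quot entity.
seen = []


def second_thread():
    seen.append(RacyParser().unescape_entity("quot"))


first = RacyParser().unescape_entity("amp", pause=second_thread)
print(first)  # &
print(seen)   # ['&quot;', '&quot;', '&quot;', '&quot;']
```

Every lookup made during the fill sees a non-None table that still lacks "quot", so the entity is returned unconverted; once the fill completes, the same lookup succeeds.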
WORKAROUND:
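from HTMLParser import HTMLParser  # Python 2.7 module name

class Dummy(HTMLParser):
    """This class is defined here so that we can initialize its base class."""
    def __init__(self):
        HTMLParser.__init__(self)

# Initialize HTMLParser.entitydefs (loaded from htmlentitydefs) before starting
# any threads: an entity reference in an attribute value forces the lookup.
dummy = Dummy()
dummy.feed('<a href="&amp;">')
del dummy, Dummy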
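The workaround above avoids the race by completing the initialization before any threads exist. For comparison, here is a sketch of how a lazily built class-level table could be made safe instead. It assumes CPython, where rebinding a single attribute is one step that other threads see either before or after, never half-done. SafeParser, _table, unescape_entity and NAME2CODEPOINT are hypothetical names. The table is built in a local variable under a lock and published only when complete, so a reader sees either None or the full dictionary.

```python
import threading

# Tiny stand-in for htmlentitydefs.name2codepoint (assumed subset).
NAME2CODEPOINT = {"amp": 38, "gt": 62, "lt": 60, "quot": 34}


class SafeParser(object):
    entitydefs = None              # class-level cache, shared by all threads
    _init_lock = threading.Lock()  # serializes the one-time build

    @classmethod
    def _table(cls):
        table = cls.entitydefs
        if table is None:
            with cls._init_lock:
                table = cls.entitydefs  # re-check: another thread may have built it
                if table is None:
                    table = {"apos": "'"}  # build privately...
                    for key, cp in NAME2CODEPOINT.items():
                        table[key] = chr(cp)
                    cls.entitydefs = table  # ...then publish the complete dict
        return table

    def unescape_entity(self, name):
        # Unknown names are left alone, as the original code does.
        return self._table().get(name, "&" + name + ";")


def worker(results, index):
    results[index] = SafeParser().unescape_entity("quot")


# Many threads racing on the first lookup all get the converted entity.
results = [None] * 16
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(set(results))
```

The second check inside the lock keeps the table from being built twice; the unlocked first check keeps later lookups from paying for the lock.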
----------
components: Library (Lib)
messages: 291256
nosy: ale2017
priority: normal
severity: normal
status: open
title: HTMLParser class is not thread safe
type: behavior
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30011>
_______________________________________