[New-bugs-announce] [issue30011] HTMLParser class is not thread safe

Alessandro Vesely report at bugs.python.org
Fri Apr 7 04:26:37 EDT 2017


New submission from Alessandro Vesely:

SYMPTOM:
When used in a multithreaded program, instances of a class derived from HTMLParser may convert an entity or leave it alone, in an apparently random fashion.

CAUSE:
The class has a class-level (static) attribute, entitydefs, which is None until first use and is then initialized to a dictionary of entity definitions.  The initialization is not atomic: the attribute is bound to the dictionary before the dictionary has been filled.  Instances running in concurrent threads therefore see a non-None entitydefs, assume that initialization is complete, and catch a KeyError if the entity at hand hasn't been set yet.  In that case, the entity is left alone as if it were invalid.
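
The failure mode described above can be sketched with a toy model.  This is not the HTMLParser source: RacyTable, NAME2CODEPOINT and the after_publish callback are hypothetical names made up for illustration, and the "other thread" is simulated by a plain callback so that the interleaving is reproducible instead of depending on the scheduler.

```python
# Hypothetical stand-in for a small part of htmlentitydefs.name2codepoint.
NAME2CODEPOINT = {"amp": 38, "gt": 62, "lt": 60}


class RacyTable(object):
    """Toy model: the shared dict is published first and filled afterwards."""

    entitydefs = None

    @classmethod
    def lookup(cls, name, after_publish=None):
        if cls.entitydefs is None:
            # The still incomplete dict becomes visible to every thread here.
            table = cls.entitydefs = {"apos": "'"}
            if after_publish is not None:
                after_publish()  # simulated switch to another thread
            for key, codepoint in NAME2CODEPOINT.items():
                table[key] = chr(codepoint)
        try:
            return cls.entitydefs[name]
        except KeyError:
            return "&" + name + ";"  # entity left alone, as if invalid


seen_by_other_thread = []


def other_thread():
    # Runs while the first caller is still between "publish" and "fill".
    seen_by_other_thread.append(RacyTable.lookup("lt"))


first = RacyTable.lookup("amp", after_publish=other_thread)
later = RacyTable.lookup("lt")
```

In this model the simulated second caller finds entitydefs already set, skips initialization and gets the entity back unconverted, while the same lookup succeeds once the table has been filled.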

WORKAROUND:
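from HTMLParser import HTMLParser  # Python 2.7 module name

class Dummy(HTMLParser):
    """this class is defined here so that we can initialize its base class"""
    def __init__(self):
        HTMLParser.__init__(self)

# Initialize HTMLParser by loading htmlentitydefs.  Run this in the main
# thread, before any worker thread is started.  The attribute value must
# contain a complete entity, so that the entity lookup actually takes place.
dummy = Dummy()
dummy.feed('<a href="&amp;">')
del dummy, Dummy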
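
The idea behind the workaround, forcing the lazy initialization once before any thread starts, can be sketched independently of HTMLParser.  LazyTable below is a hypothetical toy class, not the library code; it only assumes the "fill a shared dict on first use" pattern described under CAUSE.

```python
import threading


class LazyTable(object):
    """Hypothetical model of a class that fills a shared dict on first use."""

    entitydefs = None

    @classmethod
    def lookup(cls, name):
        if cls.entitydefs is None:
            table = cls.entitydefs = {}
            for key, value in (("amp", "&"), ("lt", "<"), ("gt", ">")):
                table[key] = value
        return cls.entitydefs.get(name, "&" + name + ";")


# Warm-up in the main thread, before any worker exists.
LazyTable.lookup("amp")

results = []
results_lock = threading.Lock()


def worker():
    value = LazyTable.lookup("lt")
    with results_lock:
        results.append(value)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the table is complete before the first worker is created, every worker takes the read-only path and all lookups return the same converted value.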

----------
components: Library (Lib)
messages: 291256
nosy: ale2017
priority: normal
severity: normal
status: open
title: HTMLParser class is not thread safe
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30011>
_______________________________________

