[ python-Bugs-1548288 ] sgmllib.sgmlparser is not thread safe
SourceForge.net
noreply at sourceforge.net
Sun Sep 3 21:02:36 CEST 2006
Bugs item #1548288, was opened at 2006-08-28 19:32
Message generated for change (Comment added) made by josiahcarlson
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1548288&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Andres Riancho (andresriancho)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmllib.sgmlparser is not thread safe
Initial Comment:
Python version:
===============
dz0@fre3ak:~$ python
Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Problem description:
====================
sgmllib.SGMLParser is not thread safe. I discovered this
problem when trying to fetch and parse many HTML files
at the same time.
An example of this bug is attached.
The HTML input to the parser is the string
'<html></html>'*100, written this way to keep the code
simple. Replacing it with a "large" HTML document also
triggers the failure.
Solution:
=========
Make the library thread safe, or add a few lines to the
documentation saying that it is not thread safe.
Traceback:
==========
python sgml-not-threadSafe.py
Started all threads
Successfully parsed html
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "sgml-not-threadSafe.py", line 10, in parseHtml
    self._parser.feed( html )
  File "/usr/lib/python2.4/sgmllib.py", line 95, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/sgmllib.py", line 129, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.4/sgmllib.py", line 262, in parse_starttag
    self.error('unexpected call to parse_starttag')
  File "/usr/lib/python2.4/sgmllib.py", line 102, in error
    raise SGMLParseError(message)
SGMLParseError: unexpected call to parse_starttag
Successfully parsed html
Successfully parsed html
Additional note
===============
To reproduce this bug, you may need to run the sample code
more than once. Thread scheduling is not always the
same; the issue is there, but it sometimes fails to
appear on the first (second, third...) run.
----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson)
Date: 2006-09-03 12:02
Message:
Logged In: YES
user_id=341410
sgmllib makes no claims about thread safety, which
implies that a parser is generally not shareable between threads.
You can work around this issue by creating a new parser
instance for each thread that needs to parse.
Suggest closing as "Won't Fix".
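
A minimal sketch of the one-parser-per-thread workaround follows. It is
not the attached sample. Because sgmllib was removed in Python 3, the
sketch uses html.parser.HTMLParser as a stand-in; the names TagCounter,
parse_html and results are hypothetical and chosen only for
illustration. The same pattern should apply to sgmllib.SGMLParser on
Python 2.

```python
import threading
from html.parser import HTMLParser  # stand-in: sgmllib does not exist in Python 3


class TagCounter(HTMLParser):
    """Hypothetical parser that just counts start tags."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1


def parse_html(html, results, index):
    # Build a fresh parser inside the thread, so no parser buffer or
    # position state is ever shared between threads.
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    # Each thread writes only to its own slot of the results list.
    results[index] = parser.start_tags


html = '<html></html>' * 100  # same input string as in the report
results = [None] * 4
threads = [threading.Thread(target=parse_html, args=(html, results, i))
           for i in range(len(results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key difference from the failing traceback is that the parser is a
local variable of the thread's target function rather than an attribute
(self._parser) that every thread reaches through the same object.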
----------------------------------------------------------------------