BeautifulSoup doesn't work with a threaded input queue?
Peter Otten
__peter__ at web.de
Sun Aug 27 14:54:58 EDT 2017
Christopher Reimer via Python-list wrote:
> Greetings,
>
> I have a Python 3.6 script on Windows to scrape comment history from a
> website. It's currently set up this way:
>
> Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter
> (single thread)
>
> It takes 15 minutes to process ~11,000 comments.
>
> When I replaced the list with a queue between the Requestor and Parser
> to speed up things, BeautifulSoup stopped working.
>
> When I changed BeautifulSoup(contents, "lxml") to
> BeautifulSoup(contents), I get the UserWarning that no parser was
> explicitly set and a reference to line 80 in threading.py (which puts it
> in the RLock factory function).
>
> When I switched back to using list between the Requestor and Parser, the
> Parser worked again.
>
> BeautifulSoup doesn't work with a threaded input queue?
The documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
says you can make the BeautifulSoup object from a string or file.
Can you give a few more details about where the queue comes into play? A
small code sample would be ideal...
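
For comparison, here is a minimal sketch of how I would expect a queue
between Requestor and Parser to look. It is not your code: the names
SENTINEL, parse_title, parser_worker and run_pipeline are made up for
illustration, and I use "html.parser" instead of "lxml" only so the sketch
does not depend on lxml being installed. The point it tries to show is that
each item taken off the queue with get() is the markup string itself, which
is then handed to BeautifulSoup.

```python
import queue
import threading

# Hypothetical end-of-input marker; one is put on the queue per worker.
SENTINEL = None


def parse_title(contents):
    """Make the soup from a plain string and return the page title."""
    # Third-party import kept local so the queue logic can be tried
    # without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(contents, "html.parser")
    return soup.title.string if soup.title else None


def parser_worker(in_q, out_q, parse):
    """Take strings off in_q, parse each one, put the result on out_q."""
    while True:
        contents = in_q.get()  # blocks until an item is available
        if contents is SENTINEL:
            break
        out_q.put(parse(contents))


def run_pipeline(pages, parse, n_workers=2):
    """Feed pages (strings) through n_workers parser threads."""
    in_q = queue.Queue()
    out_q = queue.Queue()
    workers = [
        threading.Thread(target=parser_worker, args=(in_q, out_q, parse))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    for page in pages:
        in_q.put(page)
    for _ in workers:
        in_q.put(SENTINEL)  # tell each worker to stop
    for worker in workers:
        worker.join()
    # All producers have finished, so draining with empty() is safe here.
    results = []
    while not out_q.empty():
        results.append(out_q.get())
    return results
```

With bs4 installed, run_pipeline(list_of_html_strings, parse_title) should
return the titles, in no particular order because the workers run
concurrently. If your version passes something other than the string to
BeautifulSoup (the queue object, say), that would be worth checking first.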