BeautifulSoup doesn't work with a threaded input queue?

Christopher Reimer christopher_reimer at yahoo.com
Sun Aug 27 13:23:07 EDT 2017


Greetings,

I have a Python 3.6 script on Windows that scrapes comment history from a
website. It's currently set up this way:

Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter
(single thread)

It takes 15 minutes to process ~11,000 comments.

When I replaced the list between the Requestor and Parser with a queue
to speed things up, BeautifulSoup stopped working.
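For reference, the queue-based layout looks roughly like the minimal sketch below. It is not my actual script. fetch(), the sample URLs, and the SENTINEL end-of-work marker are hypothetical. The stdlib html.parser.HTMLParser stands in for the BeautifulSoup step so the sketch runs without third-party packages. Each Parser thread builds its own parser object, and each stage stops on a sentinel rather than on an empty queue.

```python
import csv
import io
import queue
import threading
from html.parser import HTMLParser

SENTINEL = None  # hypothetical end-of-work marker, one per consumer thread


class TextExtractor(HTMLParser):
    """Stand-in for the BeautifulSoup step: collects non-blank text nodes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def fetch(url):
    # Hypothetical placeholder for the real HTTP request.
    return f"<div class='comment'><p>Comment from {url}</p></div>"


def requestor(urls, pages):
    for url in urls:
        pages.put(fetch(url))


def parser(pages, rows):
    while True:
        contents = pages.get()
        if contents is SENTINEL:
            break
        extractor = TextExtractor()  # a fresh parser per page, never shared
        extractor.feed(contents)
        rows.put([" ".join(extractor.chunks)])


def csv_writer(rows, out):
    writer = csv.writer(out)
    while True:
        row = rows.get()
        if row is SENTINEL:
            break
        writer.writerow(row)


def run_pipeline(urls, n_requestors=2, n_parsers=2):
    pages, rows = queue.Queue(), queue.Queue()
    out = io.StringIO()
    batches = [urls[i::n_requestors] for i in range(n_requestors)]
    req_threads = [threading.Thread(target=requestor, args=(b, pages)) for b in batches]
    par_threads = [threading.Thread(target=parser, args=(pages, rows)) for _ in range(n_parsers)]
    writer_thread = threading.Thread(target=csv_writer, args=(rows, out))
    for t in req_threads + par_threads + [writer_thread]:
        t.start()
    for t in req_threads:
        t.join()
    # Only once every Requestor is done, tell each Parser to stop.
    for _ in par_threads:
        pages.put(SENTINEL)
    for t in par_threads:
        t.join()
    rows.put(SENTINEL)  # then stop the single writer
    writer_thread.join()
    return out.getvalue()
```

Calling run_pipeline(["a", "b", "c"]) returns one CSV line per page. The row order depends on thread scheduling. The sentinels exist because a Parser that sees an empty queue cannot tell "nothing yet" from "no more work".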

When I changed BeautifulSoup(contents, "lxml") to
BeautifulSoup(contents), I got the UserWarning that no parser was
explicitly specified, along with a reference to line 80 in threading.py
(which places it in the RLock factory function).

When I switched back to a list between the Requestor and Parser, the
Parser worked again.

BeautifulSoup doesn't work with a threaded input queue?

Thank you,

Chris Reimer
