BeautifulSoup doesn't work with a threaded input queue?

Peter Otten __peter__ at web.de
Sun Aug 27 14:54:58 EDT 2017


Christopher Reimer via Python-list wrote:

> Greetings,
> 
> I have a Python 3.6 script on Windows to scrape comment history from a
> website. It's currently set up this way:
> 
> Requestor (threads) -> list -> Parser (threads) -> queue -> CVSWriter
> (single thread)
> 
> It takes 15 minutes to process ~11,000 comments.
> 
> When I replaced the list with a queue between the Requestor and Parser
> to speed things up, BeautifulSoup stopped working.
> 
> When I changed BeautifulSoup(contents, "lxml") to
> BeautifulSoup(contents), I got the UserWarning that no parser was
> explicitly set and a reference to line 80 in threading.py (which puts it
> in the RLock factory function).
> 
> When I switched back to using a list between the Requestor and Parser,
> the Parser worked again.
> 
> BeautifulSoup doesn't work with a threaded input queue?

The documentation

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup

says you can make the BeautifulSoup object from a string or a file.
Can you give a few more details about where the queue comes into play?
A small code sample would be ideal...
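
For reference, a sample along the following lines would be small enough to post. It is only a sketch under stated assumptions, not a reconstruction of your script: the names requestor, parser_worker, run and SENTINEL are made up, the pages are hard-coded strings instead of downloads, and a standard-library HTMLParser subclass stands in for BeautifulSoup so that it runs without third-party packages. The one thing it is meant to show is that the parser is handed the string returned by queue.get(), one item at a time.

```python
import queue
import threading
from html.parser import HTMLParser

SENTINEL = None  # hypothetical marker telling a parser thread to stop


class TitleParser(HTMLParser):
    """Stand-in for BeautifulSoup: collects the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse(contents):
    # With bs4 installed this is where BeautifulSoup(contents, "lxml")
    # would go; contents has to be a str (or file), as the docs say.
    parser = TitleParser()
    parser.feed(contents)
    return parser.title


def requestor(pages, out_queue):
    # Assumption: real code would download each page here.
    for page in pages:
        out_queue.put(page)


def parser_worker(in_queue, results):
    while True:
        contents = in_queue.get()  # one page (a str) per get() call
        if contents is SENTINEL:
            break
        results.put(parse(contents))


def run(pages, num_parsers=2):
    page_queue = queue.Queue()
    results = queue.Queue()
    workers = [
        threading.Thread(target=parser_worker, args=(page_queue, results))
        for _ in range(num_parsers)
    ]
    for worker in workers:
        worker.start()
    producer = threading.Thread(target=requestor, args=(pages, page_queue))
    producer.start()
    producer.join()
    for _ in workers:
        page_queue.put(SENTINEL)  # one stop marker per parser thread
    for worker in workers:
        worker.join()
    # Threads finish in no fixed order, so sort for a stable result.
    return sorted(results.get() for _ in range(len(pages)))


if __name__ == "__main__":
    print(run(["<title>first</title>", "<title>second</title>"]))
```

If your version differs from this in what it passes to BeautifulSoup, that difference is probably the interesting part.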



