BeautifulSoup doesn't work with a threaded input queue?

Christopher Reimer christopher_reimer at yahoo.com
Sun Aug 27 17:14:27 EDT 2017


On 8/27/2017 1:31 PM, Peter Otten wrote:

> Here's a simple example that extracts titles from generated html. It seems
> to work. Does it resemble what you do?

Your example is similar to my code when I use a list as the input to
the parser. You have soup_threads and write_threads, but no read_threads.

The particular website I'm scraping requires checking each page for a
sentinel value (i.e., "Sorry, no more comments") to determine when to
stop requesting pages. For my comment history, that's ~750 pages to
parse for ~11,000 comments.
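
A minimal sketch of that stopping rule, done sequentially to keep it
short. fetch_page() is a hypothetical stand-in that returns canned HTML
instead of making a real request, and the page count and markup are
invented for illustration only.

```python
SENTINEL = "Sorry, no more comments"  # sentinel text quoted above


def fetch_page(page_number):
    """Hypothetical stand-in for the real HTTP request; returns HTML text."""
    if page_number > 3:  # pretend the history has only 3 pages
        return "<p>%s</p>" % SENTINEL
    return "<div class='comment'>comment on page %d</div>" % page_number


def fetch_until_sentinel(fetch, sentinel=SENTINEL, max_pages=1000):
    """Request pages in order and stop at the first one with the sentinel."""
    pages = []
    for page_number in range(1, max_pages + 1):
        html = fetch(page_number)
        if sentinel in html:
            break  # no more comments: stop requesting
        pages.append(html)
    return pages


pages = fetch_until_sentinel(fetch_page)
print(len(pages), "pages collected")
```

The max_pages cap is only a safety net so the loop cannot run forever
if the sentinel text ever changes.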

I have 20 read_threads requesting pages and putting them into an output
queue, which is the input_queue for the parser. My soup_threads can get
items from the queue, but BeautifulSoup doesn't do anything after that.
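
For reference, here is a rough sketch of that arrangement, not the
actual code. It assumes a hypothetical fetch_page() that returns canned
HTML, uses the stdlib HTMLParser as a stand-in for the BeautifulSoup
step so it runs without extra packages, and uses None as a made-up
shutdown marker. Exceptions in the parsing step are collected in an
errors queue so a failing worker cannot go unnoticed.

```python
import itertools
import queue
import threading
from html.parser import HTMLParser

SENTINEL = "Sorry, no more comments"  # sentinel text quoted above
DONE = None                           # hypothetical shutdown marker


def fetch_page(n):
    """Hypothetical stand-in for the real HTTP request."""
    if n > 5:  # pretend the history has only 5 pages
        return "<p>%s</p>" % SENTINEL
    return "<html><title>page %d</title></html>" % n


class TitleParser(HTMLParser):
    """Stdlib stand-in for the BeautifulSoup step: collects <title> text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title


def reader(page_numbers, lock, stop, input_queue):
    """Claim the next page number, fetch it, and queue it for parsing."""
    while not stop.is_set():
        with lock:  # each page number is handed out exactly once
            n = next(page_numbers)
        html = fetch_page(n)
        if SENTINEL in html:
            stop.set()  # tell the other readers to stop as well
            break
        input_queue.put(html)


def soup_worker(input_queue, results, errors):
    """Parse queued pages until the shutdown marker arrives."""
    while True:
        html = input_queue.get()
        if html is DONE:
            break
        try:
            results.put(parse_title(html))
        except Exception as exc:  # record failures instead of losing them
            errors.put(exc)


def drain(q):
    items = []
    while not q.empty():
        items.append(q.get())
    return items


def run(num_readers=4, num_parsers=2):
    input_queue, results, errors = queue.Queue(), queue.Queue(), queue.Queue()
    page_numbers, lock, stop = itertools.count(1), threading.Lock(), threading.Event()
    readers = [threading.Thread(target=reader,
                                args=(page_numbers, lock, stop, input_queue))
               for _ in range(num_readers)]
    parsers = [threading.Thread(target=soup_worker,
                                args=(input_queue, results, errors))
               for _ in range(num_parsers)]
    for t in readers + parsers:
        t.start()
    for t in readers:
        t.join()
    for _ in parsers:  # one shutdown marker per parser thread
        input_queue.put(DONE)
    for t in parsers:
        t.join()
    return sorted(drain(results)), drain(errors)


titles, errors = run()
print(titles, errors)
```

The shutdown markers are queued only after every reader has finished,
so no page can arrive behind a marker and be skipped.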

Chris R.


