BeautifulSoup doesn't work with a threaded input queue?

Paul Rubin no.email at nospam.invalid
Sun Aug 27 17:23:58 EDT 2017


Christopher Reimer <christopher_reimer at yahoo.com> writes:
> I have 20 read_threads requesting and putting pages into the output
> queue that is the input_queue for the parser. 

Given how slow parsing is, you probably want to scrape the pages into
disk files, and then run the parser in parallel processes that read from
the disk.  You could also use something like Redis (redis.io) as a queue.
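
Here is a rough sketch of the disk-file approach, not a drop-in
solution.  The names save_page, parse_file and parse_spool are made up
for illustration, and the TitleParser class is only a standard-library
stand-in for whatever BeautifulSoup work your real parser does.  The
reader threads call save_page; a separate run hands the saved files to
a pool of worker processes.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Stand-in for the real (slow) parsing step: collects the <title> text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def save_page(spool_dir, name, html):
    """Called from a reader thread: write one fetched page to the spool dir."""
    path = Path(spool_dir) / f"{name}.html"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(html, encoding="utf-8")
    # Rename into place so a parser never picks up a half-written file.
    os.replace(tmp, path)
    return path


def parse_file(path):
    """Runs in a worker process: parse one saved page, return (filename, title)."""
    parser = TitleParser()
    parser.feed(Path(path).read_text(encoding="utf-8"))
    parser.close()
    return Path(path).name, parser.title.strip()


def parse_spool(spool_dir, workers=4):
    """Parse every saved page in parallel processes; returns {filename: title}."""
    paths = sorted(str(p) for p in Path(spool_dir).glob("*.html"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(parse_file, paths))
```

Call parse_spool from under an `if __name__ == "__main__":` guard so
the worker processes can import the module cleanly.  Because each page
is parsed in its own process, the CPU-bound parsing no longer competes
with the reader threads for the interpreter lock.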


