BeautifulSoup doesn't work with a threaded input queue?

MRAB python at mrabarnett.plus.com
Sun Aug 27 16:12:10 EDT 2017


On 2017-08-27 20:35, Christopher Reimer via Python-list wrote:
> On 8/27/2017 11:54 AM, Peter Otten wrote:
> 
>> The documentation
>>
>> https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
>>
>> says you can make the BeautifulSoup object from a string or file.
>> Can you give a few more details where the queue comes into play? A small
>> code sample would be ideal.
> 
> A worker thread uses a request object to get the page and puts it into
> a queue as page.content (HTML).  Another worker thread gets the
> page.content from the queue to apply BeautifulSoup, and nothing happens.
> 
> soup = BeautifulSoup(page_content, 'lxml')
> print(soup)
> 
> No output whatsoever. If I remove 'lxml', I get the UserWarning that no
> parser was explicitly set and get the reference to threading.py at
> line 80.
> 
> I verified that page.content that goes into and out of the queue is the
> same page.content that goes into and out of a list.
> 
> I read somewhere that BeautifulSoup may not be thread-safe. I've never
> had a problem with threads storing the output into a queue. Using a
> queue (random order) instead of a list (sequential order) to feed pages
> for the input is making it wonky.
> 
What do you mean by "queue (random order)"? A queue is sequential:
first-in, first-out.
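
To illustrate the ordering point, here is a minimal sketch using only the
standard library. The names producer, consumer, pages and results are
hypothetical stand-ins for the downloading and parsing threads described
above, and the strings stand in for page.content; no BeautifulSoup call is
involved. With one thread putting and one thread getting, items should come
out in the order they went in:

```python
import queue
import threading

def producer(q, pages):
    # Hypothetical stand-in for the downloading thread: enqueue pages in order.
    for page in pages:
        q.put(page)
    q.put(None)  # sentinel telling the consumer to stop

def consumer(q, results):
    # Hypothetical stand-in for the parsing thread: dequeue until the sentinel.
    while True:
        page = q.get()
        if page is None:
            break
        results.append(page)

# Placeholder strings standing in for page.content.
pages = ["<p>page %d</p>" % i for i in range(5)]
q = queue.Queue()
results = []
threads = [
    threading.Thread(target=producer, args=(q, pages)),
    threading.Thread(target=consumer, args=(q, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The consumer saw the pages in the same order the producer queued them.
print(results == pages)
```

If several threads put pages on the queue, the order in which pages arrive
depends on which download finishes first, but the queue itself still hands
them out in the order they were put.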


