Repeatedly crawl website every 1 min

Iuri iurisilvio at gmail.com
Thu May 11 05:27:21 EDT 2017


Unless you are authorized, don't do it. It literally costs the website
you are crawling money, in both CPU and bandwidth.

Hundreds of concurrent requests can even bring down a small (or badly
configured) server.

Look at the scrapy package; it is great for scraping, but be friendly to
the websites you are crawling.
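For the thread-pool setup the original poster describes, a minimal sketch using only the standard library might look like the following. The `fetch` callable, the parameter names, and the 60-second default interval are assumptions for illustration, not anything prescribed in this thread; the point is that each URL is polled at most once per interval, which keeps the crawler friendly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_forever(urls, fetch, interval=60.0, max_workers=10,
                  stop_event=None):
    """Repeatedly call fetch(url) for each URL, at most once per
    `interval` seconds per URL, until `stop_event` is set.

    `fetch` is injected (e.g. a wrapper around urllib.request.urlopen
    or requests.get) so the scheduling logic stays testable offline.
    """
    stop_event = stop_event or threading.Event()

    def worker(url):
        while not stop_event.is_set():
            started = time.monotonic()
            try:
                fetch(url)
            except Exception:
                # A real crawler would log and maybe back off here.
                pass
            # Sleep out the remainder of the interval so the site is
            # never hit faster than once per `interval` seconds.
            remaining = interval - (time.monotonic() - started)
            if remaining > 0:
                stop_event.wait(remaining)

    # One long-running worker per URL; the pool caps concurrency.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url in urls:
            pool.submit(worker, url)
        # Exiting the `with` block waits for workers, which return
        # once stop_event is set.
```

Note that with hundreds of URLs, an async approach (or scrapy itself, which schedules requests for you) scales better than hundreds of OS threads.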

On 10 May 2017 at 23:22, <liyucun2012 at gmail.com> wrote:

> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.
>
> Specifically, I have many urls, and I want to maintain a thread pool so
> that each thread will repeatedly crawl content from its given url. There
> could be hundreds of threads running at the same time.
>
> Your help is greatly appreciated.
>
> ;)
> --
> https://mail.python.org/mailman/listinfo/python-list
>



More information about the Python-list mailing list