[Tutor] Question about scraping

ALAN GAULD alan.gauld at btinternet.com
Fri May 30 22:21:27 CEST 2014


On Fri, May 30, 2014 at 7:20 PM, Alan Gauld <alan.gauld at btinternet.com> wrote:
>
>
>> If a site offers an API that returns the data you need then use it.
>> If not you have few alternatives to scraping (although scraping
>> may be 'illegal' anyway due to the impact on other users). But scraping,
>> whether a web page or a GUI or an old mainframe terminal
>> is always a fragile and unsatisfactory solution.
>
>Okay, I think learning how to scrape (library or framework) is not worth
>the trouble. Especially if some people consider it illegal. Thanks for
>the input.
>
>
As I say, sometimes you have no choice but to scrape.
It's only 'illegal' if the site owner says so, in other words if their terms of use
prohibit web scraping. If they have gone to the effort (and cost) of providing
an API then it probably means scraping is prohibited. But many (most!) sites
don't offer APIs and most smaller sites don't prohibit scraping, so it is still a
valid technique. But before you try, it's always worth checking whether an API
exists and whether scraping is permitted.
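
One partial, machine-readable hint about what a site permits is its robots.txt
file. This is not the same thing as the terms of use, which you still have to
read yourself, so treat the following only as a rough sketch. The function name
may_fetch, the sample rules and the example.com URLs are all made up for
illustration; only urllib.robotparser is from the standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, standing in for what a real site might serve.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch(SAMPLE_ROBOTS, "http://example.com/index.html"))
    print(may_fetch(SAMPLE_ROBOTS, "http://example.com/private/data.html"))
```

For a live site you would normally call set_url() and read() on the parser
instead of passing the text in by hand.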

And by 'illegal' I mean you are unlikely to be prosecuted in a court, but
you are likely to find your IP address blocked and/or your account closed. The systems
generally monitor activity, and if an account is navigating through pages
too quickly to be a human they often close the account down.
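
If you do scrape, one way to avoid looking like a runaway robot is to pause
between requests. The sketch below is only one possible approach: the Throttle
class, the fetch_all function and the 2 second default are my own illustrative
choices, not anything a particular site specifies, and a polite delay does not
make scraping permitted where the terms of use forbid it.

```python
import time
import urllib.request


class Throttle:
    """Make sure at least min_interval seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def fetch_all(urls, min_interval=2.0):
    """Fetch each URL in turn, pausing between requests (hypothetical helper)."""
    throttle = Throttle(min_interval)
    pages = []
    for url in urls:
        throttle.wait()
        with urllib.request.urlopen(url) as response:
            pages.append(response.read())
    return pages
```

Calling fetch_all with a list of page URLs would then fetch them no faster
than one every min_interval seconds.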

Alan g.