Raw Data from Website

Bob Martin bob.martin at excite.com
Wed Aug 24 08:04:44 EDT 2016


in 764257 20160823 081439 Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
>On Tuesday 23 August 2016 10:28, adam.j.kerr at gmail.com wrote:
>
>> Hi,
>>
>> I am hoping someone is able to help me.
>>
>> Is there a way to pull as much raw data from a website as possible? The
>> webpage that I am looking at is as follows:
>>
>> http://www.homepriceguide.com.au/Research/ResearchSeeFullList.aspx?LocationType=LGA&State=QLD&LgaID=632
>>
>> The main variable that is important is the "632" at the end; adjusting
>> it changes the postcodes. Each postcode contains a large amount of data.
>> Is there a way for all of this to be exported into an Excel document?
>
>Ideally, the web site itself will offer an Excel download option. If it
>doesn't, you may be able to screen-scrape the data yourself, but:
>
>(1) it may be against the terms of service of the website;
>(2) it may be considered unethical or possibly even copyright
>infringement or (worst case) even illegal;
>(3) especially if you're thinking of selling the data;
>(4) at the very least, unless you take care not to abuse the service,
>it may be rude and the website may even block your access.
>
>There are many tutorials and examples of "screen scraping" or "web scraping" on
>the internet -- try reading them. It's not something I personally have any
>experience with, but I expect that the process goes something like this:
>
>- connect to the website;
>- download the particular page you want;
>- grab the data that you care about;
>- remove HTML tags and extract just the bits needed;
>- write them to a CSV file.

wget does the hard part.
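
For anyone who would rather stay in Python, the steps quoted above could be
sketched roughly as below, using only the standard library. This is a minimal
sketch, not a tested scraper for that site: it assumes the data sits in
ordinary HTML table rows (I have not checked the page's actual markup), and
the names TableParser, extract_rows, fetch and write_csv, along with the
output filename, are made up for illustration. Check the site's terms of
service before running anything like this against it.

```python
import csv
import urllib.request
from html.parser import HTMLParser

# URL from the original question; the LGA id is appended at the end.
BASE_URL = ("http://www.homepriceguide.com.au/Research/ResearchSeeFullList.aspx"
            "?LocationType=LGA&State=QLD&LgaID=")


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumption: the page uses plain, non-nested HTML tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(lga_id):
    """Download the page for one LGA id and return it as text."""
    with urllib.request.urlopen(BASE_URL + str(lga_id), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def write_csv(rows, out):
    """Write the rows to an open text file in CSV format."""
    csv.writer(out).writerows(rows)


if __name__ == "__main__":
    # Hypothetical output filename; Excel can open the resulting CSV.
    with open("lga_632.csv", "w", newline="") as f:
        write_csv(extract_rows(fetch(632)), f)
```

If you loop over several LgaID values, put a pause between requests so you
don't hammer the server.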


