Raw Data from Website

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Aug 23 03:14:39 EDT 2016


On Tuesday 23 August 2016 10:28, adam.j.kerr at gmail.com wrote:

> Hi,
> 
> I am hoping someone is able to help me.
> 
> Is there a way to pull as much raw data from a website as possible? The
> webpage that I am looking for is as follows:
> 
> http://www.homepriceguide.com.au/Research/ResearchSeeFullList.aspx?LocationType=LGA&State=QLD&LgaID=632
> 
> The main variable that is important is the "632" at the end; adjusting
> it changes the postcodes. Each postcode contains a large amount of data.
> Is there a way for all of this to be exported into an Excel document?

Ideally, the website itself will offer an Excel download option. If it
doesn't, you may be able to screen-scrape the data yourself, but:

(1) it may be against the terms of service of the website;
(2) it may be considered unethical or possibly even copyright 
    infringement or (worst case) even illegal;
(3) especially if you're thinking of selling the data;
(4) at the very least, unless you take care not to abuse the service, 
    it may be rude and the website may even block your access.
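
As a rough illustration of what "taking care" in the last point above might
involve, here is a minimal sketch that checks a site's robots.txt rules and
picks a pause between requests, using the standard library's
urllib.robotparser. The robots.txt content, the user-agent string, the
example.com URLs and the helper names are all made up for the example; a real
site's rules will differ, and obeying robots.txt does not by itself settle the
terms-of-service or legal questions listed above.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# For a real site you would use set_url(".../robots.txt") and read() instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user-agent string identifying the script.
USER_AGENT = "example-scraper/0.1"

rules = urllib.robotparser.RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(url):
    """Return True if the parsed rules permit USER_AGENT to fetch url."""
    return rules.can_fetch(USER_AGENT, url)


def delay_seconds(default=2.0):
    """Return the site's requested crawl delay, or a default if none is given."""
    delay = rules.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default


def polite_pause():
    """Sleep between requests so the script does not hammer the server."""
    time.sleep(delay_seconds())
```

Calling allowed(url) before each download and polite_pause() after it keeps
the request rate low and skips pages the site has asked robots to avoid.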

There are many tutorials and examples of "screen scraping" or "web scraping" on 
the internet -- try reading them. It's not something I personally have any 
experience with, but I expect that the process goes something like this:

- connect to the website;
- download the particular page you want;
- grab the data that you care about;
- remove HTML tags and extract just the bits needed;
- write them to a CSV file.
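
The steps above could be sketched roughly as follows. This is only an
illustration under stated assumptions: it assumes the data you want sits in
ordinary HTML table cells (<tr>, <td>, <th>), which may not be true of any
particular site, and it uses only the standard library (urllib.request,
html.parser, csv). The class and function names, the user-agent string and
the sample markup are invented for the example.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> is kept; the tags are dropped.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def fetch(url):
    """Connect to the site and download one page as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-scraper/0.1"}  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_rows(html):
    """Strip the HTML tags and return the table cells as rows of strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, out):
    """Write the rows to an open text file (or any file-like object)."""
    csv.writer(out).writerows(rows)


# Offline demonstration on made-up markup; no network access is needed.
SAMPLE_HTML = """
<table>
  <tr><th>Postcode</th><th>Suburb</th></tr>
  <tr><td>0000</td><td>Example <b>Town</b></td></tr>
</table>
"""
rows = extract_rows(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(rows, buffer)
```

For a real page you would pass the result of fetch(url) to extract_rows() and
write to a file opened with open("output.csv", "w", newline=""). Excel can
open the resulting CSV file directly.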


You may find the Beautiful Soup third-party library helpful for this.



-- 
Steve



