Raw Data from Website

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Aug 24 04:18:11 EDT 2016


On Wednesday 24 August 2016 17:04, Bob Martin wrote:

> in 764257 20160823 081439 Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:

>>There are many tutorials and examples of "screen scraping" or "web scraping"
>>on the internet -- try reading them. It's not something I personally have any
>>experience with, but I expect that the process goes something like this:
>>
>>- connect to the website;
>>- download the particular page you want;
>>- grab the data that you care about;
>>- remove HTML tags and extract just the bits needed;
>>- write them to a CSV file.
> 
> wget does the hard part.


I don't think so. Just downloading a web page is easy. Parsing the potentially
invalid HTML (or worse, dealing with content that is assembled in the browser
by JavaScript) to extract the actual data you care about is much harder.
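
To make the difference concrete, here is a minimal sketch of the steps quoted
above, using only the standard library. It assumes the data sits in an
ordinary HTML table; the names TableCellParser, fetch, extract_rows,
rows_to_csv and SAMPLE_HTML are made up for illustration and do not come from
the original question. html.parser is fairly forgiving about unclosed tags,
which is why it is used here, but it will not help at all if the page builds
its content in the browser with JavaScript.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row under construction, or None
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row implicitly ends the previous one (sloppy HTML).
            self._flush_row()
            self._row = []
        elif tag in ("td", "th"):
            # A new cell implicitly ends the previous one.
            self._flush_cell()
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _flush_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def close(self):
        super().close()
        self._flush_row()


def fetch(url):
    """Download a page and decode it to text (the easy part)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_rows(html):
    """Return the table cells found in html as a list of rows."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Render rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Deliberately sloppy markup: most closing tags are missing.
SAMPLE_HTML = (
    "<table><tr><th>Name<th>Price"
    "<tr><td>Widget<td>1.50</tr>"
    "<tr><td>Gadget</td><td>2.25</table>"
)

if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_HTML)), end="")
```

For a real page you would pass the result of fetch(url) to extract_rows()
instead of SAMPLE_HTML. Most of the code is the parsing, not the download,
and a real site will usually need more of it: picking the right table,
skipping layout markup, coping with nested tables and so on.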


-- 
Steve



