Raw Data from Website

Chris Angelico rosuav at gmail.com
Tue Aug 23 03:23:43 EDT 2016


On Tue, Aug 23, 2016 at 5:14 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> There are many tutorials and examples of "screen scraping" or "web scraping" on
> the internet -- try reading them. It's not something I personally have any
> experience with, but I expect that the process goes something like this:
>
> - connect to the website;
> - download the particular page you want;
> - grab the data that you care about;
> - remove HTML tags and extract just the bits needed;
> - write them to a CSV file.

More or less. It's usually more like this:

- import requests and grab the data, nice and easy
- extract some of the info you need
- run into difficulties
- scream in frustration at the stupid inconsistencies in the original site
- mess around with it until your code is nested as deeply as the
site's HTML (minimum 30 levels)
- decide that 90% of the info is good enough
- run the program in production for a month or two, and then discover
that something's been changed and now it doesn't work
- return to step 1, repeat until you run out of hair to pull out
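
For what it's worth, the happy path of the quoted steps (download a page, grab the data, strip the tags, write CSV) can be sketched roughly as below. This is only an illustration under assumptions: it supposes the data sits in plain <tr>/<td> table rows, the names TableCellParser, fetch_page, extract_rows and rows_to_csv are made up for this example, and the standard library's urllib.request stands in for requests so the sketch has no third-party dependency. Real sites will rarely be this tidy.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumes simple, non-nested tables (an assumption for this sketch).
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still lands here,
        # which is how the markup gets stripped.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def fetch_page(url):
    """Download a page and decode it to text (stdlib stand-in for requests)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Something like rows_to_csv(extract_rows(fetch_page(url))) would chain the pieces together for a URL of your choosing. Keeping the download separate from the parsing means that when the site changes, you can save the new page to a file and poke at extract_rows without hitting the server every time.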

ChrisA


