Suitable Python code to scrape specific details from web pages.

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Aug 12 20:04:12 EDT 2014


Simon Evans wrote:

> Dear Programmers, Thank you for your responses. I have installed
> 'Beautiful Soup' and I have the 'Getting Started in Beautiful Soup' book,
> but can't seem to make any progress with it, I am too thick to make much
> use of it. I was hoping I could scrape specified stuff off Web pages
> without using it.

Yes, you can scrape stuff off web pages without programming. What you do is
you open the web page in your browser, then open a notebook and, with a
pencil or pen, copy the bits you read into the notebook.

If you're very skilled, you can avoid the pencil and paper and type directly
into a text editor on the computer.

But other than that, every website is different, so there is no short-cut to
web scraping. You need to customize the scraping code for each website you
scrape, and that means programming. Do you know how to program? Are you
interested in learning? If the answer is No and No, then I suggest you
pony up some money and pay somebody who already knows how to program to do
the job for you.

If the answer is No and Yes, then start at the beginning. Do some
programming tutorials and learn the basics of programming before moving on
to something moderately difficult like web scraping.

If the answer is that you already know how to program, but just don't know
how to do web scraping, then stick with it and you'll get there. Web
scraping is tricky, but possible, and if you work hard at it you'll
succeed. Unless you're an experienced programmer with all the right skills,
don't expect this to be something you do in a few minutes. Depending on
your level of experience, you could expect to spend dozens of hours to
learn how to scrape a single website. (Fortunately, the second website will
probably be a little easier, and the third easier still. By the time you've
done a dozen, you'll wonder what the fuss was about.) 

By studying how other scraping programs work, and studying how your racing
pages store data, you should be able to put the two together and see how to
get the data you want. There's plenty of information to help you learn how
to web scrape, with or without BeautifulSoup:

https://startpage.com/do/search/?q=beautifulsoup+web+scraping

https://ixquick.com/do/search/?q=python+web+scraping+examples

https://duckduckgo.com/html/?q=requests%20python%20web%20scraping%20example

but no alternative to actually writing code.
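
To give a flavour of what such code can look like, here is a minimal sketch
that uses only the standard library's html.parser, with no BeautifulSoup.
Everything about the markup is an assumption made up for illustration: the
"runner", "name" and "odds" class names, the table layout and the sample
rows are hypothetical, and a real racing page will almost certainly be
structured differently. In practice the HTML string would come from
something like requests.get(url).text rather than an inline sample.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; real pages will differ.
SAMPLE_HTML = """
<table>
  <tr class="runner"><td class="name">Runner One</td><td class="odds">5/1</td></tr>
  <tr class="runner"><td class="name">Runner Two</td><td class="odds">7/2</td></tr>
</table>
"""


class RunnerParser(HTMLParser):
    """Collect (name, odds) pairs from <td class="name"> and <td class="odds">."""

    def __init__(self):
        super().__init__()
        self.runners = []    # finished (name, odds) tuples
        self._field = None   # which cell we are currently inside, if any
        self._current = {}   # text gathered so far for the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Start each table row with a clean slate.
            self._current = {}
        elif tag == "td":
            cls = dict(attrs).get("class")
            if cls in ("name", "odds"):
                self._field = cls

    def handle_data(self, data):
        # Only keep text that appears inside a cell we care about.
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and "name" in self._current and "odds" in self._current:
            self.runners.append(
                (self._current["name"].strip(), self._current["odds"].strip())
            )


def scrape_runners(html):
    """Return a list of (name, odds) tuples found in the given HTML string."""
    parser = RunnerParser()
    parser.feed(html)
    parser.close()
    return parser.runners


if __name__ == "__main__":
    for name, odds in scrape_runners(SAMPLE_HTML):
        print(name, odds)
```

The part that has to be rewritten for each website is the set of tag and
class checks in handle_starttag and handle_endtag. Working out what those
should be for a given page is where most of the effort goes.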


> I have installed 'Requests' also, is there any code I
> can use that you can suggest that can access the sort of Web page values
> that I have referred to? such as odds, names of runners, stuff like that
> off the 'inspect element' or 'source' HTML pages, on www.Racingpost.com.

Specifically those pages? Doubtful.

If you are really lucky (1) somebody else has already done the programming,
(2) they've made their program available to others, and (3) you can find
that program on the Internet. Use the search engine of your choice to
search for it.



-- 
Steven



