difference between urllib2.urlopen and firefox view 'page source'?

wesley chun wescpy at gmail.com
Wed Mar 21 02:22:47 EDT 2007


On Mar 20, 8:33 am, kyoso... at gmail.com wrote:
> On Mar 20, 1:56 am, Tina I <tina... at bestemselv.com> wrote:
> > > I am trying to screen scrape some stock data from yahoo, so I am
> > > trying to use urllib2 to retrieve the html and beautiful soup for the
> > > parsing.
>
> You can do this fairly easily. I found a similar program in the book Core
> Python Programming. It actually sticks the stocks into an Excel
> spreadsheet.


i'd like to add that the solution that mike proposes from the book is
an *alternative* to what the OP wanted, which was to parse the actual
stock quote web page.  instead of doing that, the code snippet
actually uses Yahoo!'s CSV interface which you can read more about
from their help pages:

http://search.cc.yahoo.com/search?property=finance&question_box=csv
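
to give a feel for why the CSV route is simpler, here is a minimal
sketch (Python 3, standard library only) that parses quote data once
you already have it as text.  the sample values, the ticker symbols,
the name parse_quotes and the field order (symbol, last price, date,
time, change) are all made up for illustration; this is not the code
from the book, and the real field layout depends on what you request
from the service.

```python
import csv
import io

# Made-up sample data. The field order (symbol, last price, date, time,
# change) is an assumption for illustration, not a documented layout.
SAMPLE_CSV = (
    '"ABC",12.5,"3/20/2007","4:00pm",-0.75\n'
    '"XYZ",40.0,"3/20/2007","4:00pm",+0.25\n'
)


def parse_quotes(text):
    """Turn CSV quote text into a list of dicts, one per row."""
    quotes = []
    for symbol, price, date, time, change in csv.reader(io.StringIO(text)):
        quotes.append({
            "symbol": symbol,
            "price": float(price),
            "date": date,
            "time": time,
            "change": float(change),  # float() accepts a leading + or -
        })
    return quotes


for quote in parse_quotes(SAMPLE_CSV):
    print(quote["symbol"], quote["price"], quote["change"])
```

there is no HTML to wade through at all: the csv module deals with the
quoted fields and you convert the numbers yourself.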

if the data is all that's important to you, then this is a good proxy
for what you proposed, and will be simpler to implement.  however, if
you're looking for a screen-scraping and HTML-parsing exercise, i'd
stick with your original idea and use the generic output that urllib2
gives you.  as a previous poster has already mentioned, it's probably
the "cleanest" output, since it leaves out some of the extra
browser-specific JS and stuff that you don't need.
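
for comparison, a rough sketch of the screen-scraping route (again
Python 3).  it uses the standard library's HTMLParser as a stand-in
for beautiful soup so that it runs with no extra installs, and the
markup in SAMPLE_HTML is invented; a real quote page will be laid out
differently, so the cell-pairing logic here is only an assumption.

```python
from html.parser import HTMLParser

# Invented markup for illustration; a real quote page will differ.
SAMPLE_HTML = """<html><body>
<script>var tracking = 1;</script>
<table>
<tr><td>Last Trade:</td><td>12.50</td></tr>
<tr><td>Change:</td><td>-0.75</td></tr>
</table>
</body></html>"""


class CellCollector(HTMLParser):
    """Collect the text found inside each <td> element."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text outside table cells (such as the script body) is ignored.
        if self.in_cell:
            self.cells[-1] += data


def scrape_pairs(html):
    """Pair up consecutive cells as label -> value."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    return dict(zip(cells[0::2], cells[1::2]))


print(scrape_pairs(SAMPLE_HTML))
```

the parsing itself is not hard, but it is tied to the page layout, so
it breaks whenever the page changes.  that is the trade-off against
the CSV approach.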

cheers,
-wesley

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
    http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com


