Financial time series data

Fri Sep 3 09:45:36 EDT 2010

On Fri, 2010-09-03 at 13:29 +0200, Virgil Stokes wrote:
> A more direct question on accessing stock information from Yahoo.
> 
> First, use your browser to go to:
> http://finance.yahoo.com/q/cp?s=%5EGSPC+Components
> 
> Now, you see the first 50 rows of a 500-row table of information on
> the S&P 500 index. You can left-click on
> 
>   1 -50 of 500 |First|Previous|Next|Last
> 
> below the table to position to any of the 10 pages.
> 
> I would like to use Python to do the following.
> 
> Loop on each of the 10 pages and for each page extract information for
> each row --- How can this be accomplished automatically in Python?
> 
> Let's take the first page (as shown by default). It is easy to see the
> link to the data for "A" is http://finance.yahoo.com/q?s=A. That is, I
> can just move my cursor over the "A" and I see this URL in the message
> at the bottom of my browser (Explorer 8). If I left-click on "A" then
> I will go to this link --- Do this!
> 
> You should now see a table which shows information on this stock and
> this is the information that I would like to extract. I would like to
> do this for all 500 stocks without the need to enter the symbols for
> them (e.g. "A", "AA", etc.). It seems clear that this should be
> possible since all the symbols are in the first column of each of the
> 10 tables --- but it is not at all clear how to extract these
> automatically in Python.
> 
> Hopefully, you understand my problem. Again, I would like Python to
> cycle through these 10 pages and extract this information for each
> symbol in this table.
> 
> --V

Here's a quick hack to get the S&P 500 symbols from the component pages
that are indexed by letter. With that list in hand you can request the
symbols fifty at a time from the download facility. (If you get a better
idea from Yahoo, please post it.)

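def get_SP500_symbols():
    # Python 3 port; the original used Python 2's urllib.urlopen.
    from urllib.request import urlopen
    symbols = []
    url = 'http://finance.yahoo.com/q/cp?s=^GSPC&alpha=%c'
    # One index page per letter, A through Z.
    for c in [chr(n) for n in range(ord('A'), ord('Z') + 1)]:
        print(url % c)
        f = urlopen(url % c)
        html = f.read().decode('utf-8', 'replace').splitlines()
        f.close()
        for line in html:
            # The symbol list sits on the line that opens with this marker.
            if line.lstrip().startswith('</script><span id="yfs_params_vcr"'):
                line_split = line.split(':')
                # Field 5 holds the quoted, comma-separated items.
                s = [item.strip().upper()
                     for item in line_split[5].replace('"', '').split(',')]
                symbols.extend(s[:-3])  # drop the last three items of the field

    return symbols
    # Not quite 500 (!?)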
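
For the "fifty at a time" step, something like the following might do.
It is only a sketch (Python 3): the helper name batches and the
placeholder symbols are made up for illustration, and the actual
download request is left out since its URL isn't given here.

```python
def batches(symbols, size=50):
    """Yield successive lists of at most `size` symbols (hypothetical helper)."""
    for start in range(0, len(symbols), size):
        yield symbols[start:start + size]


# Made-up placeholder symbols, not real index data.
demo = ['SYM%d' % i for i in range(120)]

for group in batches(demo):
    # Each group could be joined into a single request string;
    # here we only show the group size and its first few members.
    print(len(group), '+'.join(group[:3]), '...')
```

In practice you would pass the list returned by get_SP500_symbols()
instead of demo. The last group is simply shorter when the total is not
a multiple of fifty.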

Frederic