Financial time series data

Frederic Rentsch anthra.norell at bluewin.ch
Sat Sep 4 05:14:24 EDT 2010


On Fri, 2010-09-03 at 19:58 +0200, Virgil Stokes wrote:
> import urllib2
> import re
> 
> def get_SP500_symbolsX ():
>     symbols = []
>     lsttradestr = re.compile('Last Trade:')
>     k = 0
>     for page in range(10):
>        url = 'http://finance.yahoo.com/q/cp?s=%5EGSPC&c='+str(page)
>        print url
>        f = urllib2.urlopen (url)
>        html = f.readlines ()
>        f.close ()
>        for line in html:
>           if line.lstrip ().startswith ('</script><span id="yfs_params_vcr"'):
>              line_split = line.split (':')
>              s = [item.strip ().upper () for item in line_split [5].replace ('"','').split (',')]
>              for symb in s:
>                 url = "http://finance.yahoo.com/q?s="+symb
>                 f = urllib2.urlopen(url)
>                 html = f.readlines()
>                 f.close()
> 
>                 for line in html:
>                    if lsttradestr.search(line):
>                       k += 1
>                       print 'k = %3d (%s)' %(k,symb)
>                       # Here is where I will extract the numerical values and place
>                       # ....
>                       #  them in an appropriate file
>              symbols.extend (s [:-3])
> 
>     return symbols
>     # Not quite 500 -- which is correct (for example p. 2 has only 49 symbols!)
>     # Actually the S&P 500 as shown does not contain 500 stocks (symbols)
> 
> 
> symbols = get_SP500_symbolsX()
> pass
> 
> And thanks for your help Frederic --- Have a good day! :-)
> 
> --V

Good going! You've got the idea.
   Here's my attempt at a cleaned-up version. It makes better use of
Yahoo's CSV download facility by requesting the quotes for a whole page
of symbols at once, rather than loading one page per symbol, and takes
only fifteen seconds to complete (on my machine).
   You may also want to look at historical quotes. Trent Nelson seems to
have a ready-made solution for that.

---------------------------

import urllib2
import re

def get_current_SP500_quotes_from_Yahoo ():

    symbol_reader = re.compile ('([a-z-.]+,)+[a-z-.]+')
    # Make sure you include all characters that may show up in symbols
    # (here: lowercase letters, '-' and '.').

    csv_data = ''

    for page in range (10):

       url = 'http://finance.yahoo.com/q/cp?s=%5EGSPC&c=' + str (page)
       print url
       f = urllib2.urlopen (url)
       html = f.readlines ()
       f.close ()

       for line in html:

          if line.lstrip ().startswith ('</script><span id="yfs_params_vcr"'):
             symbols = symbol_reader.search (line).group ()
             ## symbols = line.split (':')[5][2:-18]
             ## ^ This is an alternative to the regex. It won't stumble over
             ## unexpected characters in symbols, but depends on the line
             ## format to stay put.
             # print symbols.count (',') + 1   # Uncomment to check for <= 50
             # The regex happens to grab the symbols correctly formatted for the URL.
             url = 'http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1d1t1c1ohgv&e=.csv' % symbols
             # print url
             f = urllib2.urlopen (url)
             csv_data += f.read ()
             f.close ()

             break

    return csv_data
       
---------------------------

Here is what you get:

"A",29.85,"9/3/2010","4:01pm",+0.64,29.49,29.99,29.49,2263815
"AA",10.88,"9/3/2010","4:00pm",+0.05,11.01,11.07,10.82,16634520
"AEE",28.65,"9/3/2010","4:01pm",+0.19,28.79,28.79,28.46,3029885
... 494 lines in all (today) 

Symbol, Current or close, Date, Time, Change, Open, High, Low, Volume
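
If you would rather work with the numbers than with the raw text, something
along these lines should do. Take it as a sketch only: it is written for
Python 3 (the function above is Python 2), the field names and parse_quotes
are my own invention, and it assumes every numeric column really holds a
number, which I have not checked for all symbols.

```python
import csv
import io

# Names follow the column legend above; the identifiers themselves are made up.
FIELDS = ('symbol', 'last', 'date', 'time', 'change',
          'open', 'high', 'low', 'volume')

def parse_quotes(csv_data):
    """Turn the downloaded CSV text into a list of dicts, one per symbol."""
    quotes = []
    for row in csv.reader(io.StringIO(csv_data)):
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        quote = dict(zip(FIELDS, row))
        # Assumption: these columns are always numeric. A non-numeric
        # placeholder would raise ValueError here.
        for name in ('last', 'change', 'open', 'high', 'low'):
            quote[name] = float(quote[name])
        quote['volume'] = int(quote['volume'])
        quotes.append(quote)
    return quotes

# The first two lines of the output shown above.
sample = (
    '"A",29.85,"9/3/2010","4:01pm",+0.64,29.49,29.99,29.49,2263815\n'
    '"AA",10.88,"9/3/2010","4:00pm",+0.05,11.01,11.07,10.82,16634520\n'
)
quotes = parse_quotes(sample)
print(quotes[1]['symbol'], quotes[1]['last'], quotes[1]['volume'])
```

Each entry is then a dictionary, so quotes[1]['volume'] gives the volume
for AA as an integer.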


---------------------------
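
A word of caution on the symbol regex in the function above: if a symbol
contains a character the pattern does not list, the match simply stops
there and no error is raised. The sketch below shows this, along with the
"<= 50" count check. It is written for Python 3, extract_symbols is a
made-up helper, and the sample line is a hypothetical stand-in, since the
real page markup is not reproduced in this thread.

```python
import re

# Same pattern as in the function above, with '-' moved to the end of the
# character class, which is the more conventional placement.
symbol_reader = re.compile(r'([a-z.-]+,)+[a-z.-]+')

def extract_symbols(line):
    """Return the first comma-separated run of symbols in line, or None."""
    match = symbol_reader.search(line)
    return match.group() if match else None

# Hypothetical stand-in for the page line, not the real Yahoo markup.
line = '<span id="yfs_params_vcr">{"s": "a,aa,brk-b,bf.b"}</span>'
symbols = extract_symbols(line)
print(symbols)
print(symbols.count(',') + 1)   # number of symbols, should be <= 50

# A digit is not in the character class, so the run is cut short silently.
truncated = extract_symbols('a,aa,x1,b')
print(truncated)
```

Comparing the count against what the page shows is a cheap way to notice
such a truncation.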

Good luck to you in the footsteps of Warren Buffett!

Frederic




