Download excel file from web?

patf at well.com
Mon Jul 28 18:33:29 EDT 2008


On Jul 28, 3:29 pm, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
> p... at well.com wrote:
>
>
>
> > On Jul 28, 3:00 pm, "p... at well.com" <p... at well.com> wrote:
> >> Hi - experienced programmer but this is my first Python program.
>
> >> This URL will retrieve an Excel spreadsheet containing (that day's)
> >> MSCI stock index returns.
>
> >> http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&...
>
> >> Want to write python to download and save the file.
>
> >> So far I've arrived at this:
>
> >> [quote]
> >> # import pdb
> >> import urllib2
> >> from win32com.client import Dispatch
>
> >> xlApp = Dispatch("Excel.Application")
>
> >> # test 1
> >> # xlApp.Workbooks.Add()
> >> # xlApp.ActiveSheet.Cells(1,1).Value = 'A'
> >> # xlApp.ActiveWorkbook.ActiveSheet.Cells(2,1).Value = 'B'
> >> # xlBook = xlApp.ActiveWorkbook
> >> # xlBook.SaveAs(Filename='C:\\test.xls')
>
> >> # pdb.set_trace()
> >> response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional')
> >> # test 2 - returns check = False
> >> check_for_data = urllib2.Request('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').has_data()
>
> >> xlApp = response.fp
> >> print(response.fp.name)
> >> print(xlApp.name)
> >> xlApp.write
> >> xlApp.Close
> >> [/quote]
>
> > Whoops, hit Send when I wanted Preview.  Looks like the html [quote] tag
> > doesn't work from groups.google.com (nice).
>
> > Anyway, in test 1 above, I determined how to instantiate an Excel
> > object, put some stuff in it, then save it to disk.
>
> > So, in theory, I'm retrieving my excel spreadsheet with
>
> > response = urllib2.urlopen()
>
> > Except what then do I do with this?
>
> > Well, for one, I read some of the urllib2 documentation and found the
> > Request class with the method has_data() on it.  It returns False.
> > Hmm, that's not encouraging.
>
> > I suppose the trick is to understand what urllib2.urlopen is returning
> > to me, rummage around in there, and hopefully find my Excel file.
>
> > I use pdb to debug.  This is interesting:
>
> > (Pdb) dir(response)
> > ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',
> > 'code', 'fileno', 'fp', 'geturl', 'headers', 'info', 'msg', 'next',
> > 'read', 'readline', 'readlines', 'url']
> > (Pdb)
>
> > I suppose the members with __*__ are methods, and the names without the
> > underscores are attributes (variables) (?).
>
> No, these are the names of all attributes and methods. read is a method,
> for example.

right - I got it backwards.
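
For anyone else who trips over this: one quick way to see which names from dir() are methods and which are plain data is to test each one with callable(). Below is a minimal sketch. It uses an in-memory io.BytesIO object as a stand-in for the urlopen() response, and the helper name split_names is made up for illustration; neither comes from the thread above.

```python
import io

def split_names(obj):
    """Return (methods, data): the public attribute names of obj, by kind."""
    methods, data = [], []
    for name in dir(obj):
        if name.startswith('_'):
            continue  # skip private and double-underscore names
        value = getattr(obj, name)
        if callable(value):
            methods.append(name)   # e.g. read, close
        else:
            data.append(name)      # e.g. closed
    return methods, data

# Stand-in for the object returned by urlopen() (assumption for the demo).
fake_response = io.BytesIO(b'pretend spreadsheet bytes')
methods, data = split_names(fake_response)
print('methods: %s' % methods)
print('data: %s' % data)
```

So the underscores say nothing about method versus variable; callable() does.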

>
> > Or maybe this isn't at all the right direction to take (maybe there
> > are much better modules to do this stuff).  Would be happy to learn if
> > that's the case (and if that gets the job done for me).
>
> The docs (http://docs.python.org/lib/module-urllib2.html) are pretty
> clear on this:
>
> """
> This function returns a file-like object with two additional methods:
> """
>
> And then for file-like objects:
>
> http://docs.python.org/lib/bltin-file-objects.html
>
> """
> read([size])
>      Read at most size bytes from the file (less if the read hits EOF
> before obtaining size bytes). If the size argument is negative or
> omitted, read all data until EOF is reached. The bytes are returned as a
> string object. An empty string is returned when EOF is encountered
> immediately. (For certain files, like ttys, it makes sense to continue
> reading after an EOF is hit.) Note that this method may call the
> underlying C function fread() more than once in an effort to acquire as
> close to size bytes as possible. Also note that when in non-blocking
> mode, less data than what was requested may be returned, even if no size
> parameter was given.
> """
>
> Diez
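
To check my reading of that description, here is a small sketch of how read() behaves on a file-like object. It uses io.BytesIO as a stand-in so it runs without touching the network; I am assuming the urlopen() response follows the same rules, as the docs quoted above suggest.

```python
import io

# Stand-in for a file-like object (assumption: the urlopen() response
# follows the same read() contract).
stream = io.BytesIO(b'0123456789')

first = stream.read(4)   # at most 4 bytes
rest = stream.read()     # size omitted: everything up to EOF
at_eof = stream.read()   # already at EOF: an empty result

print(repr(first))
print(repr(rest))
print(repr(at_eof))
```

Note that each call picks up where the previous one stopped, so the data can only be read once.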

Just stumbled upon .read:

response = urllib2.urlopen('http://www.mscibarra.com/webapp/indexperf/excel?priceLevel=0&scope=0&currency=15&style=C&size=36&market=1897&asOf=Jul+25%2C+2008&export=Excel_IEIPerfRegional').read()

Now the question is: what to do with this?  I'll look at the
documentation that you point to.
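
Thinking out loud, a minimal sketch of where this is probably heading: if the server really does send back the spreadsheet bytes, they can be copied straight to a file opened in binary mode, with no need for the Excel COM object at all. The function names save_stream and download, and the output path in the comment, are made up for illustration. The import fallback is only there so the same sketch also runs on Python 3, where urllib2.urlopen lives in urllib.request.

```python
import shutil

try:
    from urllib2 import urlopen          # Python 2, as used in this thread
except ImportError:
    from urllib.request import urlopen   # same function in Python 3

def save_stream(source, path):
    """Copy everything from a file-like object into the file at path."""
    # 'wb' matters on Windows: text mode would mangle a binary .xls file.
    with open(path, 'wb') as out:
        shutil.copyfileobj(source, out)

def download(url, path):
    """Fetch url and write the response body to path, unchanged."""
    response = urlopen(url)
    try:
        save_stream(response, path)
    finally:
        response.close()

# Example call (hypothetical output path; url is the full address above):
# download(url, 'C:\\msci.xls')
```

Copying the stream instead of calling read() once keeps memory use flat for large files; for a small spreadsheet either approach should do.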

thanx - pat


