How do I 'stat' online files?

Carsten Haese carsten at uniqsys.com
Wed Jul 25 14:11:19 EDT 2007


On Tue, 2007-07-24 at 22:23 -0300, Gabriel Genellina wrote:
> On Tue, 24 Jul 2007 10:47:16 -0300, Carsten Haese <carsten at uniqsys.com>  
> wrote:
> 
> > On Tue, 2007-07-24 at 09:07 -0400, DB Daniel Brown wrote:
> >> I am working on a program that needs to stat files (gif, swf, xml,
> >> dirs, etc) from the web. I know how to stat a local file…
> >> but I can’t figure out how to stat a file that resides on a web
> >> server.
> >
> > That's because urlopen returns a file-like object, not a file. The best
> > you can hope for is to inspect the headers that the web server returns:
> >
> > >>> import urllib
> > >>> f = urllib.urlopen("http://www.python.org")
> > >>> f.headers['last-modified']
> > 'Mon, 23 Jul 2007 20:35:52 GMT'
> > >>> f.headers.items()
> > [('content-length', '14053'), ('accept-ranges', 'bytes'), ('server',
> > 'Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/0.9.8c'),
> > ('last-modified', 'Mon, 23 Jul 2007 20:35:52 GMT'), ('connection',
> > 'close'), ('etag', '"60193-36e5-39089a00"'), ('date', 'Tue, 24 Jul 2007
> > 13:42:57 GMT'), ('content-type', 'text/html')]
> >
> > Maybe that's good enough for your needs.
> 
> This generates an HTTP GET request - transferring the contents too,  
> unnecessarily.

Yes, but how much of that content will actually be transferred if I
don't call f.read?

Consider this little test:

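# urltest.py (Python 2)
import time, urllib

t1 = time.time()
# urlopen returns once the status line and headers have been read.
f = urllib.urlopen("http://data.phishtank.com/data/online-valid/")
print f.headers.items()
t2 = time.time()
# Close without calling f.read(), so the script never reads the body.
f.close()
print t2-t1
# eof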

$ python urltest.py
[('content-length', '4390510'), ('accept-ranges', 'bytes'), ('server',
'Apache/2.2.4 (FreeBSD) mod_ssl/2.2.4 OpenSSL/0.9.7e-p1 DAV/2 PHP/5.2.0
with Suhosin-Patch'), ('last-modified', 'Wed, 25 Jul 2007 17:58:04
GMT'), ('connection', 'close'), ('etag', '"5705e1-42fe6e-40612300"'),
('date', 'Wed, 25 Jul 2007 18:07:46 GMT'), ('content-type',
'application/xml')]
0.303626060486

The content-length header reports about 4 MB, and I doubt that my
computer downloaded all of that in 0.3 seconds.
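
The concern about GET can also be sidestepped entirely: an HTTP HEAD
request asks the server for the status line and headers only, with no
body. What follows is a minimal sketch in Python 3 (the examples in this
thread are Python 2), not a definitive implementation. The helper name
http_stat and the keys of the dictionary it returns are hypothetical,
chosen for illustration. It assumes the server answers HEAD and sends
Content-Length and Last-Modified, which not every server does.

```python
import email.utils
import http.client
from urllib.parse import urlsplit


def http_stat(url, timeout=10.0):
    """Hypothetical helper: stat-like info for a URL from a HEAD response."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port,
                                           timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port,
                                          timeout=timeout)
    # Rebuild the request target; an empty path means the site root.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # HEAD: the server sends the headers it would send for GET, no body.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        length = resp.getheader("Content-Length")
        modified = resp.getheader("Last-Modified")
        mtime = None
        if modified:
            try:
                # Convert the HTTP date into seconds since the epoch.
                mtime = email.utils.parsedate_to_datetime(modified).timestamp()
            except (TypeError, ValueError):
                mtime = None  # unparseable date: treat as unknown
        return {
            "status": resp.status,
            # Either header may be missing, so None means "unknown".
            "size": int(length) if length and length.isdigit() else None,
            "mtime": mtime,
            "content_type": resp.getheader("Content-Type"),
        }
    finally:
        conn.close()
```

Calling http_stat("http://www.python.org/") would return a dictionary
with the status code, the size in bytes, the modification time and the
content type, each of which may be None if the server omits the
corresponding header. A non-200 status (a redirect, for instance) is
returned as-is; this sketch does not follow it.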

-- 
Carsten Haese
http://informixdb.sourceforge.net




