urllib download insanity

Andrew Dalke dalke at dalkescientific.com
Thu May 12 12:21:38 EDT 2005


Timothy Smith wrote:
> ok what i am seeing is impossible.
> i DELETED the file from my webserver, uploaded the new one. when my app 
> logs in it checks the file, if it's changed it downloads it. the 
> impossible part, is that on my pc is downloading the OLD file i've 
> deleted! if i download it via IE, i get the new file. SO, my only 
> conculsion is that urllib is caching it some where. BUT i'm already 
> calling urlcleanup(), so what else can i do?

Here are some ideas to use in your hunt.

 - If you are getting a cached local file then the file object
behind the returned object will have a "name" attribute.

   result = urllib.urlopen(".....")
   print result.fp.name

As far as I can tell, this will only occur if you use
a tempcache or a file URL.
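
For anyone trying this on Python 3, here is a minimal sketch of the same check. It assumes urlopen now lives at urllib.request.urlopen and that the response still exposes its underlying stream as "fp". The helper name local_file_behind and the URL in the usage comment are made up for illustration.

```python
import urllib.request


def local_file_behind(response):
    """Return the path of the local file backing a urlopen() response, or None."""
    fp = getattr(response, "fp", None)
    name = getattr(fp, "name", None)
    # Assumption: only a string name means a real file on disk. Other values
    # (for example a numeric descriptor on a socket-backed stream) are ignored.
    return name if isinstance(name, str) else None


# Usage (the URL is a placeholder, not a real address):
#     response = urllib.request.urlopen("http://example.invalid/data.bin")
#     print(local_file_behind(response))
```

If this prints a path rather than None, the data most likely came from a file on disk and not from the network.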


  - You can force some debugging of the open calls, to see if
your program is dealing with a local file.

>>> old_open = open
>>> def my_open(*args):
...   print "opening", args
...   return old_open(*args)
... 
>>> open("/etc/passwd")    
<open file '/etc/passwd', mode 'r' at 0x60da0>
>>> import __builtin__
>>> __builtin__.open = my_open
>>> open("/etc/passwd")
opening ('/etc/passwd',)
<open file '/etc/passwd', mode 'r' at 0x60c20>
>>> 

You may also need to replace os.fdopen, because retrieve uses
it when it needs a tempfile.

If you want to see where the open is being called from,
use one of the functions in the traceback module to print
the stack trace.
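
Putting the two ideas together (wrapping the call and printing the stack), here is a rough Python 3 sketch. It assumes the builtins module in place of __builtin__. The helper name trace_calls is invented for this example, and run_the_download in the usage comment stands for whatever code you are debugging.

```python
import builtins
import contextlib
import os
import traceback


@contextlib.contextmanager
def trace_calls(owner, name, log):
    """Temporarily wrap owner.name so each call is recorded with its stack."""
    original = getattr(owner, name)

    def wrapper(*args, **kwargs):
        # Drop the last frame, which is this wrapper itself.
        stack = traceback.format_stack()[:-1]
        log.append((name, args, kwargs, stack))
        return original(*args, **kwargs)

    setattr(owner, name, wrapper)
    try:
        yield log
    finally:
        setattr(owner, name, original)  # always restore the real function


# Usage: record every open() and os.fdopen() made while the block runs.
#     calls = []
#     with trace_calls(builtins, "open", calls), trace_calls(os, "fdopen", calls):
#         run_the_download()   # hypothetical: the code under investigation
#     for name, args, kwargs, stack in calls:
#         print(name, args, kwargs)
#         print("".join(stack))
```

This only sees calls that look the function up through the patched module at call time. Code that saved its own reference to open beforehand will slip past it.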

  - for surety's sake, also do 

import webbrowser
webbrowser.open(url)

just before you do 

urllib.urlretrieve(url, filename)

This will double check that your program is using the URL you
expect it to use.

  - beyond that, check that you've got network activity.

You could check the router lights, use a packet sniffer like
Ethereal, or set up a debugging proxy.
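
A lighter-weight option than a sniffer, if you are on Python 3, is to have the HTTP layer print what it sends and receives. This is only a sketch. It assumes urllib.request and its debuglevel handler argument, and the function name make_debug_opener and the URL are placeholders of mine.

```python
import urllib.request


def make_debug_opener():
    """Build an opener whose HTTP(S) handlers print the request and reply headers."""
    handlers = [urllib.request.HTTPHandler(debuglevel=1)]
    # HTTPSHandler is only defined when Python was built with SSL support.
    if hasattr(urllib.request, "HTTPSHandler"):
        handlers.append(urllib.request.HTTPSHandler(debuglevel=1))
    return urllib.request.build_opener(*handlers)


# Usage (placeholder URL):
#     opener = make_debug_opener()
#     data = opener.open("http://example.invalid/data.bin").read()
```

If nothing at all is printed during the download, that suggests no HTTP request left your program.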

  - check the headers.  If your ISP is using a cache then
it might insert a header into what it returns.  But if
it was caching then your IE view should have seen the cached
version as well.
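
To make the header check concrete, here is a small Python 3 sketch that picks out the headers most likely to give away a cache. The list of header names is my own guess at what is worth looking at, not something exhaustive, and cache_hints is a made-up helper name.

```python
# Response headers that often reveal an intermediate cache. "Via" and "Age" are
# standard HTTP headers; "X-Cache" is a non-standard one that some proxies add.
CACHE_HINT_HEADERS = ("Via", "Age", "X-Cache", "Cache-Control", "Expires",
                      "Last-Modified", "ETag")


def cache_hints(headers):
    """Return {name: value} for the cache-related headers present in headers."""
    found = {}
    for name in CACHE_HINT_HEADERS:
        value = headers.get(name)
        if value is not None:
            found[name] = value
    return found


# Usage (placeholder URL; response.headers is the message object urlopen returns):
#     import urllib.request
#     response = urllib.request.urlopen("http://example.invalid/data.bin")
#     print(cache_hints(response.headers))
```

A "Via" or "Age" header in the reply is a reasonable hint that something between you and the server answered from a cache.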

				Andrew
				dalke at dalkescientific.com
 


