How to organize a substitution function with side effects
Sin Hang Kin
kentsin at poboxes.com
Mon Jun 18 10:13:14 EDT 2001
The following code downloads a page from the web and removes the <img> tags
whose images are missing.
Since it needs to keep track of the link queue, the base URL and the local
storage path, I decided to wrap it in a class:
import re
import urllib   # Python 2: urlretrieve lives directly in urllib

class urlgraber:
    def __init__(self, baseurl, localpath):
        self.baseurl = baseurl
        self.urlq = []            # queue of links still to visit
        self.localpath = localpath

    def retrieveimage(self, url):
        # retrieve the image; return true if the retrieval succeeded
        try:
            (fn, hdr) = urllib.urlretrieve(self.baseurl + url,
                                           self.localpath + url)
            return 1
        except IOError:           # catch download errors only, not everything
            return 0

    def downloadpage(self, url):
        # download page
        # ....
        # now, use re.sub to subst images (re.sub returns a new string):
        page = re.sub(imagepat, substfunc, page, 0)
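For reference, when re.sub is given a function instead of a replacement string, it calls that function once per match with a match object and splices the returned string into the result. It does not change page in place (Python strings are immutable); it returns a new string, so the result must be assigned. A small sketch; the pattern IMAGE_PAT, the function show_src and the sample HTML are made up for illustration:

```python
import re

# Hypothetical pattern: matches <img ...> tags and captures the src value.
IMAGE_PAT = re.compile(r'<img\s+[^>]*?src="([^"]*)"[^>]*>', re.IGNORECASE)

def show_src(match):
    # match.group(0) is the whole tag, match.group(1) is the captured src.
    return "[image: %s]" % match.group(1)

page = '<p>a <img src="x.gif"> b <img alt="y" src="y.png"></p>'
new_page = re.sub(IMAGE_PAT, show_src, page)
print(new_page)   # <p>a [image: x.gif] b [image: y.png]</p>
# page itself is unchanged; only new_page holds the substituted text.
```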
The idea is that substfunc is called with each matched <img ....> tag. It then
extracts the URL, calls retrieveimage, and returns either <img src=localpath>
or <!-- img --> as the result.
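One way to keep baseurl and localpath out of the argument list is to make substfunc a method and pass the bound method self.substfunc to re.sub: re.sub accepts any callable, and a bound method already carries self, so it can read self.baseurl and self.localpath directly. A minimal sketch, assuming image URLs are relative paths appended to baseurl and localpath as in retrieveimage above. The names ImageGrabber, IMAGE_PAT, rewrite and the fetch hook (injected so the download can be swapped out, e.g. for testing) are hypothetical, and the code uses Python 3's urllib.request.urlretrieve in place of the older urllib.urlretrieve:

```python
import re
import urllib.request

# Hypothetical pattern: an <img ...> tag with a double-quoted src attribute.
IMAGE_PAT = re.compile(r'<img\s+[^>]*?src="([^"]*)"[^>]*>', re.IGNORECASE)

class ImageGrabber:
    def __init__(self, baseurl, localpath, fetch=urllib.request.urlretrieve):
        self.baseurl = baseurl
        self.localpath = localpath
        self.urlq = []        # link queue, as in the original class
        self.fetch = fetch    # hypothetical hook: callable(remote, local)

    def retrieveimage(self, url):
        """Download one image; return True on success."""
        try:
            self.fetch(self.baseurl + url, self.localpath + url)
            return True
        except OSError:       # URLError/HTTPError are OSError subclasses
            return False

    def substfunc(self, match):
        # Called by re.sub once per <img> tag; self supplies the state.
        url = match.group(1)
        if self.retrieveimage(url):
            return '<img src="%s">' % (self.localpath + url)
        return '<!-- img -->'

    def rewrite(self, page):
        # Pass the bound method; re.sub returns a new string.
        return IMAGE_PAT.sub(self.substfunc, page)

# Usage with a stand-in fetcher that fails for "missing" images:
def fake_fetch(remote, local):
    if "missing" in remote:
        raise OSError("not found")

g = ImageGrabber("http://example.com/", "/tmp/site/", fetch=fake_fetch)
print(g.rewrite('<img src="a.gif"><img src="missing.gif">'))
# <img src="/tmp/site/a.gif"><!-- img -->
```

This keeps all the shared state on the instance, which fits the existing urlgraber class: downloadpage would simply call re.sub with self.substfunc.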
The problem is that it is very confusing to pass localpath and baseurl between
them.
Can somebody help me reorganize this? What is the proper way to do this in
Python?
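If a class feels like too much, a closure does the same job: a function defined inside another function can read that function's arguments, so the inner substfunc sees baseurl and localpath without taking them as parameters, and it is the inner function that gets handed to re.sub. A sketch, assuming image URLs are relative paths joined onto baseurl and localpath; rewrite_images, IMAGE_PAT and the retrieve callback are hypothetical names, and the retriever in the usage lines is a stand-in rather than a real download:

```python
import re

# Hypothetical pattern: an <img ...> tag with a double-quoted src attribute.
IMAGE_PAT = re.compile(r'<img\s+[^>]*?src="([^"]*)"[^>]*>', re.IGNORECASE)

def rewrite_images(page, baseurl, localpath, retrieve):
    """Replace each <img> tag; retrieve(remote, local) returns True on success."""

    def substfunc(match):
        # baseurl, localpath and retrieve come from the enclosing call.
        url = match.group(1)
        if retrieve(baseurl + url, localpath + url):
            return '<img src="%s">' % (localpath + url)
        return '<!-- img -->'

    return IMAGE_PAT.sub(substfunc, page)

# Usage with a stand-in retriever that "finds" only .gif files:
def retrieve(remote, local):
    return remote.endswith(".gif")

print(rewrite_images('<img src="a.gif"> <img src="b.png">',
                     "http://example.com/", "/tmp/site/", retrieve))
# <img src="/tmp/site/a.gif"> <!-- img -->
```

Another option in the same spirit is functools.partial: with a plain function substfunc(baseurl, localpath, match), passing partial(substfunc, baseurl, localpath) to re.sub pre-binds the first two arguments, and re.sub supplies the match.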
--
Rgs,
Kent Sin
---------------------------------
kentsin at yahoo.com
More information about the Python-list
mailing list