How to organize a substitution function with side effects

Sin Hang Kin kentsin at poboxes.com
Mon Jun 18 10:13:14 EDT 2001


The following code downloads a page from the web and removes the <img> tags
whose images are missing from it.

Since it needs to keep track of the link queue, baseurl and local storage, I
decided to wrap it in a class:

import re
import urllib

class urlgraber:
    def __init__(self, baseurl, localpath):
        self.baseurl = baseurl
        self.urlq = []
        self.localpath = localpath

    def retrieveimage(self, url):
        # retrieve the image, return true if the retrieval succeeds
        try:
            (fn, hdr) = urllib.urlretrieve(self.baseurl + url,
                                           self.localpath + url)
            return 1
        except:
            return 0

    def downloadpage(self, url):
        # download page
        ....
        # now, use re.sub to subst images (re.sub returns the new string):
        page = re.sub(imagepat, substfunc, page, 0)


The idea is that substfunc is called with each <img ....> tag. It then extracts
the URL, calls retrieveimage, and returns either <img src=localpath> or <!--
img --> as the result.
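
As a small illustration of the callback mechanism described above: re.sub
accepts a function as the replacement, calls it once per match with the match
object, and uses the returned string as the replacement text. The pattern
IMG_PAT, the function name comment_out_missing and the set "available" are
assumptions made up for this sketch (a regular expression is only a rough way
to find <img> tags), and nothing is downloaded here.

```python
import re

# Assumed pattern: an <img ...> tag with a double-quoted src attribute.
IMG_PAT = re.compile(r'<img\b[^>]*\bsrc="([^"]*)"[^>]*>', re.IGNORECASE)

# Hypothetical stand-in for "the download succeeded", so the sketch runs offline.
available = {"a.gif"}

def comment_out_missing(match):
    """Return the tag unchanged if its image is available, else a comment."""
    url = match.group(1)          # the captured src value
    if url in available:
        return match.group(0)     # the whole matched <img ...> tag
    return "<!-- img -->"

page = '<p><img src="a.gif"><img alt="x" src="b.gif"></p>'
result = IMG_PAT.sub(comment_out_missing, page)
```

Here the function has no side effects; the difficulty comes once it also has
to download files and therefore needs baseurl and localpath.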

The problem is that it is very confusing to pass localpath and baseurl between
them.

Can somebody help me reorganize this? What is the proper way to do this in
Python?
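
One possible arrangement, offered only as a sketch: make the substitution
function a method and hand the bound method self.substfunc to re.sub. The
callback then reaches baseurl and localpath through self, so nothing has to be
passed around by hand. The names ImageRewriter, IMG_PAT, rewrite, fake_retrieve
and the injectable "retrieve" argument are assumptions of this sketch, not part
of the code above. "retrieve" defaults to urllib.request.urlretrieve, which is
where current Python keeps the urllib.urlretrieve call used above, and the
local src is assumed to be localpath + url.

```python
import re
import urllib.request

# Assumed pattern: an <img ...> tag with a double-quoted src attribute.
IMG_PAT = re.compile(r'<img\b[^>]*\bsrc="([^"]*)"[^>]*>', re.IGNORECASE)

class ImageRewriter:
    def __init__(self, baseurl, localpath, retrieve=urllib.request.urlretrieve):
        self.baseurl = baseurl
        self.localpath = localpath
        self.retrieve = retrieve  # injectable so the class can be tried offline

    def retrieveimage(self, url):
        """Fetch one image; return True if the retrieval succeeded."""
        try:
            self.retrieve(self.baseurl + url, self.localpath + url)
            return True
        except (OSError, ValueError):
            return False

    def substfunc(self, match):
        """Called by re.sub for every <img> tag (side effect: a download)."""
        url = match.group(1)
        if self.retrieveimage(url):
            return '<img src="%s">' % (self.localpath + url)
        return "<!-- img -->"

    def rewrite(self, page):
        # The bound method carries self, so the callback sees all the state.
        return IMG_PAT.sub(self.substfunc, page)

# Hypothetical retriever that fails for one file, to exercise both branches.
def fake_retrieve(src, dst):
    if src.endswith("b.gif"):
        raise OSError("not found")

rewriter = ImageRewriter("http://example.com/", "mirror/", retrieve=fake_retrieve)
out = rewriter.rewrite('<img src="a.gif"><img src="b.gif">')
```

Keeping the download behind a replaceable "retrieve" argument is a design
choice of the sketch: it lets the substitution logic be checked without
touching the network.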

--
Rgs,

Kent Sin
---------------------------------
kentsin at yahoo.com




