Rookie Speaks

Thu Jan 8 04:58:48 EST 2004

William S. Perrin wrote:

I think your function has a sane design :-) XML is slow by design, but in
your case it doesn't really matter, because it is probably I/O-bound, as
already pointed out by Samuel Walters.

Below is a slightly different approach that uses a class:

import urllib
from xml.dom import minidom

class Weather(object):
    def __init__(self, url=None, xml=None):
        """ Will accept either a URL or an XML string,
            preferably as a keyword argument """
        if url:
            if xml:
                # not sure what would be the right exception here
                # (ValueError?), so keep it generic for now
                raise Exception("Must provide either url or xml, not both")
            sock = urllib.urlopen(url)
            try:
                xml = sock.read()
            finally:
                sock.close()
        elif xml is None:
            raise Exception("Must provide either url or xml")
        self._dom = minidom.parseString(xml)

    def getAttrFromDom(self, weatherAttribute):
        a =  self._dom.getElementsByTagName(weatherAttribute)
        return a[0].firstChild.data

    def asRow(self):
        # this will defeat lazy attribute lookup
        return "%13s\t%s\t%s\t%s\t%s\t%s\t%s" % (self.name,
            self.fahrenheit, self.wind, self.barometric_pressure,
            self.dewpoint, self.relative_humidity, self.conditions)

    def __getattr__(self, name):
        try:
            value = self.getAttrFromDom(name)
        except IndexError:
            raise AttributeError(
                "'%.50s' object has no attribute '%.400s'" %
                (self.__class__.__name__, name))
        # now set the attribute so it need not be looked up
        # in the dom next time
        setattr(self, name, value)
        return value

This has a slight advantage if you are interested only in a subset of the
attributes, say the temperature:

for url in listOfUrls:
    print Weather(url).fahrenheit

Here getAttrFromDom() - the equivalent of your getattrs() - is only called
once per URL. You can still print a tab-delimited row with

print Weather(url).asRow()

but that will of course defeat this optimization scheme.
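
If you want to see the caching at work without hitting a server, here is a
minimal sketch. It is not the class above: LazyWeather, SAMPLE_XML and the
lookups counter are made up for illustration, the XML is an invented sample
rather than real feed data, and it is written in Python 3 syntax (print as a
function).

```python
from xml.dom import minidom

# Invented sample document; the tag names mirror attributes used above.
SAMPLE_XML = """<weather>
  <name>Testville</name>
  <fahrenheit>50</fahrenheit>
  <wind>calm</wind>
</weather>"""


class LazyWeather(object):
    """Hypothetical cut-down Weather that counts DOM searches."""

    def __init__(self, xml):
        # set _dom first, so __getattr__ is never asked for it
        self._dom = minidom.parseString(xml)
        self.lookups = 0  # how often the DOM was actually searched

    def getAttrFromDom(self, weatherAttribute):
        self.lookups += 1
        a = self._dom.getElementsByTagName(weatherAttribute)
        return a[0].firstChild.data

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails
        try:
            value = self.getAttrFromDom(name)
        except IndexError:
            raise AttributeError(name)
        # store the value on the instance, so the next access
        # bypasses __getattr__ entirely
        setattr(self, name, value)
        return value


w = LazyWeather(SAMPLE_XML)
first = w.fahrenheit   # searches the DOM
second = w.fahrenheit  # found in the instance dictionary
print(first, second, w.lookups)
```

After the two accesses w.lookups is still 1, because the second one never
reaches __getattr__. A tag that is not in the document raises AttributeError,
so hasattr() behaves as expected.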

Peter