How to catch exceptions elegantly in this situation?

Robert Brewer fumanchu at amor.org
Wed Oct 13 02:42:01 EDT 2004


Saqib Ali wrote:
> The real issue is that I am doing some screen-scraping from on-line
> white pages (residential telephone directory).
> 
> I have defined a bunch of regular expressions and myFunc() populates a
> dictionary corresponding to each result from the white pages.
> 
> So essentially myDict corresponds to a single record found. See below
> 
> myDict["fullName"] = fullNameRegExp.match(htmlText)[0]
> myDict["telNum"] = telNumRegExp.match(htmlText)[0]
> myDict["streetAddr"] = streetAddrRegExp.match(htmlText)[0]
> myDict["city"] = cityRegExp.match(htmlText)[0]
> myDict["state"] = stateRegExp.match(htmlText)[0]
> myDict["zip"] = zipRegExp.match(htmlText)[0]
> 
> 
> Sometimes one or more of these regexps fails to match, in which case
> an exception will be raised. I want to catch the exception, print out
> a message... but then keep on going to the next assignment
> statement.
> 
> How can I do that without wrapping each assignment in its own
> try/except block?

If you're looking to save typing or space, wrap the lookup in a helper
function:

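import logging

logger = logging.getLogger(__name__)

htmlText = open(filename).read()

def match_regex(regex):
    # A failed match() returns None, so the [0] lookup raises TypeError.
    try:
        return regex.match(htmlText)[0]
    except Exception:
        # logger.exception() requires a message; it also logs the traceback.
        logger.exception("regex failed to match")
        return ""

myDict["fullName"] = match_regex(fullNameRegExp)
myDict["telNum"] = match_regex(telNumRegExp)
myDict["streetAddr"] = match_regex(streetAddrRegExp)
myDict["city"] = match_regex(cityRegExp)
myDict["state"] = match_regex(stateRegExp)
myDict["zip"] = match_regex(zipRegExp)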
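
Since a failed match() returns None rather than raising, the helper could
also test for None directly instead of catching every Exception. A minimal
sketch, assuming hypothetical names (match_or_default, zip_regexp) and a
made-up input string that are not part of your code:

```python
import logging
import re

logger = logging.getLogger(__name__)

def match_or_default(regex, text, default=""):
    """Return the text matched at the start of `text`, or `default`."""
    m = regex.match(text)
    if m is None:
        # No match: log which pattern failed and fall back to the default.
        logger.warning("no match for pattern %r", regex.pattern)
        return default
    return m.group(0)

# Hypothetical pattern and inputs, for illustration only.
zip_regexp = re.compile(r"\d{5}")
found = match_or_default(zip_regexp, "90210 Beverly Hills")
missing = match_or_default(zip_regexp, "no digits here")
```

Passing the text in as a parameter also removes the dependency on a global
htmlText, which makes the helper easier to reuse for each record.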

But then, that's still ugly. Rather than naming each regex as above,
consider putting them into a dict:

regexes = {"fullName": re.compile(...),
           "telNum": re.compile(...),
           ...
           }

Fill in the ... as appropriate. Then you can write the far cleaner
snippet:

for key, regex in regexes.items():
    try:
        myDict[key] = regex.match(htmlText)[0]
    except Exception:
        logger.exception("no match for %r", key)
        myDict[key] = ""    # or some other default value
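
Put together, the dict-driven loop might look like the following. This is
only a sketch: the two patterns, the scrape_record name, and the sample
text are placeholders I made up, and the real regexes depend on the page
being scraped. It catches TypeError specifically, which is what indexing a
failed (None) match raises, and it relies on match objects supporting [0],
which needs Python 3.6 or later.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical patterns, standing in for the real white-pages regexes.
regexes = {
    "telNum": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "zip": re.compile(r"\d{5}"),
}

def scrape_record(html_text, default=""):
    """Build one record, logging and defaulting every field that fails."""
    record = {}
    for key, regex in regexes.items():
        try:
            # match() returns None on failure, so [0] raises TypeError.
            record[key] = regex.match(html_text)[0]
        except TypeError:
            logger.exception("no match for %r", key)
            record[key] = default
    return record

# Made-up input: the phone pattern matches at the start, the zip does not.
record = scrape_record("555-123-4567 Main St")
```

Adding a field is then a one-line change to the regexes dict, with no new
try/except block to write.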



If you're looking to save execution speed or address some other
concern, I doubt you'll find a better solution.


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org


