Using dictionary to hold regex patterns?

Vlastimil Brom vlastimil.brom at gmail.com
Sun Nov 23 15:20:04 EST 2008


2008/11/23 Gilles Ganault <nospam at nospam.com>

> Hello
>
> After downloading a web page, I need to search for several patterns,
> and if found, extract information and put them into a database.
>
> To avoid a bunch of "if m", I figured maybe I could use a dictionary
> to hold the patterns, and loop through it:
>
> ======
> pattern = {}
> pattern["pattern1"] = ">.+?</td>.+?>(.+?)</td>"
> for key,value in pattern.items():
>        response = ">whatever</td>.+?>Blababla</td>"
>
>        #AttributeError: 'str' object has no attribute 'search'
>        m = key.search(response)
>        if m:
>                print key + "#" + value
> ======
>
> Is there a way to use a dictionary this way, or am I stuck with
> copy/pasting blocks of "if m:"?
>
> Thank you.
I'm not quite sure I understand correctly what should be achieved, but it
seems that you should run the searches on the dict values rather than on the
keys if you want to access the re patterns:
m = re.search(re_pattern_value, text_to_search_in)
if m:
    print key + "#" + m.group()
...
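
Spelled out as a complete loop, this could look roughly like the sketch below. The sample text, the dictionary contents and the helper name first_matches are only assumptions for illustration, and it is written for a current Python 3 (print as a function).

```python
import re

# Hypothetical pattern table: name -> regex string (names are illustrative).
patterns = {
    "pattern1": r">.+?</td>.+?>(.+?)</td>",
}

# Hypothetical sample input standing in for the downloaded page.
text_to_search_in = "<td>whatever</td><td>Blababla</td>"


def first_matches(patterns, text):
    """Return {name: first captured group} for every pattern that matches."""
    results = {}
    for name, regex in patterns.items():
        # Search with the dict *value* (the regex), not the key (the name).
        m = re.search(regex, text)
        if m:
            results[name] = m.group(1)
    return results


found = first_matches(patterns, text_to_search_in)
for name, value in found.items():
    print(name + "#" + value)
```

Storing the results in a dict keyed by pattern name keeps the database step separate from the matching, so no per-pattern "if m:" block is needed.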
If there could be multiple matches, findall or finditer would probably be
more suitable than search.
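
A small sketch of the difference, assuming a made-up input string and an illustrative pattern (neither comes from the original page): findall returns the captured strings directly, while finditer yields match objects, so positions are available as well.

```python
import re

# Hypothetical input with several table cells.
text = "<td>alpha</td><td>beta</td><td>gamma</td>"

# Illustrative pattern: capture the content of each <td> cell.
cell_re = re.compile(r"<td>(.+?)</td>")

# findall returns the captured groups as a list of strings.
cells = cell_re.findall(text)

# finditer yields match objects, so the start offset of each match is known.
spans = [(m.group(1), m.start()) for m in cell_re.finditer(text)]

print(cells)
print(spans)
```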
That said, regexes aren't a very good fit for dealing with HTML unless you
know fairly exactly what structure to expect;
something like BeautifulSoup could probably be used instead.
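
To give an idea of the parser-based approach without requiring a third-party install, here is a rough sketch using the standard library's html.parser in place of BeautifulSoup (so this is a swapped-in technique, not BeautifulSoup's API). The class name CellCollector and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> element."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # Start a new, empty cell; attributes are ignored here.
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a <td> element.
        if self.in_cell:
            self.cells[-1] += data


# Hypothetical markup standing in for the downloaded page.
collector = CellCollector()
collector.feed('<table><tr><td class="a">whatever</td>\n<td>Blababla</td></tr></table>')
collector.close()
cells = collector.cells
print(cells)
```

Unlike the regex, this does not care about attributes on the tags or about whitespace between the cells.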
hth,
  Vlasta