[Tutor] Clarified: Best way to alter sections of a string which match dictionary keys?

Karl Pflästerer sigurd at 12move.de
Fri Jan 2 22:22:27 EST 2004


On  2 Jan 2004, SSokolow <- from_python_tutor at SSokolow.com wrote:

> It does currently work and here's the tested functional code to put a
> checkmark beside each visited link:

>         for url_key in episodesViewed.keys():
>             string = re.sub(r'(?i)(<a.*?href="' + url_key[1:] +
>             r'".*?)', r'<img src="http://_proxy/checkmark.png">\1',
>             string)

> each key in episodesViewed is a URL such as "/10523.html" and the
> variable name string is not my choice. It is a standard convention for
> all transport-level decoding modules in proxy 4 (I haven't figured out
> how to hook this code in at the content level so I'm improvising)
[...]
> The result is that a hyperlink such as <a href="10523.html">Episode
> 10523</a> will become <img src="http://_proxy/checkmark.png"><a
> href="10523.html">Episode 10523</a> but only if /10523.html is a key
> in the episodesViewed dictionary.

> What I want is some code that loops through each link in the page (the
> string variable holds the contents of an HTML file) and uses
> everMemory.has_key() to figure out whether it should put <img
> src="http://_proxy/checkmark.png"> beside the link.

> Hope this is a little more understandable

Yes, it is.  Here is an attempt to achieve what you want.  You may have
to frob the regexp a bit, but I think it will match.

********************************************************************
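import re

# Each match covers the text before one anchor (preanchor) plus the
# anchor itself, so reg.sub visits every link in a single pass.
reg = re.compile(
                  r"""
                  (?P<preanchor>.*?)
                  (?P<anchor><a.*?href=(?:\"|')
                  (?P<ref>.*?)
                  (?:\"|').*?</a>)
                  """, re.I | re.S | re.X)

episodesViewed = {} # fill it with your values, e.g. '/10523.html'

def sub_if_in_hash(matcho):
    dic = matcho.groupdict()
    # The keys look like "/10523.html" but the hrefs like "10523.html",
    # so prepend the slash before looking the reference up.
    if '/' + dic['ref'] in episodesViewed:
        return dic['preanchor'] + '<img src="http://_proxy/checkmark.png">' + dic['anchor']
    else:
        return dic['preanchor'] + dic['anchor']

# string holds the HTML page (the proxy's naming convention)
string = reg.sub(sub_if_in_hash, string)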

********************************************************************
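The code above relies on the proxy to supply string.  To try the same
approach on its own, here is a minimal self-contained sketch for current
Python, where the old sre module has been replaced by re.  The sample
page and the dictionary contents are hypothetical, and the slash added
before the lookup assumes, as in your message, that keys look like
"/10523.html" while the hrefs look like "10523.html".

```python
import re

# Same regexp as in the reply, using the re module instead of sre.
reg = re.compile(
    r"""
    (?P<preanchor>.*?)
    (?P<anchor><a.*?href=(?:\"|')
    (?P<ref>.*?)
    (?:\"|').*?</a>)
    """, re.I | re.S | re.X)

# Hypothetical sample data: one visited episode, one unvisited.
episodesViewed = {'/10523.html': True}

page = ('<p><a href="10523.html">Episode 10523</a>\n'
        '<a href="10524.html">Episode 10524</a></p>')

def sub_if_in_hash(matcho):
    dic = matcho.groupdict()
    # Assumption: keys carry a leading slash, hrefs in the page do not.
    if '/' + dic['ref'] in episodesViewed:
        return dic['preanchor'] + '<img src="http://_proxy/checkmark.png">' + dic['anchor']
    return dic['preanchor'] + dic['anchor']

result = reg.sub(sub_if_in_hash, page)
print(result)
```

Only the first link gets the checkmark; the second link and the trailing
</p> (which no match covers) are written back unchanged.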

To the regexp
-------------

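It has four pieces (three named groups):
(?P<preanchor>.*?)
        Matches the text before an anchor
(?P<anchor><a.*?href=(?:\"|')
        Opens the anchor group: "<a", anything up to "href=", and the
        opening quote (single or double)
(?P<ref>.*?)
        Grabs the reference, i.e. the URL between the quotes
(?:\"|').*?</a>)
        Matches the closing quote and the rest of the anchor up to
        "</a>", then closes the anchor group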

All groups are named; IMO that makes the matches much nicer to work
with (matcho.groupdict() hands them back as a dictionary).
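To see what each named group captures, this sketch runs the same regexp
with finditer over a small hypothetical sample containing one
single-quoted and one double-quoted link, and prints groupdict() for
each match.

```python
import re

reg = re.compile(
    r"""
    (?P<preanchor>.*?)
    (?P<anchor><a.*?href=(?:\"|')
    (?P<ref>.*?)
    (?:\"|').*?</a>)
    """, re.I | re.S | re.X)

# Hypothetical sample: upper-case tag with single quotes, then a
# lower-case tag with double quotes, then some trailing text.
sample = "Intro <A HREF='1.html'>One</A> and <a href=\"2.html\">Two</a> end"

for m in reg.finditer(sample):
    print(m.groupdict())
```

The trailing " end" belongs to no match; reg.sub leaves such text in
place, which is why the replacement function only has to rebuild
preanchor and anchor.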

The function gets called with every match found.  It checks whether the
reference is in the dictionary; if yes, it returns the match with the
checkmark inserted before the anchor; if not, it returns the match
unchanged.

The string is processed only once, instead of once per dictionary key
as in your loop, and each dictionary lookup is fast (constant time on
average).  Test it with real data; you may have to adjust the regexp a
bit.
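For comparison, here is a sketch that runs the loop from your message
and the single-pass version on the same hypothetical page and checks
that they agree.  The names CHECK, loop_per_key and single_pass are made
up for illustration.  The loop rescans the page once per key; the single
pass scans it once.  I added re.escape to the loop: it builds a regexp
from each key, and an unescaped "." in "10523.html" would match any
character.  Each link sits on its own line in the sample, because on a
single line the loop's "<a.*?href=" could start at an earlier anchor and
put the checkmark in front of the wrong link.

```python
import re

reg = re.compile(
    r"""
    (?P<preanchor>.*?)
    (?P<anchor><a.*?href=(?:\"|')
    (?P<ref>.*?)
    (?:\"|').*?</a>)
    """, re.I | re.S | re.X)

CHECK = '<img src="http://_proxy/checkmark.png">'

def loop_per_key(page, viewed):
    # The quoted approach: one re.sub (a full scan of the page) per key.
    for url_key in viewed:
        page = re.sub(r'(?i)(<a.*?href="' + re.escape(url_key[1:]) + r'".*?)',
                      CHECK + r'\1', page)
    return page

def single_pass(page, viewed):
    # The reply's approach: one scan, one dictionary lookup per link.
    def sub_if_in_hash(matcho):
        dic = matcho.groupdict()
        mark = CHECK if '/' + dic['ref'] in viewed else ''
        return dic['preanchor'] + mark + dic['anchor']
    return reg.sub(sub_if_in_hash, page)

# Hypothetical sample page with one link per line.
page = ('<a href="10523.html">Episode 10523</a>\n'
        '<a href="10524.html">Episode 10524</a>\n')
viewed = {'/10523.html': True}

print(loop_per_key(page, viewed) == single_pass(page, viewed))
```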

You might worry about an HTML page without any links, but that case is
harmless: the regexp simply finds no match and reg.sub returns the
string unchanged (and which HTML site does not have at least one link
anyway?).
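A quick way to check what happens with a page that has no links is
re.subn, which returns the new string together with the number of
substitutions made.  The page and the callback here are hypothetical;
the callback would visibly mark every link if it were ever called.

```python
import re

reg = re.compile(
    r"""
    (?P<preanchor>.*?)
    (?P<anchor><a.*?href=(?:\"|')
    (?P<ref>.*?)
    (?:\"|').*?</a>)
    """, re.I | re.S | re.X)

def mark_every_link(matcho):
    # Hypothetical callback that would change every match it sees.
    return matcho.group('preanchor') + '[X]' + matcho.group('anchor')

no_links = "<html><body><p>Nothing to click here.</p></body></html>"

# subn returns (new_string, number_of_substitutions).
result, count = reg.subn(mark_every_link, no_links)
print(count, result == no_links)   # 0 True
```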



   Karl
-- 
Please do *not* send copies of replies to me.
I read the list



