What's wrong with this subroutine?

Mon Feb 25 23:25:46 EST 2002

A.Newby <deathtospam43423 at altavista.com> wrote:
: First, here's what I'm trying to do ......

: I'm working on a html, cgi based chat script. It's working pretty well so 
: far. 

: I want to get this subroutine so that the user just has to post a url into 
: the chat box to post a picture or a link in chat. E.g., to post a 
: pic of a walrus, you would simply type ....

: http://www.pbs.org/kratts/world/oceans/walrus/images/walrus.jpg

: ... and the script would take that, and before writing it to the chat log 
: it would convert it to ...

: <img src = http://www.pbs.org/kratts/world/oceans/walrus/images/walrus.jpg>

: ... or to post a link, the user would simply type ... 

: http://www.goodvibes.com/

: ... and the script would convert it to ....

: <a href = "http://www.goodvibes.com/">Link!</a>

Sounds good!  You might want to use a regular expression to detect
the URL pattern.  Here's a Python translation of a URL regular
expression by Tom Christiansen (of Perl fame):

###
## This is a regular expression that detects URLs (http, ftp, etc.).
##
## This is only a small sample of tchrist's very nice tutorial on
## regular expressions.  See:
##
##     http://www.perl.com/doc/FMTEYEWTK/regexps.html
##
## for more details.

import re

urls = '(%s)' % '|'.join("""http telnet gopher file wais ftp""".split())
ltrs = r'\w'
gunk = r'/#~:.?+=&%@!\-'
punc = r'.:?\-'
any = "%(ltrs)s%(gunk)s%(punc)s" % { 'ltrs' : ltrs,
                                     'gunk' : gunk,
                                     'punc' : punc }

url = r"""
    \b                            # start at word boundary
    (                             # begin \1 {
        %(urls)s    :             # need resource and a colon
        [%(any)s] +?              # followed by one or more
                                  #  of any valid character, but
                                  #  be conservative and take only
                                  #  what you need to....
    )                             # end   \1 }
    (?=                           # look-ahead non-consumptive assertion
            [%(punc)s]*           # either 0 or more punctuation
            [^%(any)s]            #  followed by a non-url char
        |                         # or else
            $                     #  then end of the string
    )
    """ % {'urls' : urls,
           'any' : any,
           'punc' : punc }

url_re = re.compile(url, re.VERBOSE)

def _test():
    sample = """hello world, this is an url:
                http://python.org.  Can you find it?"""
    match = url_re.search(sample)
    print("Here's what we found: '%s'" % match.group(0))

if __name__ == '__main__':
    _test()
###
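
To tie this back to your chat script, here's a rough sketch of how the
pattern might be used with re.sub() to turn each url into an <img> or
<a> tag.  The names linkify, _to_html and IMAGE_EXTENSIONS are made up
for this example, and deciding "picture or link" by file extension is
only my assumption about what you want.  It also doesn't escape the
rest of the user's text, which you'd probably want to do before writing
to the chat log.

```python
import re

# Compact, non-verbose form of the same URL pattern as above.
URL_CHARS = r'\w/#~:.?+=&%@!\-'
PUNC = r'.:?\-'
url_re = re.compile(
    r'\b((?:http|telnet|gopher|file|wais|ftp):[%s]+?)(?=[%s]*[^%s]|$)'
    % (URL_CHARS, PUNC, URL_CHARS))

# Hypothetical list of extensions to treat as pictures.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.gif', '.png')

def _to_html(match):
    """Build the replacement tag for one matched url."""
    url = match.group(1)
    if url.lower().endswith(IMAGE_EXTENSIONS):
        return '<img src="%s">' % url
    return '<a href="%s">Link!</a>' % url

def linkify(text):
    """Replace each url found in text with an <img> or <a> tag."""
    return url_re.sub(_to_html, text)
```

Calling linkify() on a line of chat text before it goes into the log
should leave everything but the urls alone.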

Good luck to you!