What's wrong with this subroutine?
Daniel Yoo
dyoo at hkn.eecs.berkeley.edu
Mon Feb 25 23:25:46 EST 2002
A.Newby <deathtospam43423 at altavista.com> wrote:
: First, here's what I'm trying to do ......
: I'm working on a html, cgi based chat script. It's working pretty well so
: far.
: I want this subroutine to work so that the user just has to paste a url into
: the chat box to post a picture or a link in chat. E.g., to post a
: pic of a walrus, you would simply type ....
: http://www.pbs.org/kratts/world/oceans/walrus/images/walrus.jpg
: ... and the script would take that, and before writing it to the chat log
: it would convert it to ...
: <img src = http://www.pbs.org/kratts/world/oceans/walrus/images/walrus.jpg>
: ... or to post a link, the user would simply type ...
: http://www.goodvibes.com/
: ... and the script would convert it to ....
: <a href = "http://www.goodvibes.com/">Link!</a>
Sounds good! You might want to use a regular expression to detect
this "url" pattern. Here's a translation of Tom Christiansen's (of
Perl fame) HTTP url regular expression:
###
## This is a regular expression that detects HTTP urls.
##
## This is only a small sample of tchrist's very nice tutorial on
## regular expressions. See:
##
## http://www.perl.com/doc/FMTEYEWTK/regexps.html
##
## for more details.
import re

urls = '(%s)' % '|'.join("""http telnet gopher file wais ftp""".split())
ltrs = r'\w'
## Raw strings, so the backslash in '\-' reaches the regex engine untouched.
gunk = r'/#~:.?+=&%@!\-'
punc = r'.:?\-'
any = "%(ltrs)s%(gunk)s%(punc)s" % { 'ltrs' : ltrs,
                                     'gunk' : gunk,
                                     'punc' : punc }
url = r"""
    \b                            # start at word boundary
    (                             # begin \1 {
        %(urls)s    :             # need resource and a colon
        [%(any)s] +?              # followed by one or more
                                  #  of any valid character, but
                                  #  be conservative and take only
                                  #  what you need to....
    )                             # end   \1 }
    (?=                           # look-ahead non-consumptive assertion
        [%(punc)s]*               # either 0 or more punctuation
        [^%(any)s]                #  followed by a non-url char
        |                         # or else
        $                         #  then end of the string
    )
    """ % {'urls' : urls,
           'any' : any,
           'punc' : punc }
url_re = re.compile(url, re.VERBOSE)
def _test():
    sample = """hello world, this is an url:
                http://python.org. Can you find it?"""
    match = url_re.search(sample)
    ## The parenthesized form prints the same thing on old and new Pythons.
    print("Here's what we found: '%s'" % match.group(0))

if __name__ == '__main__':
    _test()
###
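To get from "find the url" to "write the tag into the chat log", the sub() method with a callback function could do the rewriting. Here's a rough, self-contained sketch. Note that linkify() and IMAGE_SUFFIXES are names made up for illustration, the suffix list is only a guess at what should count as a picture, and the pattern is a compacted, non-verbose version of the one above:

```python
import re

# Compact rebuild of the pattern above, so this sketch runs on its own.
schemes = '(?:%s)' % '|'.join('http telnet gopher file wais ftp'.split())
punc = r'.:?\-'
any_chars = r'\w/#~:.?+=&%@!\-' + punc
url_re = re.compile(
    r'\b(%s:[%s]+?)(?=[%s]*[^%s]|$)' % (schemes, any_chars, punc, any_chars))

# Assumption: these endings mean "show it as a picture".
IMAGE_SUFFIXES = ('.jpg', '.jpeg', '.gif', '.png')

def linkify(text):
    """Replace every url in text with an <img> or an <a> tag."""
    def replace(match):
        url = match.group(1)
        if url.lower().endswith(IMAGE_SUFFIXES):
            return '<img src="%s">' % url
        return '<a href="%s">Link!</a>' % url
    # sub() calls replace() once per match and splices in its return value.
    return url_re.sub(replace, text)

if __name__ == '__main__':
    print(linkify("a walrus: http://www.pbs.org/kratts/world/oceans/walrus/images/walrus.jpg"))
    print(linkify("try http://www.goodvibes.com/ sometime"))
```

Text that contains no url comes back unchanged, so the chat script could probably run every message through linkify() just before writing it to the log.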
Good luck to you!