String Regex problem

Andrei project5 at redrival.net
Tue Nov 25 05:48:41 EST 2003


Skip Montanaro wrote on Mon, 24 Nov 2003 21:35:48 -0600:

>     >> Since I am very poor in regex, can someone show me how to do it using
>     >> a few examples?
> 
<snip>
>     Don> http://kodos.sourceforge.net
> 
> If you're a Mac Python person there's also Dinu Gherman's excellent
> RegexPlor:
> 
>     http://starship.python.net/crew/gherman/RegexPlor.html
<snip>

I'm biased here, but Kiki (http://project5.freezope.org/kiki) is
cross-platform and depends on wxPython rather than Qt, which is much easier
for Windows users.

Anyway, here's a regex I ripped out of my own code - you might want to
simplify it though:

"""Regex for finding URLs:
   URLs start with http(s)/ftp/news (?:http|https|ftp|news)
   followed by ://
   then any number of non-whitespace characters including
   numbers, dots, forward slashes, commas, question marks,
   ampersands, equals signs, dashes, underscores and plusses,
   but ending in a non-dot and non-plus!
   
   Result:

(?:http|https|ftp|news)://(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+[a-zA-Z0-9,/%:\&#\?=\-_]
   
   Tests:
      Plain old link: http://www.mail.yahoo.com. 
      Containing numbers: ftp://bla.com/di~ng/co.rt,39,%93 or other 
      Go to news://bl_a.com/?ha-h+a&query=tb for more info.
      A real link: <a href="http://x.com">http://x.com</a>. 
      ftp://verylong.org/url/must/be/chopped/to/pieces/oritwontfit.html (long one)
      <IMG src="http://b.com/image.gif" /> (a plain image tag)
      <a href=http://fixedlink.com/orginialinvalid.html>fixed</a> (original invalid HTML)
      Link containing an anchor <b>"http://myhomepage.com/index.html#01"</b>.
"""
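
For a quick way to try the pattern, here is a minimal sketch using Python's
standard re module. The pattern is copied verbatim from the docstring; the
names URL_RE and find_urls and the sample text are only illustrative
assumptions, not part of the code the regex was taken from.

```python
import re

# Pattern copied verbatim from the docstring above, split over three
# string literals only to keep the lines short.
URL_RE = re.compile(
    r"(?:http|https|ftp|news)://"
    r"(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+"
    r"[a-zA-Z0-9,/%:\&#\?=\-_]"
)


def find_urls(text):
    """Return every substring of text matched by URL_RE, in order.

    find_urls is a hypothetical helper name used for illustration.
    """
    # The pattern has no capturing groups, so findall returns whole matches.
    return URL_RE.findall(text)


# Sample text assembled from three of the test lines in the docstring.
sample = (
    "Plain old link: http://www.mail.yahoo.com. "
    "Go to news://bl_a.com/?ha-h+a&query=tb for more info. "
    "<a href=http://fixedlink.com/orginialinvalid.html>fixed</a>"
)

for url in find_urls(sample):
    print(url)
```

The trailing dot after yahoo.com should be left out of the first match,
because the last character class in the pattern excludes dots and plus signs.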

-- 
Yours,

Andrei

=====
Mail address in header catches spam. Real contact info (decode with rot13):
cebwrpg5 at jnanqbb.ay. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.





