String Regex problem
Andrei
project5 at redrival.net
Tue Nov 25 05:48:41 EST 2003
Skip Montanaro wrote on Mon, 24 Nov 2003 21:35:48 -0600:
> >> Since I am very poor in regex, can someone show me how to do it using
> >> a few examples?
>
<snip>
> Don> http://kodos.sourceforge.net
>
> If you're a Mac Python person there's also Dinu Gherman's excellent
> RegexPlor:
>
> http://starship.python.net/crew/gherman/RegexPlor.html
<snip>
I'm biased here, but Kiki (http://project5.freezope.org/kiki) is
cross-platform and depends on wxPython rather than Qt, which is much
easier for Windows users.
Anyway, here's a regex I ripped out of my own code - you might want to
simplify it though:
"""Regex for finding URLs:
URLs start with http(s)/ftp/news ((http)|(ftp)|(news))
followed by ://
then any number of non-whitespace characters including
numbers, dots, forward slashes, commas, question marks,
ampersands, equals signs, dashes, underscores and plusses,
but ending in a non-dot and non-plus!
Result:
(?:http|https|ftp|news)://(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+[a-zA-Z0-9,/%:\&#\?=\-_]
Tests:
Plain old link: http://www.mail.yahoo.com.
Containing numbers: ftp://bla.com/di~ng/co.rt,39,%93 or other
Go to news://bl_a.com/?ha-h+a&query=tb for more info.
A real link: <a href="http://x.com">http://x.com</a>.
ftp://verylong.org/url/must/be/chopped/to/pieces/oritwontfit.html
(long one)
<IMG src="http://b.com/image.gif" /> (a plain image tag)
<a href=http://fixedlink.com/orginialinvalid.html>fixed</a> (original
invalid HTML)
Link containing an anchor
<b>"http://myhomepage.com/index.html#01"</b>.
"""
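
For anyone who wants to try the pattern quickly, here is a minimal sketch of how it might be used with Python's standard re module. The regex is copied unchanged from the docstring above; the names URL_RE and find_urls are only illustrative and do not come from the original code.

```python
import re

# Pattern copied verbatim from the docstring above. URL_RE and find_urls
# are illustrative names, not taken from Kiki or the original code.
URL_RE = re.compile(
    r"(?:http|https|ftp|news)://"
    r"(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+"
    r"[a-zA-Z0-9,/%:\&#\?=\-_]"
)


def find_urls(text):
    """Return every substring of text that the pattern matches."""
    # The pattern contains only non-capturing groups, so findall
    # returns the full matched strings.
    return URL_RE.findall(text)


samples = [
    "Plain old link: http://www.mail.yahoo.com.",
    "Go to news://bl_a.com/?ha-h+a&query=tb for more info.",
    '<a href="http://x.com">http://x.com</a>.',
]
for line in samples:
    print(find_urls(line))
```

The trailing dot of the first sample is left out of the match because the last character class excludes dots. Since the middle group nests one repetition inside another, long inputs that almost match may be slow to reject, which is one more reason to simplify it.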
--
Yours,
Andrei
=====
Mail address in header catches spam. Real contact info (decode with rot13):
cebwrpg5 at jnanqbb.ay. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V ernq
gur yvfg, fb gurer'f ab arrq gb PP.