[Tutor] String searching

André Dahlqvist andre@beta.telenordia.se
Mon, 5 Jun 2000 22:15:32 +0200


> I recommend the documentation on regular expressions available from the
> Python site. It is pretty good at getting you up to speed. The relevant
> chapter in "Python: Essential Reference" (Beazley) is useful as well.

I've read the regex HOWTO on the Python site, and it was quite a
well-written piece. Your introduction has also inspired me to learn
more about regular expressions.

> Is this text you are searching just plain text or is it marked up in html?

It's just plain text, but I studied the html regex too because that
will probably come in handy some day. You explained it very clearly.
It's not as hard as it looks:-)

> If the text you are searching is plain text then it is no more difficult. You
> must think about how to specify the different cases. The URL could be
> surrounded by quotes or parentheses, or brackets... but probably NOT by
> extraneous letters (i.e. awordhttp:www.somewhere.comaword)

After investigating the findall method and reading your intro to regex,
I now have this short piece of code that has extracted all the links I
have given it so far, even if they start with something like an
opening parenthesis:

import re
getURLs = re.compile(r'((?:http|ftp)://[-.~/_?=#%\w]+\w)')
found_urls = getURLs.findall(text)  # text holds the plain text being searched
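
As a quick check, here is a minimal sketch that runs the same pattern on a
made-up sample string. The sample text and its URLs are hypothetical, and
the names url_pattern and sample_text are chosen only for illustration:

```python
import re

# Same pattern as above, written as a raw string so "\w" reaches the
# regex engine unchanged.
url_pattern = re.compile(r'((?:http|ftp)://[-.~/_?=#%\w]+\w)')

# Hypothetical sample text; the URLs are invented for this example.
sample_text = (
    'See (http://www.example.com/docs/index.html) for details, '
    'or fetch "ftp://ftp.example.org/pub/file.tar.gz". '
    'A query string also works: http://example.com/search?q=python.'
)

# findall returns the text captured by the single group for each match.
found_urls = url_pattern.findall(sample_text)
for url in found_urls:
    print(url)
```

The final \w in the pattern means a match has to end on a letter, digit, or
underscore, so the closing parenthesis, the quote, and the sentence-ending
period in the sample are left out of the results. Characters that are not in
the bracketed class, such as "&" or ":", would also end a match early.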

Thank you so much for your help Craig!
-- 

// André