[Tutor] Re trouble

Alan Gauld alan.gauld at blueyonder.co.uk
Tue Oct 28 02:28:17 EST 2003


> regarding extracting URLs from HTML documents via Python regular
> expressions, this question has been asked many times and the 
> consensus is that you want to use HTMLParser, as re doesn't 
> keep state and is not the right tool for this task.

That's true for parsing HTML for tag values, but if you are just looking 
for URLs then a regex should work fine. There is no chance of a URL 
containing another URL within it, so the fact that re doesn't keep 
state is no handicap here!
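As a rough illustration, here is a minimal sketch of pulling URLs out of raw text with the re module. The pattern is an assumption made for this example: it only recognises http:// and https:// URLs and stops at whitespace, quotes or angle brackets, so it is far from a complete URL grammar. The sample HTML is made up.

```python
import re

# Simplified pattern (an assumption, not a full URL grammar):
# "http" or "https", then "://", then a run of characters that are
# not whitespace, quotes or angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def find_urls(text):
    """Return every http(s) URL found anywhere in text, HTML or not."""
    # findall returns whole matches because the pattern has no groups
    return URL_RE.findall(text)

html = ('<p>See <a href="http://www.python.org/">Python</a> or '
        'https://mail.python.org/mailman/listinfo/tutor today.</p>')
print(find_urls(html))
```

The regex neither knows nor cares whether a URL sits inside a tag attribute or in plain text, which is exactly why the lack of state does not matter for this job.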

So provided you aren't trying to extract the URL by looking for
<A></A> pairs or some such, it should be OK.
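If you do want the targets of <A> tags specifically, the tag structure matters and a parser is the better tool. Here is a minimal sketch using html.parser.HTMLParser from the Python 3 standard library (in the Python 2 of the day the same class lived in the HTMLParser module). The LinkCollector class and anchor_hrefs function are hypothetical names for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag; the parser tracks the markup for us."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def anchor_hrefs(html):
    """Return the href values of all anchor tags in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

page = '<p><a href="http://www.python.org/">Python</a> and <A HREF="/doc/">docs</A></p>'
print(anchor_hrefs(page))
```

Because it works from the tags themselves, this version also returns relative hrefs such as /doc/ and copes with upper-case <A HREF=...> markup.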

Alan G.



