[Tutor] regex problem

Alan Gauld alan.gauld at freenet.co.uk
Wed Jan 5 23:13:55 CET 2005


> > Using regex to remove HTML is usually the wrong approach unless 
> 
> Thanks.  This is one of those projects I've had in mind for a long
> time, decided it was a good way to learn some python.  

It's a good way to write increasingly complex regex! Because 
HTML is recursive in nature (elements nest inside elements), it 
is almost impossible to parse HTML files reliably with regex. 
(The latest regex syntax can cope with recursion, but it's 
horribly complicated.)

So unless you accept the limitations of the method you may 
well become more frustrated by the regex stuff than you 
become experienced in Python.
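
To make the nesting problem concrete, here is a minimal sketch, 
assuming a current Python 3 where the standard-library parser lives 
in html.parser. The names TextExtractor and strip_tags are made up 
for this illustration, and the sample markup is invented. It shows 
a simple non-greedy pattern stopping at the first closing tag, and 
a parser-based alternative that just collects the text:

```python
import re
from html.parser import HTMLParser

# Invented sample: one <div> nested inside another.
html = "<div>outer <div>inner</div> tail</div>"

# The non-greedy pattern stops at the FIRST </div>, so it pairs the
# outer opening tag with the inner closing tag.
match = re.search(r"<div>(.*?)</div>", html)
regex_result = match.group(1)


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text content of markup with the tags dropped."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


parser_result = strip_tags(html)

print(repr(regex_result))   # 'outer <div>inner'
print(repr(parser_result))  # 'outer inner tail'
```

The regex could be patched for this one case, but each extra level 
of nesting (or an attribute, or a comment) needs another patch, 
which is the frustration described above. The parser tracks the 
tag structure for you.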

Alan G.
"When all you have is a hammer everything looks like a nail"

