RegEx conditional search and replace

Blair P. Houghton blair.houghton at gmail.com
Thu Jul 6 16:43:05 EDT 2006


mbstevens wrote:
> In such a case you may need to make the page
> into one string to search if you don't want to use some complex
> method of tracking state with variables as you move from
> string to string.

In general, making a regex keep track of state is a very hard problem.
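
For simple pages, one way to sidestep the state tracking is a single
pass over the whole page with an alternation: match either an entire tag
or the target word, and have a callback pass tags through untouched. This
is only a sketch. The word "cat", the replacement "dog" and the function
name are made up for illustration, and the tag pattern is deliberately
naive: it breaks on a '>' inside an attribute value, on comments, and on
script blocks.

```python
import re

# Either a whole tag (naively: '<' up to the next '>') or the target word.
# "cat" is a placeholder for whatever text is actually being replaced.
TAG_OR_WORD = re.compile(r"<[^>]*>|\bcat\b")

def replace_outside_tags(page, replacement="dog"):
    """Replace the target word everywhere except inside tags."""
    def swap(match):
        text = match.group(0)
        # Tags are returned unchanged, so URLs in attributes survive.
        return text if text.startswith("<") else replacement
    return TAG_OR_WORD.sub(swap, page)

page = '<a href="http://example.com/cat">my cat</a>\nanother cat'
print(replace_outside_tags(page))
```

Because the tag alternative is tried first at each position, the word
inside the href is consumed as part of the tag and never reaches the
replacement branch.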

I recall something from last year about the new Perl implementation
that tried to address this sort of problem.  But I may have been
reading old docs and it could have been done years ago.

Parsing the HTML would be the only sure way to accomplish
it.  Let something that already knows the document hierarchy
tell you when you're entering a URL, and you can skip past all
of its nested strings containing URLs, which contain strings
with URLs, and so on...

Of course, that means reconstructing the HTML from the
parse tree afterward...
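
A rough sketch of that idea with the standard library's html.parser,
which is event-driven rather than tree-building, so the "reconstruction"
is just re-emitting each piece as it goes by. The class name TextRewriter,
the rewrite() helper and the choice to skip text inside <a> elements are
my assumptions for illustration, not anything from the original question.
The output is not guaranteed byte-for-byte identical to the input (end
tags come back lowercased, for one), and text inside script or style
blocks is rewritten like any other text.

```python
import re
from html.parser import HTMLParser

class TextRewriter(HTMLParser):
    """Re-emit HTML, applying a regex substitution to text outside <a>."""

    def __init__(self, pattern, replacement):
        # Keep entity and character references as separate events so
        # they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(pattern)
        self.replacement = replacement
        self.out = []
        self.link_depth = 0  # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        # Original tag text, attributes and URLs included, untouched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.link_depth == 0:
            data = self.pattern.sub(self.replacement, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

def rewrite(page, pattern, replacement):
    parser = TextRewriter(pattern, replacement)
    parser.feed(page)
    parser.close()
    return "".join(parser.out)

print(rewrite('<p>cat <a href="/cat">cat</a> cat &amp; cat</p>',
              r"\bcat\b", "dog"))
```

The parser is the thing carrying the state here: the regex only ever
sees one run of text at a time, and the link_depth counter says whether
that run should be touched at all.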

--Blair



