Regex recursion error example.
Harvey Thomas
hst at empolis.co.uk
Fri Nov 1 11:12:59 EST 2002
Yin wrote:
>
> After tinkering with this issue for a day or so, I've decided to use
> xmllib to solve the problem. But for future reference, I've attached
> the piece of text that is failing and the two approaches that I've
> tried to make the match.
>
> Of course there are other approaches to doing this parse, but I am
> interested in understanding the regex approach I am trying and its
> limitations.
>
> If there are no solutions using regex, I would be interested in seeing
> a reference to articles or books that discuss overcoming particularly
> long string matches.
>
> Approach 1:
> pattern=re.compile('<PubMedArticle>(.*?)</PubMedArticle>',
> re.DOTALL)
> self.citationlist = re.findall(pattern, allinput)
>
> Approach 2:
> comppat=re.compile(r'<PubMedArticle>((?:(?!<PubMedArticle>).)*)</PubMedArticle>',
> re.DOTALL)
> self.citationlist = re.findall(comppat, allinput)
>
> There are three matches to make in this body of text. The above code
> has been failing on the second of the three. This problem has only
> been occurring on Linux Python and not Windows Python (the stack on
> Windows is just large enough to accommodate the matches).
> Text to match:
>
> http://160.129.203.97/1998_xmltest.html
>
> Please let me know by e-mail if the link is down.
>
> Thanks again,
> Yin
> --
How about this (untested); don't think you will get a recursion problem:
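pattern=re.compile("""
<PubMedArticle>
(?:[^<]+                # a run of characters containing no '<'
|
<(?!/PubMedArticle>)    # or a '<' that does not begin the closing tag
)*
</PubMedArticle>"""
, re.DOTALL | re.VERBOSE)   # flags must be OR-ed into a single argument
self.citationlist = pattern.findall(allinput)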
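
For anyone who wants to try the idea outside the class, here is a minimal self-contained sketch. The SAMPLE text, the ARTICLE name and the find_citations helper are made up for illustration (the real data is at the URL quoted above), and I have added a capturing group so that findall returns only the article bodies, as the two quoted approaches do. Treat it as a sketch under those assumptions rather than a tested fix for the recursion error.

```python
import re

# Hypothetical stand-in for the real input: three small articles.
SAMPLE = (
    "<PubMedArticleSet>\n"
    "<PubMedArticle>\n<PMID>1</PMID>\n</PubMedArticle>\n"
    "<PubMedArticle>\n<PMID>2</PMID>\n</PubMedArticle>\n"
    "<PubMedArticle>\n<PMID>3</PMID>\n</PubMedArticle>\n"
    "</PubMedArticleSet>\n"
)

# re.VERBOSE lets the pattern carry whitespace and comments.
# Flags, when more than one is needed, are combined with "|".
ARTICLE = re.compile(
    r"""
    <PubMedArticle>
    (                           # group 1: the article body
      (?: [^<]+                 # a run of characters containing no '<'
        | <(?!/PubMedArticle>)  # or a '<' that does not begin the closing tag
      )*
    )
    </PubMedArticle>
    """,
    re.VERBOSE,
)


def find_citations(text):
    """Return the body of every <PubMedArticle> element found in text."""
    return ARTICLE.findall(text)


citations = find_citations(SAMPLE)
print(len(citations))
```

The point of the [^<]+ branch is that the engine consumes long runs of ordinary text in one step and only tests the lookahead at each '<', instead of testing it before every single character as Approach 2 does.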
Harvey