Regex trouble

akshat agarwal csd01412 at gmail.com
Thu Apr 2 02:35:04 EDT 2009


Thanks Andrew. I was also of the same view that perl handled this via some
special cases.

On Wed, Apr 1, 2009 at 8:32 PM, andrew cooke <andrew at acooke.org> wrote:

>
> more exactly, my guess is perl has a special case for this that avoids
> doing a search over all possible matchers via the pushdown stack.
>
> andrew cooke wrote:
> >
> > ".*?" is a "not greedy" match, which is significantly more difficult to
> > handle than a normal ".*".  so the performance will depend on quite
> > complex details of how the regular expression engine is implemented.  it
> > wouldn't surprise me if perl was better here, because it comes from a
> > background with a much stronger emphasis on regular expressions.
> >
> > i know that not an exact answer, but unless you find the author of the re
> > library i am not sure you will get a much better explanation.  it comes
> > down to whether the regular expression can be implemented using a
> > deterministic finite automaton (rather than an indeterministic one).  if
> > you look at the table at the bottom of
> > http://en.wikipedia.org/wiki/Finite_state_machine then i believe (i am
> not
> > 100% sure) that the ".*?" match requires at least a pushdown automota,
> > while ".*" can be handled with a simple finite automaton.
> >
> > disclaimer: this is all fairly new to me as i just recently implemented a
> > regular expression matcher myself, and i may be wrong on some of the
> > details.
> >
> > andrew
> >
> >
> > akshat agarwal wrote:
> >> Hi,
> >>
> >> I am trying to use the following snippet of code to print a regex match.
> >>
> >> s = '01234567890123456789x012'
> >>
> >> pat = r'(.*?x|[^a]+)*y'
> >>
> >> mo = re.search(pat, s)
> >> if mo is not None:
> >>     print mo.group(0)
> >>
> >> By adding one character before the 'x' in the input string, the time
> >> taken
> >> to print the match doubles. This behaviour is not observed in perl. I am
> >> curious to know about the difference the in regex implementations of
> >> perl
> >> and python which causes this.
> >>
> >> Thanks
> >> --
> >> http://mail.python.org/mailman/listinfo/python-list
> >>
> >
> >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20090402/e5054aff/attachment-0001.html>


More information about the Python-list mailing list