re documentation error

Carlos Gaston Alvarez cgaston at moonqzie.com
Mon Sep 17 14:44:35 EDT 2001


I don't agree.
* means {0,infinite},
so 'x*' matches '', 'x', 'xx', 'xxxx', ...
It should be greedy. If it were not, then for
'xxx' you could say that it is
'' + 'x' + '' + 'x' + '' + 'x' + ''
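
The greedy/lazy distinction can be checked with today's `re` module, which is built on the same sre engine discussed in this thread. This is a minimal sketch assuming Python 3.7 or later, where `findall` allows a non-empty match to start right after an empty one. With that rule, the lazy form `x*?` carves 'xxx' into exactly the '' + 'x' + ... decomposition above, while greedy `x*` takes all three x's in one match.

```python
import re

text = "xxx"

# Greedy: x* grabs as many x's as it can, so one match covers the whole string.
greedy = re.match(r"x*", text).group()

# Lazy (non-greedy): x*? prefers the shortest match, which is the empty string.
lazy = re.match(r"x*?", text).group()

# findall shows how each form splits the input (Python 3.7+ semantics).
greedy_parts = re.findall(r"x*", text)
lazy_parts = re.findall(r"x*?", text)

print(repr(greedy), repr(lazy))  # 'xxx' ''
print(greedy_parts)              # ['xxx', '']
print(lazy_parts)                # ['', 'x', '', 'x', '', 'x', '']
```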

Bye,

Gaston


----- Original Message -----
From: "Heiko Wundram" <heikowu at ceosg.de>
To: "Fredrik Lundh" <fredrik at effbot.org>
Cc: <python-list at python.org>
Sent: Monday, September 17, 2001 8:11 PM
Subject: Re: re documentation error


> On Monday 17 September 2001 18:30, you wrote:
> > looks like a bug in the new (2.0) engine:
>
> Actually, to me it looks like 1.5.2's engine had a bug! ;))
>
> > [snip 1.5.2 output]
>
> > >>> import sre # 2.0's regular expression engine
> > >>> p = sre.compile("x*")
> > >>> p.sub("-", "abxd")
>
> Look at what it does (in my opinion, that is correct behaviour).
>
> It starts by trying to match x* at pos 0:
> nothing matches x* -> so insert - in the output
>
> Get the next char from the input. We now have "-a".
>
> Now match x* against pos 1:
> nothing matches x* -> so insert -
>
> Get the next char from the input. We now have "-a-b".
>
> Now match x* against pos 2:
> matches x -> so replace it with -
>
> Get no char from the input, as the match consumed the x. We now have "-a-b-".
>
> Now comes the crucial point:
>
> Match x* against pos 3:
> nothing matches x* -> so insert -
>
> Get the next char from the input. We now have "-a-b--d".
>
> etc.
>
> And that way we arrive at the output that was specified. What the above
> pseudocode does is move ahead by one character if the match was empty or
> one character long, and otherwise move ahead by the length of the match.
> And I guess you've implemented something quite similar...
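
The step-by-step walk-through above can be transcribed into a small Python function. `stepwise_sub` is a hypothetical helper written for illustration only; it follows the described steps literally and is not how the sre engine is actually implemented. It assumes a compiled pattern and a plain string replacement (no group references).

```python
import re

def stepwise_sub(pattern, repl, text):
    """Hypothetical helper: a literal transcription of the walk-through,
    not the real sre implementation."""
    out = []
    pos = 0
    # <= so that an empty match at the very end of the input is also replaced.
    while pos <= len(text):
        m = pattern.match(text, pos)       # anchored match attempt at pos
        if m is None:
            # No match here at all: copy one input char and move on.
            if pos < len(text):
                out.append(text[pos])
            pos += 1
            continue
        out.append(repl)                   # every match, empty or not, is replaced
        if m.end() > pos:
            pos = m.end()                  # non-empty match: skip what it consumed
        else:
            if pos < len(text):
                out.append(text[pos])      # empty match: copy one input char
            pos += 1
    return "".join(out)

p = re.compile("x*")
print(stepwise_sub(p, "-", "abxd"))  # -a-b--d-
print(re.sub("x*", "-", "abxd"))     # -a-b--d- on Python 3.7+
```

For this input the sketch agrees with `re.sub` in Python 3.7 and later. It is not a general replacement: after an empty match, modern `re` also retries for a non-empty match at the same position, which this transcription never does, so results can differ for lazy patterns such as `x*?`.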
>
> I don't think a different behaviour would make sense, because
> sre.sub used in this fashion is actually quite an interesting way to split
> apart the letters in a string and insert something between them. Just use
> a letter that doesn't appear in the string in the pattern, and you're set
> (it might be slow, though...)
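
The letter-splitting trick can be sketched as follows, assuming "z" does not occur in the input. Because `z*` can then only match the empty string, `re.sub` inserts the replacement at every position between and around the letters. Since the trick may be slow, the sketch also shows a plain `str.join` version that produces the same string without going through the regex engine.

```python
import re

word = "abc"

# "z" never occurs in word, so z* only ever matches the empty string,
# once at each position between (and around) the letters.
spaced = re.sub(r"z*", "-", word)

# The same result without a regex.
joined = "-" + "-".join(word) + "-"

print(spaced)            # -a-b-c-
print(spaced == joined)  # True
```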
>
> Well, I actually think sre's behaviour is useful. Why not keep it that
> way? And anyway, people are discouraged from using * that way and should
> use + instead (which doesn't produce this kind of "strange behaviour"...)
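
To see the difference between * and + in practice, compare the two in today's `re` module. This sketch assumes Python 3.7 or later; earlier releases handled an empty match directly after a non-empty one differently, so the `x*` result may not match older versions.

```python
import re

text = "axxbx"

# x* can match the empty string, so positions with no x get a replacement too.
star = re.sub(r"x*", "-", text)

# x+ needs at least one x, so only the actual runs of x's are replaced.
plus = re.sub(r"x+", "-", text)

print(star)  # -a--b--  (Python 3.7+)
print(plus)  # a-b-
```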
>
> Just my two cents on this topic.
>
> --
> Yours sincerely,
>
> Heiko Wundram
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>




