re documentation error
Heiko Wundram
heikowu at ceosg.de
Mon Sep 17 14:11:48 EDT 2001
On Monday 17 September 2001 18:30, you wrote:
> looks like a bug in the new (2.0) engine:
Actually, to me it looks like 1.5.2's engine had a bug! ;))
> [snip 1.5.2 output]
> >>> import sre # 2.0's regular expression engine
> >>> p = sre.compile("x*")
> >>> p.sub("-", "abxd")
Look at what it does (in my opinion, that is correct behaviour):
It starts by trying to match x* at pos 0:
x* matches nothing (the empty string) -> so insert -
copy the next char from the input. We now have "-a"
Now match x* against pos 1:
x* matches nothing -> so insert -
copy the next char from the input. We now have "-a-b"
Now match x* against pos 2:
x* matches "x" -> so replace it with -
copy no char from the input, as there was a non-empty match. We now have "-a-b-"
Now comes the crucial point:
Match x* against pos 3:
x* matches nothing -> so insert -
copy the next char from the input. We now have "-a-b--d"
Match x* against pos 4 (the end of the input):
x* matches nothing -> so insert -. We now have "-a-b--d-"
And that way we arrive at the output that was specified. What the above
pseudocode does is: after an empty match, copy one character to the output
and move one position ahead; after a non-empty match, move ahead past all
the matched characters without copying anything. And I guess you've
implemented something quite similar...
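The scanning loop walked through above can be sketched in Python. This is a minimal illustration, assuming the pattern is simply tried with match() at each position; naive_sub is a hypothetical helper name, not part of re, and it is not a full reimplementation of re.sub (for patterns where an empty and a non-empty alternative can both match at the same position, the real engine can behave differently).

```python
import re


def naive_sub(pattern: str, repl: str, text: str) -> str:
    """Hypothetical helper: replay the step-by-step scan described above.

    At each position the pattern is tried with match(). An empty match
    inserts repl and then copies one input character; a non-empty match
    inserts repl and skips past the matched characters, copying nothing.
    """
    regex = re.compile(pattern)
    out = []
    pos = 0
    # "<=" so that the position just past the last character is tried too.
    while pos <= len(text):
        m = regex.match(text, pos)
        if m is None:
            # No match at this position at all: copy one character, move on.
            if pos < len(text):
                out.append(text[pos])
            pos += 1
        elif m.end() > pos:
            # Non-empty match: replace it and jump past it.
            out.append(repl)
            pos = m.end()
        else:
            # Empty match: insert repl, then copy the next character.
            out.append(repl)
            if pos < len(text):
                out.append(text[pos])
            pos += 1
    return "".join(out)


print(naive_sub("x*", "-", "abxd"))  # -a-b--d-
```

For this input the sketch gives the same result as re.sub("x*", "-", "abxd") on Python 3.7 and later, whose re module replaces an empty match adjacent to a previous non-empty match (earlier 3.x releases returned "-a-b-d-").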
I don't think a different behaviour would necessarily make more sense,
because sre.sub used in this fashion is actually quite an interesting way to
split apart the letters in a string and insert characters between them. Just
use the * pattern with a letter that doesn't appear in the string, and you're
off (it might be slow, though...)
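A small illustration of that trick against today's re module, assuming z as the letter that does not occur in the input. For comparison, str.join gives the same spacing without running the regex engine at all, which is likely the faster route; it only puts separators between letters, so the two outer ones are added by hand.

```python
import re

word = "abc"

# "z*" never matches a real character here, so it only ever produces empty
# matches: one before each letter and one at the very end.
spaced = re.sub("z*", "-", word)
print(spaced)  # -a-b-c-

# The same result without the regex engine: str.join puts separators only
# between letters, so the leading and trailing ones are added explicitly.
joined = "-" + "-".join(word) + "-"
print(joined)  # -a-b-c-
```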
Well, I actually think sre's behaviour is useful. Why not keep it that way?
And anyway, people are discouraged from using * that way; they should rather
use + (which doesn't produce this kind of "strange behaviour"...)
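The difference between the two quantifiers on the input from the original report, as a short sketch against today's re module (the * result shown is the Python 3.7+ behaviour):

```python
import re

text = "abxd"

# x+ needs at least one "x", so it can never match the empty string:
# only the single "x" is replaced.
plus = re.sub("x+", "-", text)
print(plus)  # ab-d

# x* also matches the empty string, so "-" is inserted at every position
# as well (Python 3.7+ output shown).
star = re.sub("x*", "-", text)
print(star)  # -a-b--d-
```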
Just my two cents on this topic.
--
Yours sincerely,
Heiko Wundram
More information about the Python-list mailing list