leftmost longest match (of disjunctions)
Peter Hansen
peter at engcorp.com
Mon Dec 1 12:36:11 EST 2003
Joerg Schuster wrote:
>
> The program given below returns the lines:
>
> a
> ab
>
> Is there a way to use python regular expressions such that the program
> would return the following lines?
>
> ab
> ab
>
> ########################################################################
>
> import re
>
> rx1 = re.compile("(a|ab)")
> rx2 = re.compile("(ab|a)")
Have you checked the documentation for "re"?
It reads:
"|" A|B, where A and B can be arbitrary REs, creates a regular expression
that will match either A or B. An arbitrary number of REs can be separated
by the "|" in this way. This can be used inside groups (see below) as well.
As the target string is scanned, REs separated by "|" are tried from left to
right. When one pattern completely matches, that branch is accepted. This
means that once A matches, B will not be tested further, even if it would
produce a longer overall match. In other words, the "|" operator is never
greedy.
------
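A minimal sketch of the documented behaviour, using the two patterns from the quoted program. The test string "ab" is an assumption, because the rest of the original program was not quoted. Both patterns contain the same alternatives; only their order differs, and the order alone decides which alternative wins.

```python
import re

# The two patterns from the quoted program: same alternatives, different order.
rx1 = re.compile("(a|ab)")
rx2 = re.compile("(ab|a)")

text = "ab"  # assumed input; the original test string was not quoted

for rx in (rx1, rx2):
    m = rx.search(text)
    # "|" tries alternatives left to right and accepts the first one that
    # matches, so rx1 stops at "a" even though "ab" would be longer.
    print(m.group(1))
```

This prints `a` and then `ab`, which is the output described in the original question.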
Seems pretty clear and explicit to me. Your example is basically a working
demonstration of the documented behaviour, so I'm not sure what different result you were expecting.
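If the alternatives are plain literal strings, one common workaround is to order them longest first. This is not something the re documentation prescribes. Because "|" accepts the first alternative that matches, the longest literal that matches at a given position then wins. The helper name `longest_first` below is hypothetical, and the trick does not give true leftmost-longest semantics for arbitrary sub-patterns.

```python
import re

def longest_first(alternatives):
    """Hypothetical helper: build an alternation that prefers longer literals.

    Sorting the literal alternatives by decreasing length means the first
    alternative that matches at a position is also the longest one that
    matches there. This only holds for plain strings, which is why each one
    is passed through re.escape.
    """
    ordered = sorted(alternatives, key=len, reverse=True)
    return re.compile("(" + "|".join(re.escape(s) for s in ordered) + ")")

rx = longest_first(["a", "ab"])    # builds the same pattern as "(ab|a)"
print(rx.search("ab").group(1))    # ab
print(rx.findall("a ab abc"))      # ['a', 'ab', 'ab']
```

With this ordering, the pattern matches "ab" no matter which order the alternatives are listed in, which gives the "ab / ab" output the question asked for.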
-Peter