regex alternation problem
Paul McGuire
ptmcg at austin.rr.com
Fri Apr 17 18:28:39 EDT 2009
On Apr 17, 4:49 pm, Jesse Aldridge <JesseAldri... at gmail.com> wrote:
> import re
>
> s1 = "I am an american"
>
> s2 = "I am american an "
>
> for s in [s1, s2]:
>     print re.findall(" (am|an) ", s)
>
> # Results:
> # ['am']
> # ['am', 'an']
>
> -------
>
> I want the results to be the same for each string. What am I doing
> wrong?
Does it help if you expand your RE to its full expression, with '_'s
where the blanks go:
"_am_" or "_an_"
Now look for these in "I_am_an_american". After the first "_am_" is
matched, findall resumes at the 'a' of 'an'. The blank before it was
already consumed as the trailing blank of "_am_", so "_an_" cannot
match there. In "I_am_american_an_", "am" and "an" each have their own
surrounding blanks, so both match.
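To see the consumed blank directly, here is a small sketch that prints where each match starts and ends. It is written for Python 3 (the quoted code uses the Python 2 print statement), and the helper name match_spans is made up for this illustration:

```python
import re

# Same pattern as in the question: a blank, then "am" or "an", then a blank.
PATTERN = re.compile(" (am|an) ")

def match_spans(text):
    """Return (word, start, end) for each non-overlapping match."""
    return [(m.group(1), m.start(), m.end()) for m in PATTERN.finditer(text)]

for text in ["I am an american", "I am american an "]:
    print(repr(text), match_spans(text))
```

In the first string the only match covers indexes 1 to 5, which includes the blank at index 4. The scan resumes at index 5, so that blank is no longer available to start " an ".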
Instead of using explicit spaces, try using '\b', which matches at a
word boundary without consuming any characters:
>>> import re
>>> re.findall(r"\b(am|an)\b", "I am an american")
['am', 'an']
>>> re.findall(r"\b(am|an)\b", "I am american an")
['am', 'an']
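If you really do need literal blanks rather than word boundaries, one alternative (not what I suggested above, just a sketch) is to leave the trailing blank unconsumed with a lookahead. The names below are made up for this illustration, and the code is Python 3:

```python
import re

# Word-boundary version: zero-width on both sides, nothing is consumed.
BOUNDARY = re.compile(r"\b(am|an)\b")

# Lookahead version: consumes the leading blank, but only peeks at the
# trailing one, so that blank can still start the next match.
LOOKAHEAD = re.compile(r" (am|an)(?= )")

def find_words(pattern, text):
    """Return the captured words for every match of pattern in text."""
    return pattern.findall(text)

for text in ["I am an american", "I am american an ", "am I an american?"]:
    print(repr(text), find_words(BOUNDARY, text), find_words(LOOKAHEAD, text))
```

The two differ at the edges of the string: '\b' also matches a word at the very start or end, while the lookahead version still insists on a real blank on both sides, so it misses the leading "am" in the third string.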
-- Paul
Your find pattern includes (and consumes) a leading AND trailing space
around each word. In the first string, "I am an american", there is a
leading and trailing space around "am", but the trailing space for
"am" is also the leading space for "an", so " an " can no longer match
once " am " has consumed it.