regex alternation problem

Paul McGuire ptmcg at austin.rr.com
Fri Apr 17 18:28:39 EDT 2009


On Apr 17, 4:49 pm, Jesse Aldridge <JesseAldri... at gmail.com> wrote:
> import re
>
> s1 = "I am an american"
>
> s2 = "I am american an "
>
> for s in [s1, s2]:
>     print re.findall(" (am|an) ", s)
>
> # Results:
> # ['am']
> # ['am', 'an']
>
> -------
>
> I want the results to be the same for each string.  What am I doing
> wrong?

Does it help if you expand your RE to its full expression, with '_'s
where the blanks go:

"_am_" or "_an_"

Now look for these in "I_am_an_american".  After the first "_am_" is
processed, findall picks up at the leading 'a' of 'an', and there is
no leading blank, so no match.  If you search through
"I_am_american_an_", both "am" and "an" have surrounding spaces, so
both match.
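
To see the consumed space directly, here is a minimal sketch that prints where each match starts and ends. The helper name `spans` is made up for illustration, and the code uses the Python 3 `print()` form rather than the `print` statement in the quoted post:

```python
import re

# Same pattern as in the original question: a literal space on each side.
pattern = re.compile(r" (am|an) ")

def spans(text):
    """Return (word, start, end) for every non-overlapping match."""
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(text)]

# " am " covers indexes 1..4, so scanning resumes at index 5, the 'a' of
# "an". The space at index 4 is already used up, so " an " cannot match.
print(spans("I am an american"))

# Here "an" has its own spaces on both sides, so both words are found.
print(spans("I am american an "))
```

The end offset of one match is where the scan resumes, which is why a space shared by two adjacent words can only be claimed once.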

Instead of using explicit spaces, try using '\b', which matches at a word boundary without consuming any characters:

>>> import re
>>> re.findall(r"\b(am|an)\b", "I am an american")
['am', 'an']
>>> re.findall(r"\b(am|an)\b", "I am american an")
['am', 'an']
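
If you really do need literal spaces rather than word boundaries (for instance, '\b' would also match "an" next to punctuation), one alternative, which is my assumption about what you might want and not something from your original code, is to turn the trailing space into a lookahead so it is checked but not consumed. A rough sketch:

```python
import re

# The leading space is still consumed, but the trailing space is only
# asserted with (?= ), so it remains available as the leading space of
# the next word.
pattern = re.compile(r" (am|an)(?= )")

for s in ["I am an american", "I am american an "]:
    print(pattern.findall(s))
```

Both strings now give the same list, and "american" is still skipped because the character after " am" is 'e', not a space.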

-- Paul




Your find pattern includes (and consumes) a leading AND trailing space
around each word.  In the first string "I am an american", there is a
leading and trailing space around "am", but the trailing space for
"am" is the leading space for "an", so " an " can no longer match once
" am " has consumed that space.



More information about the Python-list mailing list