unexpected regexp behaviour using 'A|B|C.....'
Peter Otten
__peter__ at web.de
Thu Jul 28 06:57:18 EDT 2011
AlienBaby wrote:
> When using re patterns of the form 'A|B|C|...' the docs seem to
> suggest that once any of A,B,C.. match, it is captured and no further
> patterns are tried. But I am seeing,
>
> st=' Id Name Prov Type CopyOf BsId Rd -Detailed_State- Adm Snp Usr VSize'
>
> p='Type *'
> re.search(p,st).group()
> 'Type '
>
> p='Type *| *Type'
> re.search(p,st).group()
> ' Type'
>
>
> Shouldn’t the second search return the same as the first, if further
> patterns are not tried?
>
> The documentation appears to suggest the first match should be
> returned, or am I misunderstanding?
All alternatives are tried at a given starting position in the string before
the algorithm advances to the next position. In your example the second
alternative, " *Type" (spaces followed by the character sequence "Type"),
already matches at the space right after "Prov". The first alternative,
"Type" and any following spaces, fails at that position and could only match
one position later, after "Prov ", so it never gets the chance. The leftmost
match wins; the order of the alternatives only decides between matches that
start at the same position.
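A small sketch that makes the positions visible. It assumes the single-spaced string exactly as it appears in this archive (the original may have had more spaces between the columns), and `first_match` is a hypothetical helper introduced only for illustration. `Pattern.match(string, pos)` anchors a single attempt at index `pos`, which lets us try each alternative at the space before "Type" separately.

```python
import re

# The string from the question, single-spaced as shown in the archive.
st = ' Id Name Prov Type CopyOf BsId Rd -Detailed_State- Adm Snp Usr VSize'

def first_match(pattern, text):
    """Hypothetical helper: return (start, matched_text) of the leftmost match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group()) if m else None

# One alternative: the match starts at the 'T' of "Type" (index 14).
print(first_match('Type *', st))         # (14, 'Type ')
# Two alternatives: ' *Type' already matches at the space before "Type"
# (index 13), one position earlier, so it wins although it is listed second.
print(first_match('Type *| *Type', st))  # (13, ' Type')

# Try each alternative on its own, anchored at that earlier position.
pos = st.index(' Type')
pattern_first = re.compile('Type *')
pattern_second = re.compile(' *Type')
print(pattern_first.match(st, pos))                # None: st[13] is a space, not 'T'
print(repr(pattern_second.match(st, pos).group())) # ' Type'
```

The scan reaches index 13 before index 14, and at index 13 only the second alternative succeeds, which is why the question's second search returns `' Type'`.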
Maybe you accidentally typed one extra " "? If you didn't, i.e. you do want at
least one space before "Type", " +Type" would be clearer.
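A quick way to compare the spellings. This is a sketch: the sample strings and the helper `matches` are made up for illustration. A literal space followed by " *" requires at least one space, just like " +", whereas a single space followed by "*" also accepts no space at all.

```python
import re

# '  *' = one literal space, then zero or more spaces -> at least one space.
two_spaces_star = re.compile('  *Type')
# ' +' = one or more spaces, the more direct spelling of the same thing.
one_or_more = re.compile(' +Type')
# ' *' = zero or more spaces, so "Type" with no space before it also matches.
zero_or_more = re.compile(' *Type')

PATTERNS = (two_spaces_star, one_or_more, zero_or_more)

def matches(text):
    """Hypothetical helper: matched text (or None) for each pattern in PATTERNS."""
    return [m.group() if m else None for m in (p.search(text) for p in PATTERNS)]

for text in ['Prov Type', 'Prov   Type', 'ProvType']:
    print(repr(text), matches(text))
# 'Prov Type' [' Type', ' Type', ' Type']
# 'Prov   Type' ['   Type', '   Type', '   Type']
# 'ProvType' [None, None, 'Type']
```

Only the last sample tells the patterns apart: the first two agree on every input, and " +Type" states the intent in fewer characters.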
More information about the Python-list mailing list