unexpected regexp behaviour using 'A|B|C.....'

Peter Otten __peter__ at web.de
Thu Jul 28 06:57:18 EDT 2011


AlienBaby wrote:

> When using re patterns of the form 'A|B|C|...'  the docs seem to
> suggest that once any of A,B,C.. match, it is captured and no further
> patterns are tried.  But I am seeing,
> 
> st='  Id Name                    Prov Type  CopyOf              BsId Rd -Detailed_State-    Adm     Snp      Usr VSize'
> 
> p='Type *'
> re.search(p,st).group()
> 'Type  '
> 
> p='Type *|  *Type'
> re.search(p,st).group()
> ' Type'
> 
> 
> Shouldn’t the second search return the same as the first, if further
> patterns are not tried?
> 
> The documentation appears to suggest the first match should be
> returned, or am I misunderstanding?

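All alternatives are tried at a given starting position in the string before 
the algorithm advances to the next position. The second alternative, 
"  *Type" (at least one space followed by the character sequence "Type"), 
matches starting at the space right after "Prov" in your example. The first 
alternative, "Type" plus any following spaces, could only match one position 
later, at the "T" after "Prov ", so it never gets a chance. re.search() 
returns the leftmost match, and the order of the alternatives only decides 
between matches that begin at the same position.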
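
The snippet below is a small sketch of that behaviour. It puts each alternative in its own group so that Match.lastindex shows which one won. One assumption: the header line from the post is rejoined onto a single line, and the mail client's wrap between "BsId" and "Rd" is taken to be a single space. That choice does not affect where "Type" matches.

```python
import re

# Header line from the original post, rejoined onto one line (assumption:
# the wrap between "BsId" and "Rd" was a single space).
st = ('  Id Name                    Prov Type  CopyOf              BsId '
      'Rd -Detailed_State-    Adm     Snp      Usr VSize')

# A single alternative: the leftmost match starts at the "T" of "Type".
alone = re.search(r'Type *', st)

# Two alternatives, each in its own group so we can see which one matched.
# The scan reaches the space after "Prov" one position before the "T",
# and at that position only the second alternative succeeds.
both = re.search(r'(Type *)|(  *Type)', st)

print(alone.start(), repr(alone.group()))                # starts at "T"
print(both.start(), repr(both.group()), both.lastindex)  # one position earlier, group 2

# Alternation order only decides between alternatives that match at the
# SAME starting position. There, the first one listed wins.
print(re.search(r'a|ab', 'ab').group())  # 'a'
print(re.search(r'ab|a', 'ab').group())  # 'ab'
```

The last two lines show the situation the documentation describes: both alternatives can match at position 0, so the one written first is used.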

Maybe you accidentally typed one extra " "? If you didn't, " +Type" would be 
clearer; it matches exactly the same text as "  *Type".
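
To check that " +Type" and "  *Type" are interchangeable, you can compare where each one matches on a few strings. The helper `spans` and the sample strings are made up for this illustration and are not part of the original post.

```python
import re

def spans(pattern, samples):
    """Hypothetical helper: span of the first match in each sample, or None."""
    result = []
    for s in samples:
        m = re.search(pattern, s)
        result.append(m.span() if m else None)
    return result

samples = ['Prov Type  CopyOf', 'Prov   Type', 'ProvType', ' Type']

space_then_star = spans(r'  *Type', samples)  # one space, then zero or more spaces
space_plus = spans(r' +Type', samples)        # one or more spaces

print(space_then_star)  # [(4, 9), (4, 11), None, (0, 5)]
print(space_plus)       # [(4, 9), (4, 11), None, (0, 5)]
```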




More information about the Python-list mailing list