Why no RE match of A AND B?

Harvey Thomas hst at empolis.co.uk
Mon Mar 3 03:59:02 EST 2003


Paddy wrote:
> 
> Just looking through the docs to confirm it and their seems 
> to be no regular expression 
> construct that matches 'A and B' - an equivalent to the '|' 
> character that matches A or B
> To paraphrase the documentation it could be documented as:
> 
>    "&"
> A&B, where A and B can be arbitrary REs, creates a regular 
> expression that will match both 
> A and B in any order. An arbitrary number of REs can be 
> separated by the "&" in this way. 
> This can be used inside groups as well. REs separated by "&" 
> are tried in arbitrary order, 
> the beginning of the first matching branch up to the end of 
> the last branch that allows 
> the complete pattern (all branches) to match is considered 
> the span of the match.
> 
> I envision it being used where a string is accepted at 
> present if it matches multiple REs.
> The REs could be rewritten as r"<RE#1>&<RE2>&<RE3>...".
> 
> The use of "&" could be supported by more efficient, 
> (simultaneous?),  matching of all 
> branches of the "&" construct - if such algorithms exist!
> 
>   Just an idea,
> 
>    Paddy.
>
 
One of the simplifications of SGML made in the creation of XML was the dropping of the & operator in content models. I believe there were two reasons for this:

1) & was hardly ever used.
2) Implementation problems. Suppose you have a&b&c&d&e&f&g&h. There are 40320 patterns that match this, but this is more than the number of backtracking states that the Python RE engine supports. 

_____________________________________________________________________
This message has been checked for all known viruses by the MessageLabs Virus Scanning Service.





More information about the Python-list mailing list