Why no RE match of A AND B?
Harvey Thomas
hst at empolis.co.uk
Mon Mar 3 03:59:02 EST 2003
Paddy wrote:
>
> Just looking through the docs to confirm it and their seems
> to be no regular expression
> construct that matches 'A and B' - an equivalent to the '|'
> character that matches A or B
> To paraphrase the documentation it could be documented as:
>
> "&"
> A&B, where A and B can be arbitrary REs, creates a regular
> expression that will match both
> A and B in any order. An arbitrary number of REs can be
> separated by the "&" in this way.
> This can be used inside groups as well. REs separated by "&"
> are tried in arbitrary order,
> the beginning of the first matching branch up to the end of
> the last branch that allows
> the complete pattern (all branches) to match is considered
> the span of the match.
>
> I envision it being used where a string is accepted at
> present if it matches multiple REs.
> The REs could be rewritten as r"<RE#1>&<RE2>&<RE3>...".
>
> The use of "&" could be supported by more efficient,
> (simultaneous?), matching of all
> branches of the "&" construct - if such algorithms exist!
>
> Just an idea,
>
> Paddy.
>
One of the simplifications of SGML made in the creation of XML was the dropping of the & operator in content models. I believe there were two reasons for this:
1) & was hardly ever used.
2) Implementation problems. Suppose you have a&b&c&d&e&f&g&h. There are 40320 patterns that match this, but this is more than the number of backtracking states that the Python RE engine supports.
_____________________________________________________________________
This message has been checked for all known viruses by the MessageLabs Virus Scanning Service.
More information about the Python-list
mailing list