Why no RE match of A AND B?

Tim Peters tim.one at comcast.net
Sun Mar 2 18:31:38 EST 2003


[Paddy]
> Just looking through the docs to confirm it and their seems to be
> no regular expression construct that matches 'A and B' - an equivalent
> to the '|' character that matches A or B.

This is usually done by running two regexp searches.  It can be faked via
positive lookahead assertions, like

    pat = re.compile(r'(?=.*x)(?=.*y)')

Then pat.match(string) succeeds if and only if string contains x and string
contains y (but both before the first newline character (if any), since .
doesn't match a newline by default).

> ..
> The use of "&" could be supported by more efficient, (simultaneous?),
> matching of all branches of the "&" construct - if such algorithms
> exist!

For a so-called DFA algorithm, yes, because the intersection of regular
languages is also regular, and membership in any regular language can be
recognized in one pass over the string in question.  What to do in the
presence of (non-regular) backreferences is clear as mud, though, and
Python's regexp package isn't a DFA engine anyway.  I don't know of any
regexp package that supports an intersection operator, so it would also
suffer from novelty.  All in all, better to do the obvious thing (i.e., run
more than one regexp).






More information about the Python-list mailing list