Why no RE match of A AND B?

Mon Mar 3 16:31:45 EST 2003

John Machin <sjmachin at lexicon.net> wrote:
> Tim Peters <tim.one at comcast.net> wrote in message news:<mailman.1046651583.23834.python-list at python.org>...
>> 
>> For a so-called DFA algorithm, yes, because the intersection of regular
>> languages is also regular, and membership in any regular language can be
>> recognized in one pass over the string in question.  What to do in the
>> presence of (non-regular) backreferences is clear as mud, though, and
>> Python's regexp package isn't a DFA engine anyway.  I don't know of any
>> regexp package that supports an intersection operator, so it would also
>> suffer from novelty.  All in all, better to do the obvious thing (i.e., run
>> more than one regexp).

> I seem to detect two quite different usages of "intersection":

> (1) The regular language (RL) characterised by the RE "abc" produces
> just one string i.e. 'abc'. Ditto the RL described by the RE "xyz"
> produces just one string 'xyz'. I (a mug punter, not an expert) would
> understand the intersection of the two RLs to be empty i.e. [] i.e.
> the two RLs produce no common strings. Same with "a+" and "b+",
> whereas "a*" and "b*" have a non-empty intersection [''].

> (2) The effect that the OP (together with Tim and other posters)
> seemed to expect from the & operator was that (abc)&(xyz) would match
> (abcxyz)|(xyzabc).

I think the difference is that in Python, the RE "abc" matches any
string containing 'abc', not just the exact string itself.  I'd guess
most formalists who aren't familiar with Python would write that
regular expression as ".*abc.*", and expect they's need to write
"(.*abc.*)&(.*xyz.*)" to match both 'abcxyz' and 'xyzabc'.