Whittle it on down

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu May 5 03:36:47 EDT 2016


Oh, a further thought...


On Thursday 05 May 2016 16:46, Stephen Hansen wrote:

> On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
>> Start by writing a function or a regex that will distinguish strings that
>> match your conditions from those that don't. A regex might be faster, but
>> here's a function version.
>> ... snip ...
> 
> Yikes. I'm all for the idea that one shouldn't go to regex when Python's
> powerful string type can answer the problem more clearly, but this seems
> to go out of its way to do otherwise.
> 
> I don't even care about faster: It's overly complicated. Sometimes a
> regular expression really is the clearest way to solve a problem.

Putting non-ASCII letters aside for the moment, how would you match these 
specs as a regular expression?

- All uppercase ASCII letters (A to Z only), optionally separated into words 
by either a bare ampersand (e.g. "AAA&AAA") or an ampersand with leading and 
trailing spaces (spaces only, not arbitrary whitespace): "AAA   & AAA".

- The number of spaces on either side of the ampersands need not be the 
same: "AAA&   BBB &       CCC" should match.

- Leading or trailing spaces, or spaces not surrounding an ampersand, must 
not match: "AAA BBB" must be rejected.

- Leading or trailing ampersands must also be rejected. This includes the 
case where the string is nothing but ampersands.

- Consecutive ampersands "AAA&&&BBB" and the empty string must be rejected.
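
To make the rules above concrete, here is a minimal sketch of them as a plain function. It is not the function that was snipped from the earlier message; the name `is_valid` and the split-on-ampersand approach are assumptions made purely for illustration.

```python
import string

UPPER = set(string.ascii_uppercase)  # A to Z only

def is_valid(s):
    """Return True if s satisfies the rules listed above (illustrative sketch)."""
    # Split on ampersands; each piece is a word plus optional padding spaces.
    parts = s.split("&")
    last = len(parts) - 1
    for i, part in enumerate(parts):
        # Spaces are only allowed next to an ampersand, so strip the left
        # side of every piece but the first, and the right side of every
        # piece but the last.
        if i > 0:
            part = part.lstrip(" ")
        if i < last:
            part = part.rstrip(" ")
        # What remains must be a non-empty run of uppercase ASCII letters.
        if not part or not set(part) <= UPPER:
            return False
    return True
```

Only the space character is stripped, so tabs and newlines around an ampersand are still rejected, as the first rule requires.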


I get something like this:

r"(^[A-Z]+$)|(^([A-Z]+[ ]*\&[ ]*[A-Z]+)+$)"


but it fails on strings like "AA   &  A &  A". What am I doing wrong?


For the record, here's my brief test suite:


import re

def test(pat):
    # Strings that must be rejected.
    for s in ("", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A"):
        assert re.match(pat, s) is None
    # Strings that must be accepted.
    for s in ("A", "A & A", "AA&A", "AA   &  A &  A"):
        assert re.match(pat, s)
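
A possible diagnosis of the failure on "AA   &  A &  A", offered as a sketch and not as a definitive answer: the repeated group ([A-Z]+[ ]*\&[ ]*[A-Z]+)+ demands a word on both sides of every ampersand, so each repetition needs its own left-hand word. In a chain of three words, the middle "A" is consumed as the right-hand word of the first repetition, and the second repetition then has no letters to start with. One way around this is to match a single word followed by zero or more "ampersand, then word" units. The names `PATTERN` and `matches` below are hypothetical, and `re.fullmatch` (Python 3.4+) is used in place of the ^...$ anchors so that a trailing newline is not silently accepted.

```python
import re

# One word, then zero or more "& word" units. Spaces are only allowed
# directly around an ampersand, and each word is matched exactly once.
PATTERN = re.compile(r"[A-Z]+(?: *& *[A-Z]+)*")

def matches(s):
    """Return True if the whole string fits the pattern (illustrative sketch)."""
    return PATTERN.fullmatch(s) is not None

rejected = ["", " ", "&", "A A", "A&", "&A", "A&&A", "A& &A", "A\n"]
accepted = ["A", "A & A", "AA&A", "AA   &  A &  A", "AAA&   BBB &       CCC"]

assert not any(matches(s) for s in rejected)
assert all(matches(s) for s in accepted)
```

Because the middle word now belongs to exactly one unit, chains of any length go through, while "A&&A" and "A& &A" still fail since every ampersand must be followed by at least one letter.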




-- 
Steve



