Whittle it on down

Random832 random832 at fastmail.com
Thu May 5 09:21:39 EDT 2016


On Thu, May 5, 2016, at 03:36, Steven D'Aprano wrote:
> Putting non-ASCII letters aside for the moment, how would you match these 
> specs as a regular expression?

Well, obviously *your* language (not the OP's), given the cases you
reject, is "one or more sequences of letters separated by
space*-ampersand-space*", and that is actually one of the easiest kinds
of regex to write: "[A-Z]+( *& *[A-Z]+)*".
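
As a rough sanity check, here is a minimal Python sketch of that pattern. It assumes the whole string has to match, so it uses re.fullmatch; the name matches_strict and the sample strings are made up for illustration and are not from the thread.

```python
import re

# Letters, then zero or more "optional spaces, ampersand, optional spaces, letters".
STRICT = re.compile(r"[A-Z]+( *& *[A-Z]+)*")

def matches_strict(s):
    """Return True if the entire string fits the strict pattern (hypothetical helper)."""
    return STRICT.fullmatch(s) is not None

# Illustrative inputs: the first three should be accepted, the rest rejected.
for sample in ["AAA", "AAA&BBB", "AAA  &  BBB", "AAA BBB", " AAA", "AAA &"]:
    print(repr(sample), matches_strict(sample))
```

Using fullmatch avoids having to add ^ and $ anchors to the pattern itself.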

However, your spec is wrong:

> - Leading or trailing spaces, or spaces not surrounding an ampersand,
> must not match: "AAA BBB" must be rejected.

The *very first* item in OP's list of good outputs is 'PHYSICAL FITNESS
CONSULTANTS & TRAINERS'.

If you want something that's extremely conservative but still accepts
all of OP's input, try the pattern below. (It keeps the choice of
allowing arbitrary numbers of spaces, which is *very odd in context*:
why would you allow this but reject leading or trailing space?)

[A-Z]+(( *& *| +)[A-Z]+)*
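
A minimal Python sketch of the pattern above, again assuming a whole-string match via re.fullmatch. The name matches_lenient is hypothetical; the first sample is the item quoted from OP's list, and the other samples are made up for illustration.

```python
import re

# Words are joined either by an ampersand with optional spaces, or by one or more spaces.
LENIENT = re.compile(r"[A-Z]+(( *& *| +)[A-Z]+)*")

def matches_lenient(s):
    """Return True if the entire string fits the lenient pattern (hypothetical helper)."""
    return LENIENT.fullmatch(s) is not None

# The first three should be accepted; the rest should be rejected.
samples = [
    "PHYSICAL FITNESS CONSULTANTS & TRAINERS",
    "AAA BBB",
    "AAA&BBB",
    " AAA",
    "AAA ",
    "AAA & & BBB",
]
for sample in samples:
    print(repr(sample), matches_lenient(sample))
```

Every separator must be followed by at least one letter, which is what keeps leading spaces, trailing spaces, and doubled ampersands out.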


