Whittle it on down

Thu May 5 01:53:54 EDT 2016

DFS writes:

. .

> Want to keep all elements containing only upper case letters or upper
> case letters and ampersand (where ampersand is surrounded by spaces)
>
> Is it easier to extract elements meeting those conditions, or remove
> elements meeting the following conditions:
>
> * elements with a lower-case letter in them
> * elements with a number in them
> * elements with a period in them
>
> ?
>
>
> So far all I figured out is remove items with a period:
> newlist = [ x for x in oldlist if "." not in x ]
>

Either way is easy to approximate with a regex:

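import re

# Sample input for illustration only; substitute your own list.
data = ['AT & T', 'IBM', 'Acme Inc.', 'R2D2']
upper = re.compile(r'[A-Z &]+')  # one or more of: A-Z, space, ampersand
lower = re.compile(r'[^A-Z &]')  # any single character outside that set
# Extract: the whole element consists of allowed characters.
print([datum for datum in data if upper.fullmatch(datum)])
# Remove: drop any element containing a disallowed character
# (unlike the first, this also keeps an empty string).
print([datum for datum in data if not lower.search(datum)])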

I've skipped testing that the ampersand is between spaces. The period
needs no separate test: it is outside the character class, so both
patterns already reject it. Adjust.
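
One possible way to tighten the ampersand condition is sketched below.
It assumes an element is a run of ASCII upper case words separated by
single spaces, with an optional ampersand between two words. The name
strict and the sample list are made up for illustration.

```python
import re

# Assumption: words are separated by exactly one space, and an ampersand
# may appear only between two words, with a space on each side.
strict = re.compile(r'[A-Z]+(?: (?:& )?[A-Z]+)*')

# Hypothetical sample elements.
samples = ['AT & T', 'IBM', 'A&B', '& CO', 'FOO &', 'R. J. SMITH', 'Acme']

# Keep only elements that match the pattern from start to end.
kept = [s for s in samples if strict.fullmatch(s)]
print(kept)
```

If your data can contain runs of several spaces, or a leading or
trailing ampersand that should be allowed, loosen the pattern to match.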

This considers only ASCII upper case letters. You can add individual
letters that matter to you, or you can reach for the documentation to
find if there is some generic notation for all upper case letters.

The newer regex package on PyPI supports POSIX character classes such as
[[:upper:]] and Unicode property notation such as \p{Lu}, where Lu is the
general category Letter, Uppercase. The standard re module supports
neither notation.
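
If you would rather stay within the standard library, the Letter,
Uppercase test can be approximated without a regex by asking
unicodedata for each character's general category. This is only a
sketch: the function name is_upper_element and the sample list are made
up, and it does not check that the ampersand sits between spaces.

```python
import unicodedata

def is_upper_element(text):
    """Return True if text holds only Lu letters, spaces and ampersands,
    and contains at least one letter."""
    letters = 0
    for ch in text:
        if unicodedata.category(ch) == 'Lu':
            letters += 1
        elif ch not in ' &':
            # Lower case letter, digit, period, or anything else.
            return False
    return letters > 0

# Hypothetical sample elements, including non-ASCII upper case letters.
samples = ['ΑΘΗΝΑ', 'МИР & ТРУД', 'Москва', 'ABC1', ' & ']
kept = [s for s in samples if is_upper_element(s)]
print(kept)
```

Note that str.isupper() alone is not a substitute here: it only
requires that the cased characters be upper case, so 'ABC1'.isupper()
is True.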