Whittle it on down

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu May 5 02:04:13 EDT 2016


On Thursday 05 May 2016 14:58, DFS wrote:

> Want to whittle a list like this:
[...]
> Want to keep all elements containing only upper case letters or upper
> case letters and ampersand (where ampersand is surrounded by spaces)


Start by writing a function or a regex that will distinguish strings that 
match your conditions from those that don't. A regex might be faster, but 
here's a function version.

def isupperalpha(string):
    return string.isalpha() and string.isupper()

def check(string):
    if isupperalpha(string):
        return True
    parts = string.split("&")
    if len(parts) < 2:
        return False
    # Don't strip leading spaces from the start of the string.
    parts[0] = parts[0].rstrip(" ")
    # Or trailing spaces from the end of the string.
    parts[-1] = parts[-1].lstrip(" ")
    # But strip leading and trailing spaces from the middle parts
    # (if any).
    for i in range(1, len(parts)-1):
        parts[i] = parts[i].strip(" ")
    return all(isupperalpha(part) for part in parts)
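For comparison, here is a minimal sketch of the regex route. It rests on my own assumptions. Only ASCII capitals A-Z count as upper case. Each ampersand must have at least one space on both sides, which check() above does not actually enforce: it also accepts "AT&T". The names UPPER_AMP and check_re are hypothetical.

```python
import re

# Hypothetical pattern: a run of ASCII capitals, optionally followed by
# groups of (one or more spaces, "&", one or more spaces, more capitals).
# re.match anchors at the start and \Z anchors at the very end, so this
# works in both Python 2 and 3 (no need for fullmatch, which is 3.4+).
UPPER_AMP = re.compile(r"[A-Z]+(?: +& +[A-Z]+)*\Z")

def check_re(string):
    # True only if the whole string matches the pattern.
    return UPPER_AMP.match(string) is not None
```

With this version, check_re("AB & CD") is true, while check_re("AT&T"), check_re("AB CD") and check_re("Ab & CD") are all false.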


Now there are two ways to do the filtering. The obvious way is to extract the 
elements which meet the condition, and there are a couple of ways to spell that:

# List comprehension.
newlist = [item for item in oldlist if check(item)]

# Filter, Python 2 version
newlist = filter(check, oldlist)

# Filter, Python 3 version
newlist = list(filter(check, oldlist))
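The list() call in the Python 3 version matters because filter() there returns a lazy, one-shot iterator rather than a list. A small sketch of that behaviour follows, with str.isupper standing in for check() and a made-up sample list:

```python
oldlist = ["ABC", "abc", "XYZ", "Mixed"]

# In Python 3, filter() returns a lazy iterator, not a list.
lazy = filter(str.isupper, oldlist)
print(type(lazy).__name__)  # filter

first = list(lazy)
second = list(lazy)  # the iterator was used up by the first list() call
print(first, second)  # ['ABC', 'XYZ'] []

# The list comprehension gives the same result as list(filter(...)).
same = [item for item in oldlist if item.isupper()] == list(filter(str.isupper, oldlist))
print(same)  # True
```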


In practice, this is the best (fastest, simplest) way. But if you fear that 
you will run out of memory dealing with absolutely humongous lists with 
hundreds of millions or billions of strings, you can remove items in place:


def remove(func, alist):
    for i in range(len(alist)-1, -1, -1):
        if not func(alist[i]):
            del alist[i]


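Note the magic incantation to iterate from the end of the list towards the 
front. Deleting an item shifts every later item down one place. If you go the 
other way, Bad Things happen: the item after each deletion gets skipped, or you 
index past the end of the shrunken list. Going backwards means a deletion only 
moves items you have already examined. This uses less memory than extracting 
the items, since no second list is built, but it will be much slower: each del 
has to shift all the items after it, so removing many items from a long list 
takes quadratic time.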
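To see the Bad Things concretely, here is a sketch comparing one common way of going forwards with the backwards loop above. remove_forwards is a hypothetical, deliberately broken function, and str.isupper stands in for check():

```python
def remove_forwards(func, alist):
    # Hypothetical, deliberately broken: removing an item while iterating
    # forwards shifts the next item into the slot the loop just visited,
    # so that item is never examined.
    for item in alist:
        if not func(item):
            alist.remove(item)

def remove(func, alist):
    # The backwards loop from above: a deletion only shifts items that
    # have already been examined.
    for i in range(len(alist) - 1, -1, -1):
        if not func(alist[i]):
            del alist[i]

forwards = ["a", "b", "C"]
remove_forwards(str.isupper, forwards)
print(forwards)   # ['b', 'C']  ("b" was skipped)

backwards = ["a", "b", "C"]
remove(str.isupper, backwards)
print(backwards)  # ['C']
```

An index-based forwards loop over range(len(alist)) fails differently. The range is computed from the original length, so it typically raises IndexError once the list has shrunk.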

You can combine the best of both worlds. Here is a version that uses a 
temporary list to modify the original in place:

# works in both Python 2 and 3
def remove(func, alist):
    # Modify list in place, the fast way: filter the items, then
    # replace the list's contents via slice assignment.
    alist[:] = filter(func, alist)
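To see why slice assignment counts as "in place", here is a small Python 3 sketch that compares it with simply rebinding the name inside the function. remove_rebind is a hypothetical, deliberately wrong variant, and str.isupper stands in for check():

```python
def remove(func, alist):
    # Slice assignment replaces the contents of the caller's list object.
    alist[:] = filter(func, alist)

def remove_rebind(func, alist):
    # Hypothetical wrong variant: this only rebinds the local name, so the
    # caller's list is left untouched.
    alist = list(filter(func, alist))

original = ["ABC", "abc", "XYZ"]
alias = original  # a second reference to the same list object
remove(str.isupper, original)
print(alias is original, alias)  # True ['ABC', 'XYZ']

other = ["ABC", "abc"]
remove_rebind(str.isupper, other)
print(other)  # ['ABC', 'abc']
```

Any other reference to the list, such as alias here, sees the filtered contents. That is the point of modifying the list in place rather than building a new one.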




-- 
Steve




More information about the Python-list mailing list