Good use for itertools.dropwhile and itertools.takewhile

Terry Reedy tjreedy at udel.edu
Tue Dec 4 15:44:10 EST 2012


On 12/4/2012 8:57 AM, Nick Mellor wrote:

> I have a file full of things like this:
>
> "CAPSICUM RED fresh from Queensland"
>
> Product names (all caps, at start of string) and descriptions (mixed
> case, to end of string) all muddled up in the same field. And I need
> to split them into two fields. Note that if the text had said:
>
> "CAPSICUM RED fresh from QLD"
>
> I would want QLD in the description, not shunted forwards and put in
> the product name. So (uncontrived) list comprehensions and regex's
> are out.
>
> I want to split the above into:
>
> ("CAPSICUM RED", "fresh from QLD")
>
> Enter dropwhile and takewhile. 6 lines later:
>
> from itertools import takewhile, dropwhile
> def split_product_itertools(s):
>     words = s.split()
>     allcaps = lambda word: word == word.upper()
>     product, description =\
>         takewhile(allcaps, words), dropwhile(allcaps, words)
>     return " ".join(product), " ".join(description)

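If the original string has no excess whitespace (single spaces between
words, none leading or trailing), the description is simply what remains
of s after the product prefix and the space that follows it are skipped.
(Python 3 code)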

from itertools import takewhile

def allcaps(word):
    return word == word.upper()

def split_product_itertools(s):
    product = ' '.join(takewhile(allcaps, s.split()))
    # Skip the product and the single space that follows it.
    return product, s[len(product)+1:]

print(split_product_itertools("CAPSICUM RED fresh from QLD"))
>>>
('CAPSICUM RED', 'fresh from QLD')
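
To see why the whitespace assumption matters, here is a small sketch that
repeats the prefix-length slicing under a hypothetical name
(split_by_prefix_length, used only for this illustration) and feeds it
inputs that break the assumption. The test strings are made up.

```python
from itertools import takewhile

def allcaps(word):
    return word == word.upper()

def split_by_prefix_length(s):
    # Hypothetical name; same slicing idea as above.
    product = ' '.join(takewhile(allcaps, s.split()))
    return product, s[len(product) + 1:]

# Well-formed input: single spaces only.
print(split_by_prefix_length("CAPSICUM RED fresh from QLD"))
# ('CAPSICUM RED', 'fresh from QLD')

# Three spaces inside the product: the joined product is shorter than
# the prefix it came from, so the slice starts too early.
print(split_by_prefix_length("CAPSICUM   RED fresh from QLD"))
# ('CAPSICUM RED', 'D fresh from QLD')

# No all-caps prefix at all: the +1 still skips one character.
print(split_by_prefix_length("fresh from QLD"))
# ('', 'resh from QLD')
```

So the one-slice version is only safe on input already known to be clean.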

Without that assumption, apply the same idea to the list of words instead
of the string: count how many words the product took and join the rest.

def split_product_itertools(s):
     words = s.split()
     product = list(takewhile(allcaps, words))
     return ' '.join(product), ' '.join(words[len(product):])
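
A quick check of this version, as a sketch with made-up inputs: irregular
whitespace is normalized by split(), and an empty product or an empty
description simply comes back as an empty string.

```python
from itertools import takewhile

def allcaps(word):
    return word == word.upper()

def split_product_itertools(s):
    words = s.split()
    product = list(takewhile(allcaps, words))
    # Everything after the product words is the description.
    return ' '.join(product), ' '.join(words[len(product):])

print(split_product_itertools("CAPSICUM   RED fresh from QLD"))
# ('CAPSICUM RED', 'fresh from QLD')
print(split_product_itertools("CAPSICUM RED"))
# ('CAPSICUM RED', '')
print(split_product_itertools("fresh from QLD"))
# ('', 'fresh from QLD')
```

Note that the original spacing of the description is not preserved; both
fields come back with single spaces between words.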

-- 
Terry Jan Reedy



