Good use for itertools.dropwhile and itertools.takewhile
Nick Mellor
thebalancepro at gmail.com
Tue Dec 4 10:24:07 EST 2012
I love the way you guys can write a line of code that does the same as 20 of mine :)
I can turn up the heat on your regex by feeding it a null description or multiple white space (both occur in the original file). I'm sure you'd adjust, but at the cost of a more complex regex.
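To make those two cases concrete, here is a small sketch that runs the posted pattern over a line with no description and a line with a doubled space. The sample strings and the helper name split_regex are made up for illustration; they are not taken from the original file.

```python
import re

# The pattern exactly as posted in the quoted reply.
PATTERN = re.compile(r"(?m)^([A-Z\s]+) (.+)$")

def split_regex(text):
    """Return (product, description) tuples for each matching line (hypothetical helper)."""
    return PATTERN.findall(text)

# The case from the thread works as advertised.
print(split_regex("CAPSICUM RED fresh from QLD"))
# [('CAPSICUM RED', 'fresh from QLD')]

# Null description: the pattern insists on a description, so it
# backtracks and pushes the last product word into that slot.
print(split_regex("CAPSICUM RED"))
# [('CAPSICUM', 'RED')]

# Doubled space: one of the two spaces stays attached to the product name.
print(split_regex("CAPSICUM RED  fresh from QLD"))
# [('CAPSICUM RED ', 'fresh from QLD')]
```

Both could be patched (an optional description group, a strip() on the results), which is the extra complexity I mean.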
Meanwhile takewhile and dropwhile are behaving themselves impeccably, but my while loop has fallen over.
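For anyone who wants to reproduce that, here is a sketch using the two functions from my original post, with the indentation restored as I believe it was. The input strings are invented examples, not lines from the real file.

```python
from itertools import takewhile, dropwhile

def split_product_itertools(s):
    words = s.split()
    allcaps = lambda word: word == word.upper()
    product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
    return " ".join(product), " ".join(description)

def split_product_2(s):
    words = s.split()
    i = 0
    product = []
    while 1:
        word = words[i]  # no bounds check: runs off the end if every word is caps
        if word == word.upper():
            product.append(word)
            i += 1
        else:
            break
    return " ".join(product), " ".join(words[i:])

# Null description and repeated whitespace: the itertools version copes,
# because str.split() swallows the extra spaces and takewhile just stops
# when the words run out.
print(split_product_itertools("CAPSICUM RED"))
# ('CAPSICUM RED', '')
print(split_product_itertools("CAPSICUM RED   fresh  from QLD"))
# ('CAPSICUM RED', 'fresh from QLD')

# The while version raises IndexError on the null-description line.
try:
    split_product_2("CAPSICUM RED")
except IndexError as exc:
    print("while loop fell over:", exc)
```

Note that the split/join round trip collapses runs of whitespace in the description to single spaces, which suits me here but may not suit everyone.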
Best,
Nick
On Wednesday, 5 December 2012 01:31:48 UTC+11, Vlastimil Brom wrote:
> 2012/12/4 Nick Mellor <thebalancepro at gmail.com>:
> > Hi,
> >
> > I came across itertools.dropwhile only today, then shortly afterwards found Raymond Hettinger wondering, in 2007, whether to drop [sic] dropwhile and takewhile from the itertools module.
> >
> > Fate of itertools.dropwhile() and itertools.takewhile() - Python
> > bytes.com
> > http://bit.ly/Vi2PqP
> >
> > Almost nobody else of the 18 respondents seemed to be using them.
> >
> > And then 2 hours later, a use case came along. I think. Anyone have any better solutions?
> >
> > I have a file full of things like this:
> >
> > "CAPSICUM RED fresh from Queensland"
> >
> > Product names (all caps, at start of string) and descriptions (mixed case, to end of string) all muddled up in the same field. And I need to split them into two fields. Note that if the text had said:
> >
> > "CAPSICUM RED fresh from QLD"
> >
> > I would want QLD in the description, not shunted forwards and put in the product name. So (uncontrived) list comprehensions and regexes are out.
> >
> > I want to split the above into:
> >
> > ("CAPSICUM RED", "fresh from QLD")
> >
> > Enter dropwhile and takewhile. 6 lines later:
> >
> > from itertools import takewhile, dropwhile
> > def split_product_itertools(s):
> >     words = s.split()
> >     allcaps = lambda word: word == word.upper()
> >     product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
> >     return " ".join(product), " ".join(description)
> >
> > When I tried to refactor this code to use while or for loops, I couldn't find any way that felt shorter or more pythonic:
> >
> > (9 lines: using for)
> >
> > def split_product_1(s):
> >     words = s.split()
> >     product = []
> >     for word in words:
> >         if word == word.upper():
> >             product.append(word)
> >         else:
> >             break
> >     return " ".join(product), " ".join(words[len(product):])
> >
> > (12 lines: using while)
> >
> > def split_product_2(s):
> >     words = s.split()
> >     i = 0
> >     product = []
> >     while 1:
> >         word = words[i]
> >         if word == word.upper():
> >             product.append(word)
> >             i += 1
> >         else:
> >             break
> >     return " ".join(product), " ".join(words[i:])
> >
> > Any thoughts?
> >
> > Nick
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> Hi,
>
> the regex approach doesn't actually seem to be very complex, given the
> mentioned specification, e.g.
>
> >>> import re
> >>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED fresh from QLD\nCAPSICUM RED fresh from Queensland")
> [('CAPSICUM RED', 'fresh from QLD'), ('CAPSICUM RED', 'fresh from Queensland')]
> >>>
>
> (It might be necessary to account for some punctuation, whitespace etc. too.)
>
> hth,
> vbr