Good use for itertools.dropwhile and itertools.takewhile
Neil Cerutti
neilc at norwich.edu
Tue Dec 4 13:26:07 EST 2012
On 2012-12-04, Nick Mellor <thebalancepro at gmail.com> wrote:
> I love the way you guys can write a line of code that does the
> same as 20 of mine :)
>
> I can turn up the heat on your regex by feeding it a null
> description or multiple white space (both in the original
> file.) I'm sure you'd adjust, but at the cost of a more complex
> regex.
A re.split should be able to handle this without too much hassle.
The simplicity of my two-line version will evaporate pretty
quickly to compensate for edge cases.
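As a rough illustration of the re.split idea, here is a minimal sketch. The name prod_desc_re and the pattern are assumptions made for this example, not code from earlier in the thread. It splits once, at the first run of white space that is followed by a word containing an ASCII lowercase letter, and pads the result when there is no description.

```python
import re

# Hypothetical pattern: a run of white space followed by a word that
# contains at least one ASCII lowercase letter (assumed to start the
# description).
_FIRST_DESC_WORD = re.compile(r'\s+(?=\S*[a-z])')

def prod_desc_re(s):
    """Sketch: split s into [product name, description] with re.split."""
    parts = _FIRST_DESC_WORD.split(s.strip(), maxsplit=1)
    if len(parts) == 1:
        # No description found: report it as an empty string.
        parts.append('')
    return parts
```

White space inside each half is left alone; only the separating run is consumed. It still assumes the string starts with at least one capitalized word.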
Here's one that can handle one of the edge cases you mention, but
it's hardly any shorter than what you had, and it doesn't
preserve non-standard white space, like double spaces.
def prod_desc(s):
    """split s into product name and product description. Product
    name is a series of one or more capitalized words followed
    by white space. Everything after the trailing white space is
    the product description.

    >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
    ['CAR FIFTY TWO', 'Chrysler LeBaron.']
    """
    prod = []
    desc = []
    target = prod
    for word in s.split():
        if target is prod and not word.isupper():
            target = desc
        target.append(word)
    return [' '.join(prod), ' '.join(desc)]
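Since the thread is about itertools.dropwhile and itertools.takewhile, the same word-by-word split can probably be sketched with them as well. The name prod_desc_iter is made up for this example. Like the loop version, it collapses runs of white space, and it treats a word with no cased characters (such as "52") as the start of the description.

```python
from itertools import dropwhile, takewhile

def prod_desc_iter(s):
    """Sketch: split s into [product name, description] by words."""
    words = s.split()
    # Leading run of all-uppercase words is taken as the product name.
    prod = takewhile(str.isupper, words)
    # Everything from the first non-uppercase word on is the description.
    desc = dropwhile(str.isupper, words)
    return [' '.join(prod), ' '.join(desc)]
```

The predicate is evaluated twice over the leading words, which is a small price for dropping the target-switching state.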
When str methods fail I'll usually write my own parser before
turning to re. The following is no longer nice looking at all.
def prod_desc(s):
    """split s into product name and product description. Product
    name is a series of one or more capitalized words followed
    by white space. Everything after the trailing white space is
    the product description.

    >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
    ['CAR FIFTY TWO', 'Chrysler LeBaron.']
    >>> prod_desc("MR. JONESEY Saskatchewan's finest")
    ['MR. JONESEY', "Saskatchewan's finest"]
    """
    i = 0
    while not s[i].islower():
        i += 1
    i -= 1
    while not s[i].isspace():
        i -= 1
    start_desc = i+1
    while s[i].isspace():
        i -= 1
    end_prod = i+1
    return [s[:end_prod], s[start_desc:]]
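One thing to watch with the index-walking version: a string with no lowercase character at all (the null description case) runs off the end of s and raises IndexError. A guarded variant might look like the sketch below. The name prod_desc_safe is an assumption for this example, and returning an empty description in that case is one possible choice, not something settled in the thread.

```python
def prod_desc_safe(s):
    """Sketch: index-based split that tolerates a missing description."""
    # Position of the first lowercase character, or None if there is none.
    first_lower = next((i for i, c in enumerate(s) if c.islower()), None)
    if first_lower is None:
        # Assumed behavior: the whole string is the product name.
        return [s.rstrip(), '']
    # Walk back to the start of the word containing that character.
    start_desc = first_lower
    while start_desc > 0 and not s[start_desc - 1].isspace():
        start_desc -= 1
    # Drop the white space separating the name from the description.
    return [s[:start_desc].rstrip(), s[start_desc:]]
```

Like the original, it keeps any unusual white space inside the name and the description.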
--
Neil Cerutti