How to do this with groupby (or otherwise)? (Was: iterblocks cookbook example)

Gerard Flanagan grflanagan at yahoo.co.uk
Mon Jun 4 07:52:34 EDT 2007


On Jun 2, 10:47 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> On Jun 2, 10:19 am, Steve Howell <showel... at yahoo.com> wrote:
>
> > George Sakkis produced the following cookbook recipe,
> > which addresses a common problem that comes up on this
> > mailing list:
>
> ISTM, this is a common mailing list problem because it is fun
> to solve, not because people actually need it on a day-to-day basis.
>
> In that spirit, it would be fun to compare several different
> approaches to the same problem using re.finditer, itertools.groupby,
> or the tokenize module.  To get the ball rolling, here is one variant:
>
> from itertools import groupby
>
> def blocks(s, start, end):
>     def classify(c, ingroup=[0], delim={start:2, end:3}):
>         result = delim.get(c, ingroup[0])
>         ingroup[0] = result in (1, 2)
>         return result
>     return [tuple(g) for k, g in groupby(s, classify) if k == 1]
>
> print blocks('the <quick> brown <fox> jumped', start='<', end='>')
>
> One observation is that groupby() is an enormously flexible tool.
> Given a well crafted key= function, it makes short work of almost
> any data partitioning problem.
>

Can anyone suggest a function that will split text into paragraphs, but
NOT where the paragraph breaks fall inside a [quote]...[/quote]
construct?  In other words, the following text should yield 3 blocks,
not 6:

TEXT = '''
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Pellentesque dolor quam, dignissim ornare, porta et,
auctor eu, leo. Phasellus malesuada metus id magna.

[quote]
Only when flight shall soar
not for its own sake only
up into heaven's lonely
silence, and be no more

merely the lightly profiling,
proudly successful tool,
playmate of winds, beguiling
time there, careless and cool:

only when some pure Whither
outweighs boyish insistence
on the achieved machine

will who has journeyed thither
be, in that fading distance,
all that his flight has been.
[/quote]

Integer urna nulla, tempus sit amet, ultrices interdum,
rhoncus eget, ipsum. Cum sociis natoque penatibus et
magnis dis parturient montes, nascetur ridiculus mus.
'''

Other info:

* don't worry about nesting
* the [quote] and [/quote] mustn't be stripped.
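
To make the requirement concrete, here is a rough groupby-based sketch of
the behaviour described above. It is only one possible approach, not a
tested solution: the names split_paragraphs, open_tag and close_tag are
made up for illustration, and it assumes each tag sits on a line of its
own (as in TEXT) and that quotes are never nested.

```python
from itertools import groupby


def split_paragraphs(text, open_tag='[quote]', close_tag='[/quote]'):
    """Split text on blank lines, except between open_tag and close_tag.

    Assumes (hypothetically) that each tag is alone on its line and that
    quotes do not nest. The tag lines are kept in the output.
    """
    state = {'in_quote': False}

    def is_content(line):
        stripped = line.strip()
        if stripped == open_tag:
            state['in_quote'] = True
            return True
        if stripped == close_tag:
            state['in_quote'] = False
            return True
        # Inside a quote even a blank line counts as content, so the
        # quoted block is never split; outside, blank lines separate.
        return state['in_quote'] or bool(stripped)

    # groupby calls is_content once per line, in order, so the state
    # carried between calls stays consistent.
    return ['\n'.join(group)
            for content, group in groupby(text.splitlines(), is_content)
            if content]


sample = "a\nb\n\n[quote]\nx\n\ny\n[/quote]\n\nc\n"
blocks = split_paragraphs(sample)
```

With the sample above, blocks should hold three strings, the middle one
running from [quote] to [/quote] with its inner blank line intact.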

Gerard



