In need of a binge-and-purge idiom
Alex Martelli
aleax at aleax.it
Mon Mar 24 06:57:44 EST 2003
Magnus Lie Hetland wrote:
Ah, I recognize the outline of our joint contribution to the
printed Cookbook (recipe 4.8...).
> I've noticed that I use the following in several contexts:
[fixing as per followups]
>   chunk = []
>   for element in iterable:
>       if isSeparator(element) and chunk:
>           doSomething(chunk)
>           chunk = []
        else: chunk.append(element)
>   if chunk:
>       doSomething(chunk)
>       chunk = []
First refactoring that comes to mind is:
    def maydosomething(chunk):
        if chunk:
            doSomething(chunk)
            chunk[:] = []

    chunk = []
    for element in iterable:
        if isSeparator(element): maydosomething(chunk)
        else: chunk.append(element)
    maydosomething(chunk)
but this wouldn't work for the specific use case you require:
> If the iterable above is a file, isSeparator(element) is simply
> defined as not element.strip() and doSomething(chunk) is
> yield(''.join(chunk)) you have a paragraph splitter. I've been using
i.e., factoring out a *yield* to maydosomething would NOT work.
So I'll focus on the specific case of yield in the following,
assuming a "munge" function such as
def munge(chunk): return ''.join(chunk)
is also passed as an argument.
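To make the problem concrete, here is a minimal sketch (the names helper, broken and working are made up for illustration, and blank-line separators are assumed). Calling a function whose body contains a yield only builds a generator object, so a yield moved into a helper never reaches whoever consumes the outer loop:

```python
def munge(chunk):
    return ''.join(chunk)

def helper(chunk):
    # Because of the yield, calling helper() only creates a generator object;
    # its body never runs here and nothing is yielded on the caller's behalf.
    if chunk:
        yield munge(chunk)
        chunk[:] = []

def broken(lines):
    # No yield in this body, so this is a plain function that returns None.
    chunk = []
    for line in lines:
        if not line.strip():
            helper(chunk)      # generator created and discarded: no effect
        else:
            chunk.append(line)
    helper(chunk)

def working(lines):
    # The yield has to sit in the generator that the consumer iterates over.
    chunk = []
    for line in lines:
        if not line.strip():
            if chunk:
                yield munge(chunk)
                chunk = []
        else:
            chunk.append(line)
    if chunk:
        yield munge(chunk)

lines = ["a\n", "b\n", "\n", "c\n"]
paragraphs = list(working(lines))
```

Here broken(lines) silently produces nothing, while working(lines) yields one string per paragraph.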
>   for element in iterable + separator:
>       ...
>
> but that isn't possible, of course. (It could be possible with some
> fiddling with itertools etc., I guess.)
Indeed, there ain't much "fiddling" needed at all -- you just
DO need to know SOME acceptable separator, however:
    import itertools

    def chunkitup(iterable, isSeparator, aSeparator, munge=''.join):
        # a sanity check never hurts...
        assert isSeparator(aSeparator)
        chunk = []
        for element in itertools.chain(iterable, [aSeparator]):
            if isSeparator(element):
                yield munge(chunk)
                chunk = []
            else: chunk.append(element)
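As a usage sketch, here is the same approach applied to paragraph splitting over a list of lines. The sample data, the isBlank helper and the "if chunk" guard are assumptions added for illustration; the guard skips the empty chunks that runs of consecutive separators would otherwise produce, as the original loop did:

```python
import itertools

def chunkitup(iterable, isSeparator, aSeparator, munge=''.join):
    # The appended separator must really be one, or the last chunk is lost.
    assert isSeparator(aSeparator)
    chunk = []
    for element in itertools.chain(iterable, [aSeparator]):
        if isSeparator(element):
            if chunk:  # assumption: do not yield empty chunks
                yield munge(chunk)
                chunk = []
        else:
            chunk.append(element)

def isBlank(line):
    # Hypothetical separator test: a line holding only whitespace.
    return not line.strip()

lines = ["one\n", "two\n", "\n", "\n", "three\n"]
paragraphs = list(chunkitup(lines, isBlank, "\n"))
```

With the guard in place the two blank lines in a row count as a single paragraph break, so paragraphs holds two strings.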
> If it were possible to check whether the iterator extracted from the
> iterable was at an end, that could help too -- but I see no elegant
> way of doing it.
Elegance is in the eye of the beholder, but...:
    class iter_with_lookahead:
        def __init__(self, iterable):
            self.it = iter(iterable)
            self.done = False
            self.step()
        def __iter__(self):
            return self
        def step(self):
            try:
                self.lookahead = self.it.next()
            except StopIteration:
                self.done = True
        def next(self):
            if self.done: raise StopIteration
            result = self.lookahead
            self.step()
            return result
...I've had occasion to use variants of this in order to be able
to peek ahead, check if an iterator was done, or in small further
variants to give an iterator one level of "pushback", etc, etc.
So, if you have a wrapper such as this one around somewhere, you
might choose to reuse it (though it probably wouldn't be worth
developing for the sole purpose of this use!-):
    def chunkitup1(iterable, isSeparator, munge=''.join):
        chunk = []
        it = iter_with_lookahead(iterable)
        for element in it:
            issep = isSeparator(element)
            if not issep:
                chunk.append(element)
            if issep or it.done:
                yield munge(chunk)
                chunk = []
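On Python 3 the iterator method is spelled `__next__` and items are fetched with the built-in next(), so a port of the lookahead approach might look like the sketch below. The names Lookahead and chunks are hypothetical, and the extra "and chunk" test is an assumption that suppresses empty chunks:

```python
class Lookahead:
    """Iterator wrapper that fetches one item ahead, so `done` is known early."""
    def __init__(self, iterable):
        self.it = iter(iterable)
        self.done = False
        self.step()

    def __iter__(self):
        return self

    def step(self):
        # Pre-fetch the next item; remember exhaustion instead of raising.
        try:
            self.lookahead = next(self.it)
        except StopIteration:
            self.done = True

    def __next__(self):
        if self.done:
            raise StopIteration
        result = self.lookahead
        self.step()
        return result

def chunks(iterable, isSeparator, munge=''.join):
    chunk = []
    it = Lookahead(iterable)
    for element in it:
        issep = isSeparator(element)
        if not issep:
            chunk.append(element)
        # it.done is already True while the last element is being handled.
        if (issep or it.done) and chunk:
            yield munge(chunk)
            chunk = []

result = list(chunks(["a\n", "\n", "b\n"], lambda line: not line.strip()))
```

The point of the pre-fetch is that `done` flips to True while the loop body still holds the last element, so the final chunk can be flushed inside the loop.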
> I can't really see any good way of using the while/break idiom either,
Well, you COULD use a different wrapper class to obtain code such as:
    def chunkitup2(iterable, isSeparator, munge=''.join):
        wit = wild_thing(iterable, isSeparator)
        while wit:
            if wit.isSeparator() and wit.hasChunk():
                yield munge(wit.getChunk())
but the wrapper wouldn't be all that nice under the covers AND it
would in practice have to embody a bit too much of the control
logic and bury it in a non-obvious place -- so I wouldn't pursue
this tack, myself.
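For the curious, here is one purely hypothetical way such a wrapper could be written on Python 3; nothing about wild_thing is specified beyond the four calls used in chunkitup2, so every detail below is an assumption. Note how the truth test has to consume an element as a side effect, and has to fake one final separator so the last chunk gets flushed:

```python
class wild_thing:
    """Hypothetical wrapper: gathers chunks and hides the loop control."""
    def __init__(self, iterable, isSeparator):
        self._it = iter(iterable)
        self._isSep = isSeparator
        self._chunk = []
        self._atSep = False
        self._finished = False

    def __bool__(self):
        # Side effect inside a truth test: each check consumes one element.
        if self._finished:
            return False
        try:
            element = next(self._it)
        except StopIteration:
            # Pretend to see one last separator so the final chunk flushes.
            self._finished = True
            self._atSep = True
            return True
        self._atSep = self._isSep(element)
        if not self._atSep:
            self._chunk.append(element)
        return True

    def isSeparator(self):
        return self._atSep

    def hasChunk(self):
        return bool(self._chunk)

    def getChunk(self):
        # Hand over the gathered chunk and start a fresh one.
        chunk, self._chunk = self._chunk, []
        return chunk

def chunkitup2(iterable, isSeparator, munge=''.join):
    wit = wild_thing(iterable, isSeparator)
    while wit:
        if wit.isSeparator() and wit.hasChunk():
            yield munge(wit.getChunk())

result = list(chunkitup2(["a\n", "\n", "b\n"], lambda line: not line.strip()))
```

The generator itself reads cleanly, but the accumulation and the end-of-input handling are tucked away in `__bool__`, which is exactly the kind of buried control logic argued against above.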
Alex