Classical FP problem in Python: Hamming problem

Francis Girard francis.girard at free.fr
Wed Jan 26 18:19:39 EST 2005


On Tuesday, 25 January 2005 at 19:52, Steven Bethard wrote:

Thank you Nick and Steven for the idea of a more generic imerge.

To work with the Hamming problem, the imerge function _must_not_ allow 
duplicates, and we can assume all of the specified iterables are of the same 
size, i.e. infinite!

Therefore, Nick's solution fulfills the need. But it is admittedly confusing 
to call the function "imerge" when it doesn't merge everything (including 
duplicates). Anyway, both solutions shed new light and bring us a bit 
farther.

That's the beauty of many brains from all over the world collaborating.
Really, it moves me just to think about it.

For the imerge function, what we really need to make the formulation clear is 
a way to look at the next element of an iterable without consuming it. Or 
else, a way to put "consumed" elements back at the front of an iteration flow, 
much like the list constructors in FP languages like Haskell.

It is simple to encapsulate an iterator inside another iterator class that 
does just that. Here's one. The additional "fst" method returns the next 
element to consume without consuming it, and the "isBottom" method checks 
whether there is anything left to consume from the iteration flow, without 
actually consuming anything. I named the class "IteratorDeiterator" for lack 
of imagination:

-- BEGIN SNAP
class IteratorDeiterator:
  def __init__(self, iterator):
    self._iterator = iterator.__iter__()
    self._firstVal = None ## Avoid consuming if not requested from outside
                          ## Works only if iterator itself can't return None

  def __iter__(self): return self

  def next(self):
    ## Return the value peeked by "fst", if any, else pull a fresh one
    valReturn = self._firstVal
    if valReturn is None:
      valReturn = self._iterator.next()
    self._firstVal = None
    return valReturn

  def fst(self):
    ## Peek at the next value without consuming it
    if self._firstVal is None:
      self._firstVal = self._iterator.next()
    return self._firstVal

  def isBottom(self):
    ## True when nothing is left in the flow; a value peeked here is
    ## kept so that nothing is visibly consumed
    if self._firstVal is None:
      try: self._firstVal = self._iterator.next()
      except StopIteration:
        return True
    return False
-- END SNAP
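
For instance, a small session with the class (just a sketch, assuming
Python 2.4 as in the rest of this post) would go like this:

-- BEGIN SNAP
it = IteratorDeiterator(iter([1, 2, 3]))
print it.fst()       ## peeks at 1 without consuming it
print it.fst()       ## still 1
print it.next()      ## now 1 is consumed
print it.next(), it.next()  ## 2 3
print it.isBottom()  ## True, the flow is exhausted
-- END SNAP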

Now imerge_infinite, which merges infinite lists while allowing 
duplicates, is quite straightforward:

-- BEGIN SNAP
def imerge_infinite(*iterators):
  vItTopable = [IteratorDeiterator(it) for it in iterators]
  while vItTopable:
    ## Pick the stream whose head is smallest, then consume that head
    yield reduce(lambda itSmallest, it:
                   itSmallest.fst() > it.fst() and it or itSmallest,
                   vItTopable).next()
-- END SNAP
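
For example, merging two infinite streams (islice is only there to take a
finite prefix of the infinite result; the names evens and threes are just
for this illustration):

-- BEGIN SNAP
from itertools import count, islice

evens = (2 * n for n in count())   ## 0, 2, 4, 6, ...
threes = (3 * n for n in count())  ## 0, 3, 6, 9, ...
print list(islice(imerge_infinite(evens, threes), 8))
## should print [0, 0, 2, 3, 4, 6, 6, 8]
-- END SNAP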

Merging finite lists of possibly different lengths is a bit more work, as you 
have to eliminate the iterators that reach the bottom before the others. This 
is quite easily done by simply filtering them out. 
The imerge_finite function below merges "lists" of possibly different sizes.

-- BEGIN SNAP
def imerge_finite(*iterators):
  vItDeit = [IteratorDeiterator(it) for it in iterators]
  vItDeit = filter(lambda it: not it.isBottom(), vItDeit)
  while vItDeit:
    ## Yield the smallest head among the remaining streams
    yield reduce(lambda itSmallest, it:
                   itSmallest.fst() > it.fst() and it or itSmallest,
                   vItDeit).next()
    ## Drop the streams that have just run dry
    vItDeit = filter(lambda it: not it.isBottom(), vItDeit)
-- END SNAP
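
For example, with finite "lists" of different sizes nothing is lost (a
quick check, not part of the original snippets):

-- BEGIN SNAP
print list(imerge_finite([1, 4, 7], [2, 5, 8], [3, 6, 9, 10]))
## should print [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
-- END SNAP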


Now, we want versions of these two merge functions that do not allow 
duplicates. Building upon what we've already done in a semi-FP way, we just 
filter the duplicates out of the iteration streams returned by the above 
functions. The job is the same for the infinite and finite versions, hence 
the generic imerge_nodup function.

-- BEGIN SNAP
from itertools import dropwhile

def imerge_nodup(fImerge, *iterators):
  it = fImerge(*iterators)
  el = it.next()
  yield el
  while True:
    ## Skip the elements equal to the one just yielded
    el = dropwhile(lambda _next: _next == el, it).next()
    yield el

imerge_inf_nodup = \
  lambda *iterators: imerge_nodup(imerge_infinite, *iterators)
imerge_finite_nodup = \
  lambda *iterators: imerge_nodup(imerge_finite, *iterators)
-- END SNAP
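
For example, feeding overlapping finite "lists" through the finite variant
(again a quick check, not part of the original snippets):

-- BEGIN SNAP
print list(imerge_finite_nodup([1, 2, 2, 3], [2, 3, 4], [4, 5]))
## should print [1, 2, 3, 4, 5]
-- END SNAP

Note that when the merged stream runs dry, the dropwhile(...).next() call
raises StopIteration inside the generator, which simply ends the iteration
for the caller.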

I used the lambda notation for imerge_inf_nodup and imerge_finite_nodup to 
avoid another useless for-yield loop. Here the notation really just expresses 
function equivalence with a bound argument (it would be great to have a 
notation for this in Python, i.e. binding a function argument to yield a new 
function).
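
Of course such a binder can be written by hand; a hypothetical little
helper (bind_first is a made-up name, nothing standard) would let the two
definitions above read as:

-- BEGIN SNAP
def bind_first(f, arg):
  ## Return a new function with the first argument of f already bound
  return lambda *rest: f(arg, *rest)

imerge_inf_nodup = bind_first(imerge_nodup, imerge_infinite)
imerge_finite_nodup = bind_first(imerge_nodup, imerge_finite)
-- END SNAP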

The merge function to use with hamming() is imerge_inf_nodup.
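
The hamming() generator itself is not repeated in this post; one possible
way to write it on top of imerge_inf_nodup, using itertools.tee so that the
stream can feed on itself (a sketch under that assumption only, not
necessarily the exact formulation used elsewhere in the thread), is:

-- BEGIN SNAP
from itertools import tee, chain, islice

def hamming():
  ## The stream feeds on itself through a deferred indirection:
  ## "output" is only assigned below, before deferred() is first advanced.
  def deferred():
    for n in output:
      yield n
  result, h2, h3, h5 = tee(deferred(), 4)
  merged = imerge_inf_nodup((2 * n for n in h2),
                            (3 * n for n in h3),
                            (5 * n for n in h5))
  output = chain([1], merged)
  for n in result:
    yield n

print list(islice(hamming(), 15))
## should print [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24]
-- END SNAP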

Regards,

Francis Girard
FRANCE

> Nick Craig-Wood wrote:
> > Steven Bethard <steven.bethard at gmail.com> wrote:
> >> Nick Craig-Wood wrote:
> >>>Thinking about this some more leads me to believe a general purpose
> >>>imerge taking any number of arguments will look neater, eg
> >>>
> >>>def imerge(*generators):
> >>>    values = [ g.next() for g in generators ]
> >>>    while True:
> >>>        x = min(values)
> >>>        yield x
> >>>        for i in range(len(values)):
> >>>            if values[i] == x:
> >>>                values[i] = generators[i].next()
> >>
> >> This kinda looks like it dies after the first generator is exhausted,
> >> but I'm not certain.
> >
> > Yes it will stop iterating then (rather like zip() on lists of unequal
> > size). Not sure what the specification should be!  It works for the
> > hamming problem though.
>
> Actually, it stops iterating on lists of equal size too:
>
> py> def imerge(*iterators):
> ...     iterators = [iter(i) for i in iterators]
> ...     values = [i.next() for i in iterators]
> ...     while True:
> ...         x = min(values)
> ...         yield x
> ...         for i, val in enumerate(values):
> ...             if val == x:
> ...                 values[i] = iterators[i].next()
> ...
> py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
> [1, 2, 3, 4, 5, 6, 7]
>
> Note that 8 and 9 are not in the list.
>
> Steve



