Python syntax in Lisp and Scheme

Alex Martelli aleax at aleax.it
Tue Oct 7 12:37:14 EDT 2003


Pascal Costanza wrote:

> David Eppstein wrote:
>> In article <blsbpf$i9n$1 at newsreader2.netcologne.de>,
>>  Pascal Costanza <costanza at web.de> wrote:
>> 
>>>I don't know a lot about Python, so here is a question. Is something
>>>along the following lines possible in Python?
>>>
>>>(with-collectors (collect-pos collect-neg)
>>>   (do-file-lines (l some-file-name)
>>>     (if (some-property l)
>>>       (collect-pos l)
>>>       (collect-neg l))))
>>>
>>>I actually needed something like this in some of my code...
>> 
>> Not using simple generators afaik.  The easiest way would probably be to
>> append into two lists:
>> 
>>     collect_pos = []
>>     collect_neg = []
>>     for l in some_file_name:
>>         if some_property(l):
>>             collect_pos.append(l)
>>         else:
>>             collect_neg.append(l)
> 
> ...but this means that
> 
> collect = []
> for l in some_file_name
>    if some_property:
>        collect.append(l)
> 
> ...is another solution for the single collector case. Now we have two

Of course it is; it's equivalent, by definition, to the list comprehension:

    collect = [l for l in some_file_name if some_property(l)]

(you do need to call some_property with l as the argument, which you're
not doing above, but I think that's what you meant; also, I suspect you
want to loop on open(some_file_name) -- the lines of the file, as
opposed to its name -- but that slip apparently originates in David's
example).
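With both of those fixes applied, the comprehension would read (my
rendering of the intended code):

    collect = [l for l in open(some_file_name) if some_property(l)]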

> ways to do it. Isn't this supposed to be a bad sign in the context of
> Python? I am confused...

While in Python it's deemed _preferable_ that there be "one obvious way
to do it" (for any given "it"), it's a _preference_, and there are many
cases where it just can't practically hold (practicality beats purity).

It starts with the fact that both 2+3 and 3+2 are equally-obvious ways
to sum these two numbers -- it would be quite impractical to make Python
addition non-commutative to avoid the issue;-).


>> If you needed to do this a lot of times you could encapsulate it into a
>> function of some sort:
>> 
>> def posneg(filter,iter):
>>     results = ([],[])
>>     for x in iter:
>>         results[not filter(x)].append(x)
>>     return results
>> 
>> collect_pos,collect_neg = posneg(some_property, some_file_name)
> 
> What about dealing with an arbitrary number of filters?
   ...
>  > (predicate-collect '(-5 -4 -3 -2 -1 0 1 2 3 4 5)
>      (function evenp)
>      (lambda (n) (< n 0))
>      (lambda (n) (> n 3)))
> (-4 -2 0 2 4)
> (-5 -3 -1)
> (5)
> (1 3)

I think I would code this as follows:

def collect_by_first_predicate(finite_sequence, *predicates):
    # a utility predicate that's always satisfied
    def always(any): return True
    # len(predicates) + 1 lists to collect the various results
    results = [ [] for i in range(len(predicates)+1) ]
    collectors = [ (pred, result.append)
                   for pred, result in zip(predicates+(always,), results) ]
    for item in finite_sequence:
        for pred, collect in collectors:
            if pred(item):
                collect(item)
                break
    return results

print collect_by_first_predicate(range(-5, 6),
    lambda n: n%2==0,
    lambda n: n<0,
    lambda n: n>3)

This does emit, as desired,

[[-4, -2, 0, 2, 4], [-5, -3, -1], [5], [1, 3]]

however, this approach has some very serious limitations for certain
applications: in particular, both the sequence AND the number of
predicates absolutely HAVE to be finite.  There isn't even any way
to _express_ an infinite (unbounded) family of predicates as separate
arguments to a function; so if you wanted to use such an approach to
classify the items of finite_sequence by a family of predicates where, 
e.g., pred(i) asserts that the item is >10**i, for all i>0, you could not
use this approach (nor, I suspect, could you use your predicate-collect
approach for such purposes -- or am I missing something?).  To accept
an infinite family of predicates I would have to change the arguments
(so e.g. the second would be an iterator for said infinite family) to
start with.  The first 'sequence' argument COULD syntactically be an
infinite (unbounded) iterator, e.g. itertools.count() [the sequence
of all naturals, 0 upwards] -- but then the "for item in finite_sequence:"
loop inside the function would not terminate.
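Just to make that concrete: such an unbounded family is easy to
_express_ as an iterator, e.g. with a generator (a sketch; note the
default-argument trick that binds each i into its own lambda):

def greater_than_powers_of_ten():
    # for all i>0, the i-th predicate tests "item > 10**i"
    i = 1
    while True:
        yield lambda item, i=i: item > 10**i
        i += 1

but such an iterator's items can never all be passed as separate
*predicates arguments -- which is exactly the point.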

Pushing aside, for the moment, the issue of infinite numbers of
predicates, let's consider how best to deal with a finite number N
of predicates applied to a potentially unbounded/infinite sequence.
As such sequences, in Python, are represented by iterators, we
will presumably need to return N+1 iterators to represent the
"subsequences" -- one per predicate plus the "none of the above" one.

This is actually still somewhat tricky in terms of definition.
Suppose the predicates were, e.g.:
    lambda n: n>0
    lambda n: n<0
and the unbounded sequence contained no 0's.  Then, if and when
the caller tried to get the "next item" of the "none of the above"
(third) returned iterator -- that would never return, forevermore
looping on the input sequence and looking uselessly for the 0
that would escape both of the predicates.  Alas, lazy computation
on unbounded sequences DOES do this kind of thing to you, at times.
As we can't solve the halting problem, we can't prevent it either,
in general.  So, I think we just need to live with the possibility
that any of the returned iterators may "hang" forever, depending
on the predicates and the (unbounded) input sequence.
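To make the hazard concrete, with the iterator-returning version
developed just below (a sketch):

import itertools

# no item of count(1) = 1, 2, 3, ... escapes both predicates, so the
# catch-all iterator's next() would scan the input forever:
pos, neg, rest = collect_by_first_predicate(itertools.count(1),
                                            lambda n: n>0,
                                            lambda n: n<0)
print pos.next()    # 1 -- fine
print rest.next()   # never returns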

Apart from this problem, there's another interesting issue.  When
the .next method gets called on any of the iterators we return,
which has no knowledge yet of its next item, we'll go through the
next items of the input sequence -- but that will produce items
to be returned as part of OTHER iterators, in general, before it
yields the next item for "this one".  So, SOME mechanism must
remember those for the future (inevitably this may require more
memory than we have, in which case, ka-boom, but this is just a
finite-machine special case of the above paragraph, in a way;-).

The natural way to do this in Python is to use class instances,
one per returned iterator, giving each of them a list to store
the "forthcoming" items of the corresponding iterator.  Might as
well make the instances the iterators themselves (just give them
the .next method, and a "def __iter__(self): return self" to
mark them as iterators).

def collect_by_first_predicate(sequence, *predicates):
    seq_iter = iter(sequence)
    def getanother():
        item = seq_iter.next()
        for collector in collectors:
            if collector.pred(item):
                collector.append(item)
                return
    class prediter(object):
        def __iter__(self): return self
        def __init__(self, pred):
            self.memory = []
            self.append = self.memory.append
            self.pred = pred
        def next(self):
            while not self.memory: getanother()
            return self.memory.pop(0)
    def always(any): return True
    collectors = [ prediter(pred) for pred in predicates+(always,) ]
    return collectors

print map(list, collect_by_first_predicate(range(-5, 6),
    lambda n: n%2==0,
    lambda n: n<0,
    lambda n: n>3)
    )

This, too, emits [[-4, -2, 0, 2, 4], [-5, -3, -1], [5], [1, 3]] , of
course (we do have to turn the returned iterators into lists to be
able to print them easily, but, here, we do know they're finite:-).
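To see the buffering at work on an unbounded sequence (a quick
sketch; itertools.count() yields 0, 1, 2, ...):

import itertools

evens, others = collect_by_first_predicate(itertools.count(),
                                           lambda n: n%2==0)
print evens.next(), evens.next()  # 0 2 -- fetching 2 buffers 1 away
print others.next()               # 1, served straight from memory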


Of course, both of these implementations can easily be turned into
a collect_by_all_predicates (where an item is placed in all collectors
whose predicates it satisfies, rather than in just one
of them) by simply omitting the return 'cut' after the collector.append
or collect(item) call in the respective for loop over collectors.
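For instance, the first, list-returning version would become (a
minimal sketch -- here I also drop the catch-all predicate, which,
without the 'break', would merely copy the whole sequence into the
last list):

def collect_by_all_predicates(finite_sequence, *predicates):
    results = [ [] for pred in predicates ]
    for item in finite_sequence:
        # no 'break': an item lands in EVERY list whose predicate
        # accepts it
        for pred, result in zip(predicates, results):
            if pred(item):
                result.append(item)
    return results

print collect_by_all_predicates(range(-5, 6),
    lambda n: n%2==0,
    lambda n: n<0,
    lambda n: n>3)

which emits [[-4, -2, 0, 2, 4], [-5, -4, -3, -2, -1], [4, 5]].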


Back to the issue of potentially unbounded families of predicates
(and, thus, collectors), I think that sits badly with the concept of a
'catch-all predicate' -- if the family of predicates IS indeed
unbounded, the catch-all will never come into play.  And I'm not
sure I want to _return_ the resulting potentially unbounded family
of collectors as an iterator, necessarily -- I see use cases for
an 'indexable' ("random-access", so to speak, vs the 'sequentially
accessed' nature of iterators) return value.  So I've cobbled
together a sequence class that's basically a potentially expandable
list -- you can change the "return collectors" to
"return iter(collectors)" if you disagree with my qualms, of course.
So, here's a first sketch at a possible approach...:


class pseudo_seq_iter(object):
    # adapts any object indexable via __getitem__ into an iterator,
    # stopping cleanly at the first IndexError
    def __iter__(self): return self
    def __init__(self, seq):
        self.seq = seq
        self.i = -1
    def next(self):
        self.i += 1
        try: return self.seq[self.i]
        except IndexError: raise StopIteration

def collect_by_first_predicate(sequence, predicates):
    seq_iter = iter(sequence)
    pred_iter = iter(predicates)
    def always(any): return True

    class collectors_sequence(list):
        def __getitem__(self, i):
            if i>=0:
                while i>=len(self) and (not self or
                                        self[-1].pred is not always):
                    addapred()
                if i>=len(self): raise IndexError, i
            return list.__getitem__(self, i)
        def __iter__(self): return pseudo_seq_iter(self)
    collectors = collectors_sequence()

    def addapred():
        try: pred = pred_iter.next()
        except StopIteration: pred = always
        collectors.append(prediter(pred))

    def getanother():
        item = seq_iter.next()
        for collector in collectors:
            if collector.pred(item):
                collector.append(item)
                return

    class prediter(object):
        def __iter__(self): return self
        def __init__(self, pred):
            self.memory = []
            self.append = self.memory.append
            self.pred = pred
        def next(self):
            while not self.memory: getanother()
            return self.memory.pop(0)
    return collectors

print map(list, collect_by_first_predicate(range(-5, 6),
        [ lambda n: n%2==0,
          lambda n: n<0,
          lambda n: n>3
        ]
    ))

class divisible_by(object):
    def __init__(self, maxn=None):
        self.maxn = maxn
    def __getitem__(self, i):
        if i<0 or (self.maxn and i>self.maxn): raise IndexError, i
        return lambda x: (x%(i+2)) == 0
    def __iter__(self): return pseudo_seq_iter(self)

divisibles = collect_by_first_predicate(range(30), divisible_by(100))
for i in range(2,18):
    result = list(divisibles[i-2])
    if result: print i, result




I _have_ put some arbitrary limit on the divisible_by instance I
pass in the example, because, otherwise (as previously indicated)
trying to loop on an empty iterator among the 'divisibles' might
not terminate.  In this case, this can be because the "for collector
in collectors" loop in 'getanother' never terminates, if predicates
doesn't, and some item never satisfies any predicate; or because
the 'while not self.memory' loop in prediter.next does not terminate,
if 'getanother' can never find an item to add to that collector's
memory, and the sequence does not terminate either; i.e., we have
two possible 'infinities' to cause trouble here.  Anyway, this emits:

[alex at lancelot alex]$ python ac.py
[[-4, -2, 0, 2, 4], [-5, -3, -1], [5], [1, 3]]
2 [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
3 [3, 9, 15, 21, 27]
5 [5, 25]
7 [7]
11 [11]
13 [13]
17 [17]

which seems correct.  Still, I think this could be enhanced to
avoid _some_ of the avoidable nontermination cases -- e.g., by
letting getanother know _which_ collector is hungry for more,
it can avoid creating more collectors beyond that one, and just
stash away as-yet-unclassified items for possible future needs.
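To make the avoidable case concrete (as I read it): with an unbounded
divisible_by() family, the version above hangs on the first item that
no predicate accepts --

    divisibles = collect_by_first_predicate(range(30), divisible_by())
    list(divisibles[0])   # hangs on item 1, which is divisible by
                          # nothing >= 2, so getanother keeps adding
                          # new divisibility predicates forever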

I don't particularly like the following, because the management
of stashed_away is finicky and error-prone, but, as a first cut,
it does work to refactor getanother into:

    def try_collecting(item, for_whom):
        for collector in collectors:
            if collector.pred(item):
                collector.append(item)
                return collector
            elif collector is for_whom:
                return None

    stashed_away = []
    def getanother(for_whom):
        for item in stashed_away[:]:
            put_where = try_collecting(item, for_whom)
            if put_where is not None:
                stashed_away.remove(item)
                if put_where is for_whom: return
        for item in seq_iter:
            put_where = try_collecting(item, for_whom)
            if put_where is None:
                stashed_away.append(item)
            elif put_where is for_whom: return
        raise StopIteration

and change the statement calling getanother (in prediter.next) to:

            while not self.memory: getanother(for_whom=self)

This lets us change the divisibles creation, as intended, to:

divisibles = collect_by_first_predicate(range(30), divisible_by())

i.e., with the family of predicates being unbounded (as long as
the sequence isn't also unbounded at the same time), while still
allowing attempts to loop on empty items of divisibles to terminate.
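E.g., re-running the earlier loop with the uncapped family (a quick
check under this refactoring):

divisibles = collect_by_first_predicate(range(30), divisible_by())
for i in range(2, 18):
    result = list(divisibles[i-2])
    if result: print i, result

emits the same table as the capped divisible_by(100) run above; items
(such as 1) that satisfy no predicate simply stay stashed away.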

If I were to work on this further, I think I'd do so by wrapping
seq_iter into an "iterator with buffering for 'rejected for now,
show again later' items" class instance; encapsulating the
"stashing away" inside said class would simplify getanother again,
while retaining the essential behavior.
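A minimal sketch of what that wrapper might look like (the class and
method names are mine, purely illustrative):

class stashing_iter(object):
    # wraps an iterator, adding a buffer of "rejected for now, show
    # again later" items
    def __init__(self, source):
        self.source = iter(source)
        self.stashed = []
    def stash(self, item):
        # remember a rejected item so future passes see it again
        self.stashed.append(item)
    def __iter__(self):
        # a fresh pass: offer each currently-stashed item once (items
        # re-stashed during the pass just wait for the next pass),
        # then move on to fresh items from the underlying source
        for dummy in range(len(self.stashed)):
            yield self.stashed.pop(0)
        for item in self.source:
            yield item

With seq_iter = stashing_iter(sequence), getanother would then shrink
back to a single loop:

    def getanother(for_whom):
        for item in seq_iter:
            put_where = try_collecting(item, for_whom)
            if put_where is None:
                seq_iter.stash(item)
            elif put_where is for_whom:
                return
        raise StopIteration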


Alex




