[Python-ideas] Integrate some itertools into the Python syntax

Chris Angelico rosuav at gmail.com
Mon Mar 21 19:36:31 EDT 2016


On Tue, Mar 22, 2016 at 10:06 AM, Michel Desmoulin
<desmoulinmichel at gmail.com> wrote:
> Itertools is great, and some functions in it are more used than others:
>
> - islice;
> - chain;
> - dropwhile, takewhile;
>...
>
> The changes I'm going to propose do not add new syntax to Python, but
> would still streamline the use of this nice tool and blend it into the
> language core.

You're not the first to ask for something like this :) Let's get
*really* specific about semantics, though - and particularly about the
difference between iterables, iterators, and generators.

> Make slicing accept callables
> =============================
>
> So my first proposal is to be able to do:
>
> def stop(element):
>     return element > 4
> print(numbers[:stop])
>
> It's quite pythonic, easy to understand: the end of the slice is when
> this condition is met. And not the strange way takewhile works, which is
> "carry on as long as this condition is met".
>
> We could also extend itertools.islice to accept such a parameter.

This cannot be defined for arbitrary iterables, unless you're
proposing to mandate it in some way. (It conflicts with the way a list
handles slicing, for instance.) Even for arbitrary iterators, it may
be quite tricky (since iterators are based on a protocol, not a type);
but maybe it would be worth proposing an "iterator mixin" that handles
this for you, e.g.:

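import itertools

class IteratorOperations:
    # Mixin: assumes the subclass is an iterator (__iter__ and __next__).
    def __getitem__(self, thing):
        if isinstance(thing, slice):
            # A callable start or stop means predicate-based slicing.
            if callable(thing.start) or callable(thing.stop):
                return self.takeuntil(thing.start, thing.stop)
            return itertools.islice(self, thing.start, thing.stop, thing.step)
        raise TypeError("only slices are supported")
    def takeuntil(self, start, stop):
        try:
            val = next(self)
            # Skip values until start(val) is true.
            while start is not None and not start(val):
                val = next(self)
            # Yield values until stop(val) is true.
            while stop is None or not stop(val):
                yield val
                val = next(self)
        except StopIteration:
            return  # exhausted; StopIteration must not leak out of a generator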

As long as you inherit from that, you get these operations made
available to you.
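
For comparison, here is a rough sketch of the same start/stop semantics
written as a plain function on top of itertools, so that it works on
any iterable without a mixin. The name take_until and its keyword
arguments are made up for illustration; nothing like it exists in
itertools.

```python
from itertools import dropwhile, takewhile

def take_until(iterable, start=None, stop=None):
    """Hypothetical helper: begin at the first item where start(item) is
    true, end just before the first item where stop(item) is true."""
    it = iter(iterable)
    if start is not None:
        # dropwhile skips items for as long as its predicate holds.
        it = dropwhile(lambda item: not start(item), it)
    if stop is not None:
        # takewhile yields items for as long as its predicate holds.
        it = takewhile(lambda item: not stop(item), it)
    return it

numbers = [1, 2, 3, 4, 5, 6, 7]
print(list(take_until(numbers, stop=lambda n: n > 4)))
# [1, 2, 3, 4]
print(list(take_until(numbers, start=lambda n: n > 2, stop=lambda n: n > 5)))
# [3, 4, 5]
```

This also shows that the proposed "stop" predicate is just takewhile
with the condition inverted.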

Now, if you're asking this about generators specifically, then it
might be possible to add this (since all generators are of the same
type). It wouldn't be as broad as the itertools functions (which can
operate on any iterable), but could be handy if you do a lot with
gens, plus it's hard to subclass them.
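
A quick check of both points, as seen in CPython: every generator
object has the same type, and that type cannot be used as a base
class. The names count_up and SliceableGenerator are only for
illustration.

```python
import types

def count_up(limit):
    for n in range(limit):
        yield n

# Every generator object has the same type, whatever function made it.
print(type(count_up(3)) is types.GeneratorType)

# That type refuses to act as a base class, so a subclass with extra
# methods such as __getitem__ is not an option.
try:
    class SliceableGenerator(types.GeneratorType):
        pass
except TypeError as exc:
    subclass_error = exc
else:
    subclass_error = None

print(subclass_error)
```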

> Slicing any iterable
> ======================
>
> So the second proposal is to allow:
>
> def func_accepting_any_iterable(foo):
>     return bar(foo[3:7])
>
> The slicing would then return a list if it's a list, a tuple if it's a
> tuple, and an islice(generator) if it's a generator. If somebody uses a
> negative index, it would then raise a ValueError like islice.
>
> This would make duck typing and iteration even easier in Python.

Again, while I am sympathetic to the problem, it's actually very hard;
islice always returns the same kind of thing, but slicing syntax can
return all manner of different things, because it's up to the object
on the left:

>>> range(10)[3:7]
range(3, 7)
>>> "Hello, world!"[3:7]
'lo, '
>>> [1, 4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7][3:7]
[8, 5, 7, 1]
>>> memoryview(b"Hello, world!")[3:7]
<memory at 0x7fb98dab3f48>

You don't want these to start returning islice objects. You mentioned
lists, but many other types also return something of their own kind
when sliced.

Possibly the solution here is actually to redefine object.__getitem__?
Currently, it simply raises TypeError - not subscriptable. Instead, it
could call iter() on itself, and then attempt to islice it. That would
mean that the TypeError would change to "is not iterable"
(insignificant difference), anything that already defines __getitem__
will be unaffected (good), and anything that's iterable but not
subscriptable would automatically islice itself (potentially a trap,
if people don't know what they're doing).
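
Since object.__getitem__ can't be patched from Python code, here is a
rough sketch of that fallback as an ordinary function. The name
fallback_getitem is hypothetical, and the sketch only handles slices.

```python
from itertools import islice

def fallback_getitem(obj, index):
    """Hypothetical stand-in for the proposed object.__getitem__ fallback."""
    if hasattr(type(obj), "__getitem__"):
        return obj[index]  # existing behaviour, untouched
    if not isinstance(index, slice):
        raise TypeError("only slices are supported in this sketch")
    # iter() raises TypeError ("... is not iterable") for non-iterables.
    return islice(iter(obj), index.start, index.stop, index.step)

# A list already defines __getitem__, so it still returns a list.
print(fallback_getitem([1, 4, 2, 8, 5, 7], slice(3, 5)))
# [8, 5]

# A generator is iterable but not subscriptable, so it gets islice'd.
squares = (x * x for x in range(10))
print(list(fallback_getitem(squares, slice(3, 7))))
# [9, 16, 25, 36]

# The trap: "slicing" consumed the generator up to the stop index.
print(list(squares))
# [49, 64, 81]
```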

> Chaining iterables
> ==================
>
> Iterating over heterogeneous iterables is not clear.
>
> You can add lists with lists and tuples with tuples, but if you need
> more, then you need itertools.chain. Few people know about it, so I
> usually see duplicate loops and conversion to lists/tuples.
>
> So my first proposal is to overload the "&" operator so that anything
> defining __iter__ can be used with it.
>
> Then you can just do:
>
> chaining = "abc" & [True, False] & (x * x for x in range(10))
> for element in chaining:
>     print(element)
>
> Instead of:
>
> from itertools import chain
> chaining = chain("abc", [True, False], (x * x for x in range(10)))
> for element in chaining:
>     print(element)

Again, anything involving operators is tricky, since anything can
override its handling. But if you require that the first one be a
specific iterator class, you can simply add __and__ to it to do what
you want:

class iter:
    iter = iter # snapshot the default 'iter'
    def __init__(self, *args):
        self.iter = self.iter(*args) # break people's minds
    def __iter__(self): return self
    def __next__(self): return next(self.iter)
    def __and__(self, other):
        def chained():
            yield from self.iter
            yield from other
        return type(self)(chained())  # wrap again so "&" can be chained

Okay, so you'd probably do it without the naughty bits, but still :)
As long as you call iter() on the first thing in the chain, everything
else will work.
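
A sketch of the same idea without shadowing the builtin; the class
name chainable is invented for this example. Only the leftmost operand
needs wrapping, because each "&" hands back another wrapper.

```python
class chainable:
    """Hypothetical iterator wrapper that supports "&" for chaining."""
    def __init__(self, iterable):
        self._it = iter(iterable)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)
    def __and__(self, other):
        def chained():
            yield from self._it
            yield from other
        # Return a wrapper, not a bare generator, so "&" keeps working.
        return chainable(chained())

chaining = chainable("abc") & [True, False] & (x * x for x in range(3))
print(list(chaining))
# ['a', 'b', 'c', True, False, 0, 1, 4]
```

Note that [True, False] & chainable("abc") would still fail, since
list doesn't know about "&" and this sketch defines no __rand__.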

ChrisA

