[Python-ideas] Integrate some itertools into the Python syntax

Wed Mar 23 17:46:14 EDT 2016

> On Mar 23, 2016, at 12:24, Michael Selik <mike at selik.org> wrote:
> 
>> On Wed, Mar 23, 2016 at 1:39 PM Andrew Barnert <abarnert at yahoo.com> wrote:
>> On Mar 23, 2016, at 10:13, Michael Selik <mike at selik.org> wrote:
>> >
>> >> On Mar 23, 2016, at 6:37 AM, Chris Angelico <rosuav at gmail.com> wrote:
>> >>
>> >> On Wed, Mar 23, 2016 at 9:04 PM, Michel Desmoulin
>> >> <desmoulinmichel at gmail.com> wrote:
>> >>>> (Whether or not to make slice notation usable outside subscript
>> >>>> operations could then be tackled as an independent question)
>> >>>>
>> >>>> For itertools.chain, it may make sense to simply promote it to the builtins.
>> >>>
>> >>> Same problem as with new keywords : it can be a problem with people
>> >>> using chain as a var name.
>> >>
>> >> Less of a problem though - it'd only be an issue for people who (a)
>> >> use chain as a var name, and (b) want to use the new shorthand. Their
>> >> code will continue to work identically with itertools.chain (or not
>> >> using it at all). With a new keyword, their code would instantly fail.
>> >
>> > I enjoy using ``chain`` as a variable name. It has many meanings outside of iteration tools. Three cheers for namespaces.
>> 
>> As Chris just explained in the message you're replying to, this wouldn't affect you. I've used "vars" and "compile" and "dir" as local variables many times; as long as I don't need to call the builtin in the same scope, it's not a problem. The same would be true if you keep using "chain". Unless you want to chain iterables of your chains together, it'll never arise.
> 
> Depends what you mean by "affect". It'll affect how I read my colleagues' code. I want to see ``from itertools import chain`` at the top of their modules if they're using chain in the sense of chaining iterables.

The advantage of having a small set of builtins is that you know the entire set of builtins. If chain really is useful enough that it belongs as a builtin, you will very quickly adapt to reading that code, and it won't bother you or slow down your comprehension at all, any more than any other builtins do. Adding dozens of builtins would break that; adding one wouldn't. 

There's only room for a handful more builtins in the entire future life of Python, and the question of whether chain deserves to be one of them is a big question (and I suspect the answer is no). But I think it's the only real question, and adding more on top of it doesn't really help us get to the answer.

> Some evidence...

Agreed. And again, my own anecdotal experience is that chain.from_iterable is actually used (or sadly missed) more than chain itself, especially among novices, so I'm not advocating making chain a builtin unless someone proves me wrong on that.
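
For illustration, a minimal sketch of the difference between the two spellings (the sample data is made up; both calls are the standard itertools ones):

```python
from itertools import chain

nested = [[1, 2], [3], [4, 5]]

# chain() takes each iterable as a separate argument.
flat_a = list(chain([1, 2], [3], [4, 5]))

# chain.from_iterable() takes a single iterable of iterables, which is
# usually what you actually have in hand, and it consumes it lazily.
flat_b = list(chain.from_iterable(nested))

# chain(*nested) gives the same result, but unpacking the outer
# iterable happens eagerly, so it can't be used on an infinite one.
flat_c = list(chain(*nested))
```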

>> Still, I like adding chain (and/or flatten) to builtins a lot more than I like adding sequence behavior to some iterators, or adding a whole new kind of function-like slicing syntax to all iterables, or any of the other proposals here.
> 
> I like the LazyList you wrote. If the need for operator-style slicing on iterators were great, I think we'd find a decent amount of usage of such a wrapper. Its absence is evidence that ``from itertools import islice`` is more pleasant.

The one on my blog? I've never found a use for that in real code[0], which is why (IIRC) I never even put it on PyPI. But I use range all the time, and various other lazy sequences. The problem with LazyList isn't that it's lazy, but that it's a recursive cons-like structure, which is ultimately a different way to solve (mostly) the same set of problems that Python already has one obvious solution for (iterators), and an ill-fitting one at that (given that Python discourages unnecessary recursive algorithms).

I've also written a more Python-style lazy list that wraps up caching an iterator in a list-like object. It's basically just a list and an iterator, along with a method to self.lst.extend(islice(self.it, index - len(self.lst))). The point of that is to show how easy it is to write, but how many different API decisions come up that could be reasonably resolved either way, so there's probably no one-size-fits-all design. Which implies that if there are apps that need something like that, they probably write it themselves.
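
A rough sketch of what such a wrapper might look like. This is not the code referred to above: the class name CachedList, the _fill helper, and every policy choice (no negative indices, only bounded forward slices) are assumptions made for illustration, and each of them could reasonably have gone the other way.

```python
from itertools import count, islice


class CachedList:
    """Hypothetical list-like wrapper that caches values from an iterator."""

    def __init__(self, iterable):
        self.it = iter(iterable)
        self.lst = []

    def _fill(self, index):
        # Pull just enough items so that self.lst[index] exists,
        # or fewer if the underlying iterator runs out first.
        needed = index + 1 - len(self.lst)
        if needed > 0:
            self.lst.extend(islice(self.it, needed))

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.start, index.stop, index.step
            # Design decision: only bounded, forward, non-negative slices.
            if stop is None or stop < 0 or (start or 0) < 0 or (step or 1) < 1:
                raise ValueError("only bounded forward slices are supported")
            self._fill(stop - 1)
            return self.lst[index]
        if index < 0:
            # Design decision: a negative index would force full consumption.
            raise IndexError("negative indices are not supported")
        self._fill(index)
        return self.lst[index]  # IndexError if the iterator was too short


# Works on an infinite iterator because only the needed prefix is cached.
squares = CachedList(n * n for n in count())
fifth = squares[4]
cached_so_far = list(squares.lst)
middle = squares[1:3]
```

Even in a sketch this small, questions such as whether len() should consume the iterator, or whether slices should return lists or new lazy objects, have no single obvious answer.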

Anyway, if your point is that iterators having slicing would be a bad thing, I agree with you.

But that doesn't necessarily mean that chain and islice as they are today are the best possible answer, just that they're better than adding operators to the iterator type (even if there were such a thing as "the iterator type", which there isn't). Using islice can still be clumsy, and I'm happy to see what alternatives or improvements people come up with.

> As has been discussed many times, lazy sequences like range cause some confusion by providing both generator behavior and getitem/slicing. Perhaps it's better to keep the two varieties of iterables, lazy and non-lazy, more firmly separated by having ``lst[a:b]`` used exclusively for true sequences and ``islice(g, a, b)`` for generators.

Definitely not. Even if breaking range and friends weren't a huge backward compat issue, it would weaken the language. And, meanwhile, you would gain absolutely nothing. You'd still have sets and dicts and third-party sorted trees and list iterators and itertools iterators and key views and so on, none of which are either sequences or generators. (Also, think about this parallel: do you want to say that dict key views shouldn't have set operators because they're lazy and therefore not "real sets"? If not, what's the difference?)
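
To make the key-view parallel concrete, a short sketch (the dicts are invented sample data) showing that key views support set operators while remaining live views rather than copies:

```python
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 20, "c": 30, "d": 40}

# Key views support the set operators directly; the result is a plain set.
common = d1.keys() & d2.keys()
only_in_d1 = d1.keys() - d2.keys()

# The view itself is live: it reflects later changes to the dict.
view = d1.keys()
d1["z"] = 26
sees_new_key = "z" in view
```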

The problem isn't that Python has things that are neither sequences nor generators, it's that the Python community has people who think it doesn't. Any time someone says range is (like) a generator, or any similar misleading myth, they need to be corrected with extreme prejudice before they set another batch of novices up for confusion.

So, again: range does not provide anything like "generator behavior". It's not only repeatably iterable, it's also randomly-accessible, container-testable, reversible, and all the other things that are true of sequences.
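
A quick sketch contrasting the two, using only the standard library (the variable names are arbitrary):

```python
from collections.abc import Iterator, Sequence

r = range(0, 20, 2)

is_sequence = isinstance(r, Sequence)   # range is registered as a Sequence
is_iterator = isinstance(r, Iterator)   # it has no __next__, so not an iterator

first_pass = list(r)
second_pass = list(r)                   # repeatably iterable: same items again

item = r[3]                             # random access
sub = r[2:5]                            # slicing returns another range
has_14 = 14 in r                        # container test
backwards = list(reversed(r))           # reversible
size = len(r)                           # sized

# A generator over the same values is exhausted after one pass.
g = (n for n in r)
gen_first = list(g)
gen_second = list(g)
```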

> Just yesterday I got frustrated by the difference of return types from subscripting bytes versus str while trying to keep my code 2/3 compatible:

That's a completely separate problem. I think everyone agrees that there are design mistakes in the bytes class. As of 3.5, most of the ones that can be fixed have been, but some of them we're just unfortunately stuck with. None of those problems have to do with the iteration or sequence protocols.

> A couple questions to help clarify the situation:
> 1. Do you prefer ``a.union(b)`` or ``a | b``?

When I'm doing stuff that's inherently mathematical-set-like, I use the operator. When I'm using sets purely as an optimization, I use the method.[1]

But how does that relate to this thread? I think the implied assumption in the proposal is that people want to do "inherently sequence-like stuff" with iterators; if so, they _should_ want to spell it with brackets. I think the problem is that they're wrong to want that, not that they're trying to spell it wrong.

> 2. Do you prefer ``datetime.now() + timedelta(days=5)`` or ``5.days.from_now``?

Of course the former.[2] But the problem with "5.days" has nothing to do with this discussion. It's that it implies a bizarre ontology where natural numbers are things that know about timespans as intimately as they know about counting,[3] which is ridiculous.[4] If you really wanted numbers to know how to do "inherently day-like stuff", the Ruby way would make sense; you just shouldn't want that.

---

[0]: I have found uses for it in comparing translated Lisp/ML/Haskell idioms to native Python ones, or in demonstrating things to other people, which is why I wrote it in the first place.

[1]: Or sometimes even the classmethod set.union(a, b). In fact, sometimes I even write it out more explicitly--e.g., {x for x in a if x in b} in place of a & b emphasizes that I could easily change the {...} to [...] if I need to preserve order or duplicates in a.
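
A small sketch of the spellings mentioned in [1], with made-up sample values; the list variant assumes the left-hand collection is a list whose order and duplicates matter:

```python
a = {3, 1, 2}
b = {2, 3, 4}

# Three equivalent ways to spell the union.
u_operator = a | b
u_method = a.union(b)
u_explicit = set.union(a, b)

# Writing the intersection out as a comprehension...
common_set = {x for x in a if x in b}

# ...makes it a one-character change to keep order and duplicates
# when the left-hand side is a list instead of a set.
items = [3, 1, 3, 2]
common_list = [x for x in items if x in b]
```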

[2]: I like now() + days(5), or maybe even now() + 5*days, even more. But that's easy to add myself without modifying either int or timedelta, and without confusing my readers.
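
One way the helpers in [2] could be written without touching int or timedelta. The names days and day are hypothetical, and a fixed datetime stands in for now() so the result is reproducible:

```python
from datetime import datetime, timedelta


def days(n):
    """Hypothetical helper: a timespan of n days."""
    return timedelta(days=n)


# Hypothetical unit constant, for the "5 * day" spelling.
day = timedelta(days=1)

start = datetime(2016, 3, 23, 17, 46)  # stand-in for datetime.now()

via_function = start + days(5)
via_unit = start + 5 * day
```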

[3]: Or, only slightly less accurately, it's like saying "5's days" instead of "5 days" in English.

[4]: The Ruby answer to that is that writing an application is really about writing an app-specific language that lets you write the app itself as a trivial single-page program, and in an OO context, that means extending types in app-specific ways. The number 5 in general doesn't know how to construct a timespan, but the number 5 in the context of a calendaring web application does. That's an arguably reasonable stance (Paul Graham has some great blog posts making the non-OO version of the same argument for Lisp/Arc), but it's blatantly obvious that it's directly opposed to the Python stance that the language should be simple enough that we can all immediately read each other's code. Python doesn't encourage indefinitely expanding core types for the same reason it doesn't encourage writing your own flow control statements.
