[Python-ideas] Pre-PEP: adding a statistics module to Python

Andrew Barnert abarnert at yahoo.com
Wed Aug 7 18:01:11 CEST 2013


On Aug 7, 2013, at 4:10, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:

> On Aug 6, 2013 11:19 PM, "Andrew Barnert" <abarnert at yahoo.com> wrote:
> >
> > On Aug 6, 2013, at 12:44, Michele Lacchia <michelelacchia at gmail.com> wrote:
> >>
> >> Yes but then you lose all the advantages of iterators. What's the point in that?
> >> Furthermore it's not guaranteed that you can always convert an iterator into a list. As it has already been said, you could run out of memory, for instance.
> >
> > And the places where the stdlib/builtins do that automatic conversion--even when it's well motivated and almost always harmless once you think about it, like str.join--are surprising to most people. (Following up on str.join as an example, just about every question whose answer is str.join([...]) ends up with someone suggesting a genexpr instead of a listcomp, someone else explaining that it doesn't actually save any memory in that case, just wastes a bit of time, then some back and forth until everyone finally gets it.)
> >
> > The question is whether it would be even _more_ surprising to return an error, or a less accurate result. I don't know the answer to that.
> 
> I'm going to make the claim (with no supporting data) that more than 95% of the time, when a user calls variance(iterator) they will be guilty of premature optimisation.
> 
I think you're probably right. In similar cases, like str.join(iterator), there is usually no reason whatsoever to believe that any memory or speed cost will make any difference. Often people argue over half a dozen strings (where, even if it _did_ matter, which it doesn't, N is so low that algorithmic complexity isn't even relevant).
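To illustrate the str.join case (a CPython implementation detail, not a language guarantee): CPython's str.join materializes whatever iterable it gets into a sequence before joining, so the genexpr form produces the same string and still builds an intermediate list. The word list below is just made-up sample data.

```python
words = ["alpha", "beta", "gamma"]

# Both calls produce the same string. In CPython, str.join turns a generator
# argument into a list internally before joining, so the genexpr version does
# not avoid building an intermediate sequence; it only adds generator overhead.
from_listcomp = ", ".join([w.upper() for w in words])
from_genexpr = ", ".join(w.upper() for w in words)

print(from_listcomp)  # ALPHA, BETA, GAMMA
```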
> Really the cases where you can't build a collection are rare. People will still do it though just because it's satisfying to do everything with iterators in constant memory (I'm often guilty of this kind of thing).
> 
Or so that a sequence of operations can be pipelined, possibly leading to better cache behavior. Or just because iterators are the pythonic (or python3-ic?) way to do it.
> However unlike str.join there's no one pass algorithm that can be as accurate so it's not purely a performance question.
> 
But the point is that str.join doesn't use a one-pass algorithm; it just constructs a list so it can do it in two passes. And it's been suggested on this thread that variance could easily do the same thing.
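A minimal sketch of the "listify like str.join" option, assuming a hypothetical variance() (not the proposed module's actual API): anything that isn't already a Sequence gets copied into a list, and then the accurate two-pass formula runs over it.

```python
import math
from collections.abc import Sequence


def variance(data):
    """Hypothetical sample variance that silently listifies non-sequences,
    the way CPython's str.join does, then uses a two-pass formula."""
    if not isinstance(data, Sequence):
        data = list(data)  # consumes the iterator; costs O(n) extra memory
    n = len(data)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    mean = math.fsum(data) / n                                # pass 1
    return math.fsum((x - mean) ** 2 for x in data) / (n - 1)  # pass 2


print(variance(x for x in [1, 2, 3, 4]))  # 1.6666666666666667
```

The caller never sees an error, but the laziness they went out of their way to preserve is quietly thrown away inside the call.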

So there are three choices. Using a one-pass algorithm would be surprising because it's less accurate. Automatic listification would be surprising because you went out of your way to pass lazy iterators around and variance broke the benefits. An exception would be surprising because almost every other function in the stdlib that takes lists also takes iterators, even when there are good reasons not to.
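To make the accuracy point concrete, here is the textbook one-pass formula (sum of squares minus square of sum) next to a two-pass version. This is a sketch with made-up data chosen so the exact sample variance is 30; note that a better one-pass method such as Welford's algorithm does much better than this naive formula, so this shows the worst case, not every one-pass approach.

```python
import math


def naive_one_pass_variance(data):
    """Textbook one-pass formula: (sum(x**2) - sum(x)**2 / n) / (n - 1).
    Works on any iterable, but subtracts two huge, nearly equal numbers."""
    n = 0
    s = 0.0
    ss = 0.0
    for x in data:
        n += 1
        s += x
        ss += x * x
    return (ss - s * s / n) / (n - 1)


def two_pass_variance(data):
    """Compute the mean first, then sum squared deviations from it.
    Reads the data twice, so `data` must be a re-iterable sequence."""
    n = len(data)
    mean = math.fsum(data) / n
    return math.fsum((x - mean) ** 2 for x in data) / (n - 1)


# Values sharing a large common offset; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(two_pass_variance(data))        # 30.0
print(naive_one_pass_variance(data))  # not 30: cancellation wipes out the low digits
```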

I think you still may be right that the error is the way to go. You'll learn the problem quickly, and the workaround will be obvious, and the reason for it will be available in the docs. The other two potential surprises may not be as discoverable.
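A rough sketch of the "raise an error" option, again using a hypothetical variance() rather than anything from the actual proposal: one-shot iterators are detected because iter() returns them unchanged, and the message spells out the workaround.

```python
import math


def variance(data):
    """Hypothetical sample variance that rejects one-shot iterators
    instead of silently listifying them."""
    if iter(data) is data:
        # Iterators return themselves from iter(); a second pass over them
        # would see no data, so refuse and let the caller choose.
        raise TypeError(
            "variance() needs a re-iterable collection; "
            "pass list(data) if you have an iterator"
        )
    n = len(data)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    mean = math.fsum(data) / n
    return math.fsum((x - mean) ** 2 for x in data) / (n - 1)
```

The obvious workaround is a one-liner at the call site, variance(list(it)), which makes the memory cost explicit where the user can see it.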