[Python-ideas] __len__() for map()

E. Madison Bray erik.m.bray at gmail.com
Thu Nov 29 06:13:56 EST 2018


On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy <tjreedy at udel.edu> wrote:
>
> On 11/28/2018 9:27 AM, E. Madison Bray wrote:
> > On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert <kale at thekunderts.net> wrote:
> >>
> >> I just ran into the following behavior, and found it surprising:
> >>
> >>>>> len(map(float, [1,2,3]))
> >> TypeError: object of type 'map' has no len()
> >>
> >> I understand that map() could be given an infinite sequence and therefore might not always have a length.  But in this case, it seems like map() should've known that its length was 3.  I also understand that I can just call list() on the whole thing and get a list, but the nice thing about map() is that it doesn't copy data, so it's unfortunate to lose that advantage for no particular reason.
> >>
> >> My proposal is to delegate map.__len__() to the underlying iterable.
>
> One of the guidelines in the Zen of Python is
> "Special cases aren't special enough to break the rules."

This seems to be replying to the OP, whom I was quoting.  On one hand
I would argue that this is cherry-picking the "Zen" since not all
rules are special in the first place.  But in this case I agree that
map should not have a length or possibly even a length hint (although
the latter is more justifiable).
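
For reference, a minimal sketch of how the existing length-hint protocol
can be queried through operator.length_hint().  The fallback value of -1
is only an illustrative choice, and whether a map object reports a hint
at all depends on the interpreter and version, so no output is claimed
for that case:

```python
import operator

data = [1, 2, 3]

# A list iterator implements __length_hint__, so the hint is exact here.
list_hint = operator.length_hint(iter(data))
print(list_hint)  # 3

# For an object that offers neither __len__ nor __length_hint__,
# length_hint() returns the supplied default (-1 here, an arbitrary choice).
m = map(float, data)
map_hint = operator.length_hint(m, -1)
print(map_hint)

# Asking for a hint never consumes the iterator.
print(list(m))  # [1.0, 2.0, 3.0]
```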

> > As a simple counter-proposal which I believe has fewer issues, I would
> > really like it if the built-in `map()` and `filter()` at least
> > provided a Python-level attribute to access the underlying iterables.
>
> This proposes to make map (and filter) special in a different way, by
> adding other special (dunder) attributes.  In general, built-in
> callables do not attach their args to their output, for obvious reasons.
>   If they do, they do not expose them.  If input data must be saved, the
> details are implementation dependent.  A C-coded callable would not
> necessarily save information in the form of Python objects.

Who said anything about "special", or adding "special (dunder)
attributes"?  Nor did I make any general statement about all
built-ins.  For arbitrary functions it doesn't necessarily make sense
to hold on to their arguments, but in the case of something like map()
its arguments are the only thing that gives it meaning at all.  A map
in particular can be treated in a formal sense as a collection of a
function and some sequence of arguments (possibly unbounded) on which
that function is to be evaluated (perhaps not immediately).  As an
analogy, a series is an object in its own right: lots of information
can be gleaned from the properties of a series without having to
evaluate it.  Just because you don't see the use doesn't mean others
can't find one.
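
As a rough illustration of gleaning a property without evaluating
anything: if the inputs to a map are known, the number of results can
sometimes be derived from them alone.  The helper name lazy_map_length
is hypothetical and not part of any library; this is a sketch that
assumes access to the original iterables:

```python
from collections.abc import Sized


def lazy_map_length(*iterables):
    """Hypothetical helper: how many items a map over these inputs
    would yield, or None if that cannot be known without iterating."""
    if not iterables:
        return None
    if all(isinstance(it, Sized) for it in iterables):
        # map() stops at the shortest input.
        return min(len(it) for it in iterables)
    return None


print(lazy_map_length([1, 2, 3], "ab"))        # 2
print(lazy_map_length([1, 2, 3], iter("ab")))  # None
```

Nothing is called and nothing is consumed; only the inputs are inspected.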

The CPython map() implementation already carries this data on it as
"func" and "iters" members in its struct.  It's trivial to expose
those to Python as ".func" and ".iters" attributes.  Nothing
"special" about it.  However, that brings me to...

> https://docs.python.org/3/library/functions.html#map says
> "map(function, iterable, ...)
>      Return an iterator [...]"
>
> The wording is intentional.  The fact that map is a class and the
> iterator an instance of the class is a CPython implementation detail.
> Another implementation could use the generator function equivalent given
> in the Python 2 itertools doc, or a translation thereof.  I don't know
> what pypy and other implementations do.  The fact that CPython itertools
> callables are (now) C-coded classes instead of Python-coded generator
> functions, or C translations thereof (which is tricky) is for
> performance and ease of maintenance.

Exactly how intentional is that wording though?  If it returns an
iterator it has to return *some object* that implements iteration in
the manner prescribed by map.  Generator functions could theoretically
allow attributes attached to them.  Roughly speaking:

def map(func, *iters):
    def map_inner():
        for args in zip(*iters):
            yield func(*args)

    gen = map_inner()
    gen.func = func
    gen.iters = iters

    return gen

As it happens this won't work in CPython since it does not allow
attribute assignment on generator objects.  Perhaps there's some good
reason for that, but AFAICT--though I may be missing a PEP or
something--this fact is not prescribed anywhere and is also particular
to CPython.  Point being, I don't think it's a massive leap or
imposition on any implementation to go from "Return an iterator [...]"
to "Return an iterator that has these attributes [...]".
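
A minimal sketch of an iterator with map() semantics that carries those
attributes without relying on attribute assignment to a generator
object.  The class name AttrMap is hypothetical, and this is only one
way an implementation might do it, not a description of any existing
one:

```python
class AttrMap:
    """Hypothetical iterator with map() semantics plus .func and .iters."""

    def __init__(self, func, *iterables):
        if not iterables:
            raise TypeError("AttrMap() needs at least one iterable")
        self.func = func
        # Store iterators over the inputs, one per argument position.
        self.iters = tuple(iter(it) for it in iterables)

    def __iter__(self):
        return self

    def __next__(self):
        args = []
        for it in self.iters:
            # StopIteration from any exhausted input ends the whole map.
            args.append(next(it))
        return self.func(*args)


m = AttrMap(pow, [2, 3, 4], [2, 2, 2])
print(m.func is pow)  # True
print(len(m.iters))   # 2
print(list(m))        # [4, 9, 16]
```

Because the instance is an ordinary object, the attributes are readable
before, during, and after iteration.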




P.S.

> > This is necessary because if I have a function that used to take, say,
> > a list as an argument, and it receives a `map` object, I now have to
> > be able to deal with map()s,
>
> If a function is documented as requiring a list, or a sequence, or a
> length object, it is a user bug to pass an iterator.  The only thing
> special about map and filter as errors is the rebinding of the names
> between Py2 and Py3, so that the same code may be good in 2.x and bad in
> 3.x.

It's not a user bug if you're porting a massive computer algebra
application that happens to use Python as its implementation language
(rather than inventing one from scratch) and your users don't need or
want to know too much about Python 2 vs Python 3.  Besides, the fact
that they are passing an iterator now is probably in many cases a good
thing for them, but it takes away my ability as a developer to find
out more about what they're trying to do, as opposed to say just being
given a list of finite size.

That said, I regret bringing up Sage; I was using it as an example but
I think the point stands on its own.

