[Python-ideas] __len__() for map()

Terry Reedy tjreedy at udel.edu
Thu Nov 29 15:36:29 EST 2018


On 11/29/2018 6:13 AM, E. Madison Bray wrote:
> On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy <tjreedy at udel.edu> wrote:

> The CPython map() implementation already carries this data on it as
> "func" and "iters" members in its struct.  It's trivial to expose
> those to Python as ".func" and ".iters" attributes.  Nothing
> "special" about it.  However, that brings me to...

I will come back to this when you do.

>> https://docs.python.org/3/library/functions.html#map says
>> "map(function, iterable, ...)
>>       Return an iterator [...]"
>>
>> The wording is intentional.  The fact that map is a class and the
>> iterator an instance of the class is a CPython implementation detail.
>> Another implementation could use the generator function equivalent given
>> in the Python 2 itertools doc, or a translation thereof.  I don't know
>> what pypy and other implementations do.  The fact that CPython itertools
>> callables are (now) C-coded classes instead of Python-coded generator
>> functions, or C translations thereof (which is tricky) is for
>> performance and ease of maintenance.
> 
> Exactly how intentional is that wording though?

The use of 'iterator' is exactly intended, and the iterator protocol is 
*intentionally minimal*, with one iterator-specific __next__ method and
one boilerplate __iter__ method returning self.  This is more minimal 
than some might like.  An argument against the addition of length_hint 
and __length_hint__ was that it might be seen as extending at least the 
'expected' iterator protocol.  The docs were written to avoid this.
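
To illustrate how small the protocol is, here is a minimal sketch of a
Python-coded iterator.  The class name Countdown is made up for this
example; operator.length_hint is the stdlib function added by PEP 424.
Since the class defines neither __len__ nor __length_hint__, the hint
falls back to the supplied default.

```python
import operator


class Countdown:
    """Hypothetical iterator yielding n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Boilerplate half of the protocol: an iterator returns itself.
        return self

    def __next__(self):
        # Iterator-specific half: produce the next value or stop.
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value


it = Countdown(3)
same_object = iter(it) is it

# No __len__ and no __length_hint__, so the default (-1 here) is returned.
hint = operator.length_hint(it, -1)

items = list(it)
```

Nothing beyond these two methods is needed for list(), for loops, or
any other consumer of iterators to accept the object.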

> If it returns an
> iterator it has to return *some object* that implements iteration in
> the manner prescribed by map.

> Generator functions could theoretically
> allow attributes attached to them.  Roughly speaking:
> 
> def map(func, *iters):
>      def map_inner():
>          for args in zip(*iters):
>              yield func(*args)
> 
>      gen = map_inner()
>      gen.func = func
>      gen.iters = iters
> 
>      return gen

> As it happens this won't work in CPython since it does not allow
> attribute assignment on generator objects.  Perhaps there's some good
> reason for that, but AFAICT--though I may be missing a PEP or
> something--this fact is not prescribed anywhere and is also particular
> to CPython.

Instances of C-coded classes generally cannot be augmented.  But set 
this issue aside.
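
For reference, a small sketch of the behavior under discussion.  The
first part shows the failure on CPython specifically; other
implementations may differ.  The second part is one possible
Python-coded alternative that does carry public attributes.  The name
AttrMap and its attribute names are assumptions made for this example,
not an existing or proposed API.

```python
def map_inner(func, *iters):
    # Generator function equivalent of map().
    for args in zip(*iters):
        yield func(*args)


gen = map_inner(abs, [-1, 2])
try:
    gen.func = abs  # CPython generator objects have no instance __dict__
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False


class AttrMap:
    """Hypothetical map-alike that exposes func and iters."""

    def __init__(self, func, *iters):
        self.func = func
        self.iters = iters
        self._it = map(func, *iters)  # delegate the real work to map()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)


m = AttrMap(abs, [-1, 2])
result = list(m)
```

An ordinary Python class gets an instance __dict__ by default, which
is why AttrMap accepts the attributes while the generator does not.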

>  Point being, I don't think it's a massive leap or
> imposition on any implementation to go from "Return an iterator [...]"
> to "Return an iterator that has these attributes [...]"

Do you propose exposing the inner struct members of *all* C-coded 
iterators?  (And would you propose that all Python-coded iterators 
should use public names for the equivalents?)  Some subset thereof? 
(What choice rule?)  Or only for map?  If the latter, why do you 
consider map so special?

>>> This is necessary because if I have a function that used to take, say,
>>> a list as an argument, and it receives a `map` object, I now have to
>>> be able to deal with map()s,

In both 2 and 3, the function has to deal with iterator inputs one way
or another.  In both 2 and 3, possible iterator inputs include maps
passed as generator expressions, '(<expression with x> for x in
iterable)'.
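
A short sketch of that point: in Python 3 a map object and the
corresponding generator expression are both plain iterators without a
length, so a function that accepts one already has to cope with the
other.  The function names total and as_sequence are hypothetical
helpers invented for this example.

```python
from collections.abc import Iterator, Sized

data = [1, 2, 3]
as_map = map(lambda x: x * 2, data)
as_genexp = (x * 2 for x in data)

# Both are iterators, and neither supports len().
both_iterators = isinstance(as_map, Iterator) and isinstance(as_genexp, Iterator)
neither_sized = not isinstance(as_map, Sized) and not isinstance(as_genexp, Sized)


def total(values):
    """Hypothetical consumer that relies only on iteration."""
    return sum(values)


def as_sequence(values):
    """Hypothetical helper: materialize the input if a length is needed."""
    return values if isinstance(values, Sized) else list(values)


total_map = total(as_map)
total_genexp = total(as_genexp)
seq_len = len(as_sequence(map(str, data)))
```

A function that truly needs a length can materialize its input once at
the boundary, which treats maps, generator expressions, and any other
iterator the same way.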

>> If a function is documented as requiring a list, or a sequence, or a
>> length object, it is a user bug to pass an iterator.  The only thing
>> special about map and filter as errors is the rebinding of the names
>> between Py2 and Py3, so that the same code may be good in 2.x and bad in
>> 3.x.
> 
> It's not a user bug if you're porting a massive computer algebra
> application that happens to use Python as its implementation language
> (rather than inventing one from scratch) and your users don't need or
> want to know too much about Python 2 vs Python 3.

As a former 'scientist who programs' I can understand the desire for 
ignorance of such details.  As a Python core developer, I would say that 
if you want Sage to allow and cater to such ignorance, you have to 
either make Sage a '2 and 3' environment, without burdening Python 3, or 
make future Sage a strictly Python 3 environment (as many scientific 
stack packages are doing or planning to do).

...
> That said, I regret bringing up Sage; I was using it as an example but
> I think the point stands on its own.

Yes, the issues of hiding versus exposing implementation details, and 
that of saving versus deleting and, when needed, recreating 'redundant' 
information, are independent of Sage and 2 versus 3.

-- 
Terry Jan Reedy


