[Python-Dev] Impact of Namedtuple on startup time

Giampaolo Rodola' g.rodola at gmail.com
Mon Jul 17 16:31:21 EDT 2017


I completely agree. I love namedtuples but I've never been too happy about
the additional overhead vs. plain tuples (both for creation and attribute
access times), to the point that I explicitly avoid using them in certain
circumstances (e.g. a busy loop) and use them only for public end-user APIs
returning multiple values.
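
For anyone who wants to measure that overhead on their own machine, here is a
rough sketch using timeit. The Point type, the statements being timed and the
iteration count are made up for illustration; absolute numbers will vary by
interpreter version and hardware, so none are quoted here.

```python
import timeit
from collections import namedtuple

# Hypothetical two-field record, used only for this illustration.
Point = namedtuple("Point", ["x", "y"])

# Names visible to the timed statements.
ns = {"Point": Point, "a": 1, "b": 0, "t": (1, 0), "p": Point(1, 0)}

def bench(stmt, number=100_000):
    """Return the best-of-three total time in seconds for `number` runs."""
    return min(timeit.repeat(stmt, globals=ns, number=number, repeat=3))

results = {
    "create plain tuple": bench("(a, b)"),
    "create namedtuple": bench("Point(a, b)"),
    "index plain tuple": bench("t[0]"),
    "attribute on namedtuple": bench("p.x"),
}

for label, seconds in results.items():
    print(f"{label:<26} {seconds:.4f} s")
```

Comparing the "create" pair and the "index"/"attribute" pair gives a feel for
the per-call cost in a busy loop.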

To be entirely honest, I'm not even sure why they need to be forcefully
declared upfront in the first place, instead of just having a first-class
function (builtin?) written in C:

>>> ntuple(x=1, y=0)
(x=1, y=0)

...or even a literal as in:

>>> (x=1, y=0)
(x=1, y=0)

Most of the time this is what I really want: quickly returning an
anonymous tuple with named attributes and nothing else, similarly to
os.times() & others. I believe that if something like this existed we
would witness a big transition from tuple() to ntuple() for all those
functions returning more than 1 value. We witnessed a similar transition in
many parts of the stdlib when collections.namedtuple was first introduced,
but not everywhere, probably because declaring a namedtuple is more work,
it's more expensive, and it still feels like you're dealing with some kind
of too high-level second-class citizen with too much overhead and too much
sugar in terms of API (e.g. "verbose", "rename", "module" and "_source").

If something like this were to happen I expect collections.namedtuple to be
used only by those who want to subclass it in order to attach methods,
whereas the rest would stick to using ntuple() pretty much everywhere (both
in "private" and "public" functions).
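
To make the idea concrete, here is a minimal pure-Python sketch of what an
ntuple() call could look like. This is only an approximation built on top of
collections.namedtuple, not the C builtin proposed above: the function name,
the class cache and the "ntuple" type name are all assumptions, and it relies
on keyword-argument order being preserved (PEP 468, Python 3.6+).

```python
from collections import namedtuple

# Cache of generated classes, keyed by the ordered field names, so the
# class-creation cost is paid once per distinct set of fields.
_ntuple_classes = {}

def ntuple(**fields):
    """Hypothetical pure-Python stand-in for the proposed ntuple() builtin."""
    names = tuple(fields)
    cls = _ntuple_classes.get(names)
    if cls is None:
        cls = namedtuple("ntuple", names)
        _ntuple_classes[names] = cls
    return cls(*fields.values())

p = ntuple(x=1, y=0)
print(p)          # ntuple(x=1, y=0)
print(p.x, p[1])  # 1 0
```

Note that the repr differs from the bare "(x=1, y=0)" shown earlier, and the
first call for each set of field names still pays the full namedtuple
creation cost; a real builtin would presumably avoid both.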


On Mon, Jul 17, 2017 at 5:49 PM, Guido van Rossum <guido at python.org> wrote:

> I am firmly with Antoine here. The cumulative startup time of large Python
> programs is a serious problem and namedtuple is one of the major
> contributors -- especially because it is so convenient that it is
> ubiquitous. The approach of generating source code and exec()ing it, is a
> cool demonstration of Python's expressive power, but it's always been my
> sense that whenever we encounter a popular idiom that uses exec() and
> eval(), we should augment the language (or the builtins) to avoid these
> calls -- that's for example how we ended up with getattr().
>
> One of the reasons to be wary of exec()/eval() other than the usual
> security concerns is that in some Python implementations they have a high
> overhead to initialize the parser and compiler. (Even in CPython it's not
> that fast.)
>
> Regarding the argument that it's easier to learn what namedtuple does if
> the generated source is available, while I don't feel this is important,
> supposedly it is important to Raymond. But surely there are other
> approaches possible that work just as well in an educational setting while
> being more efficient in production use. (E.g. the approach taken by
> itertools, where the docs show equivalent Python code.)
>
> Concluding, I think we should move on from the original implementation and
> optimize the heck out of namedtuple. The original has served us well. The
> world is constantly changing. Python should adapt to the (happy) fact that
> it's being used for systems larger than any of us could imagine 15 years
> ago.
>
> --Guido
>
> On Mon, Jul 17, 2017 at 7:59 AM, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
>
>>
>> > On Jul 17, 2017, at 6:31 AM, Antoine Pitrou <antoine at python.org> wrote:
>> >
>> >> I think I understand well enough to say something intelligent…
>> >>
>> >> While actual references to _source are likely rare (certainly I’ve never
>> >> used it), my understanding is that the way namedtuple works is to
>> >> construct _source, and then exec it to create the class. Once that is
>> >> done, there is no significant saving to be had by throwing away the
>> >> constructed _source value.
>>
>> There are considerable benefits to namedtuple being able to generate and
>> match its own source.
>>
>> * It makes it really easy for a user to generate the code, drop it
>> into another module, and customize it.
>>
>> * It makes the named tuple factory function completely self-documenting.
>>
>> * The verbose/_source option teaches you exactly what named tuple does.
>> That makes the tool relatively easy to learn, understand, and debug.
>>
>> I really don't want to throw away these benefits to save a couple of
>> milliseconds.   As Nick Coghlan recently posted, "Speed isn't everything,
>> and it certainly isn't adequate justification for breaking public APIs that
>> have been around for years."
>>
>> FWIW, the template/exec implementation has had excellent benefits for
>> maintainability making it very easy to fix and update.  As other parts of
>> Python have changed (limitations on number of arguments, what is allowed as
>> an identifier, etc), it mostly automatically stays in sync with the rest of
>> the language.
>>
>> ISTM this issue is being pressed by micro-optimizers who are being very
>> aggressive and not responding to actual user needs (it is more an invented
>> issue than a real one).  Named tuple has been around for a long time and
>> users have been somewhat happy with it.
>>
>> If someone truly cares about the exec time for a particular named tuple,
>> the _source option makes it trivially easy to just replace the generator
>> call with the expanded code in that particular circumstance.
>>
>>
>> Raymond
>>
>>
>> P.S. I'm fully supportive of Victor's efforts to build-out structseq to
>> make it sufficiently expressive to do more of what collections.namedtuple()
>> does.  That is a perfectly reasonable path to optimization. We've wanted
>> that for a long time and no one has had the spare clock cycles to make it
>> come true.
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>


-- 
Giampaolo - http://grodola.blogspot.com