[Python-Dev] Impact of Namedtuple on startup time

Gregory P. Smith greg at krypto.org
Mon Jul 17 12:13:39 EDT 2017


On Mon, Jul 17, 2017 at 8:00 AM Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

>
> > On Jul 17, 2017, at 6:31 AM, Antoine Pitrou <antoine at python.org> wrote:
> >
> >> I think I understand well enough to say something intelligent…
> >>
> >> While actual references to _source are likely rare (certainly I’ve never
> >> used it), my understanding is that the way namedtuple works is to
> >> construct _source, and then exec it to create the class. Once that is
> >> done, there is no significant saving to be had by throwing away the
> >> constructed _source value.
>
> There are considerable benefits to namedtuple being able to generate and
> match its own source.
>
> * It makes it really easy for a user to generate the code, drop it into
> another module, and customize it.
>
> * It makes the named tuple factory function completely self-documenting.
>
> * The verbose/_source option teaches you exactly what named tuple does.
> That makes the tool relatively easy to learn, understand, and debug.
>
> I really don't want to throw away these benefits to save a couple of
> milliseconds.   As Nick Coghlan recently posted, "Speed isn't everything,
> and it certainly isn't adequate justification for breaking public APIs that
> have been around for years."
>
> FWIW, the template/exec implementation has had excellent benefits for
> maintainability making it very easy to fix and update.  As other parts of
> Python have changed (limitations on number of arguments, what is allowed as
> an identifier, etc), it mostly automatically stays in sync with the rest of
> the language.
>
> ISTM this issue is being pressed by micro-optimizers who are being very
> aggressive and not responding to actual user needs (it is more an invented
> issue than a real one).  Named tuple has been around for a long time and
> users have been somewhat happy with it.
>
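
For readers who have not looked at the implementation, the template/exec
approach described in the quoted text can be sketched roughly as follows.
This is a simplified illustration, not the actual collections.namedtuple
code: make_record and _TEMPLATE are hypothetical names, and validation of
field names is omitted.

```python
# Simplified illustration only; not the real collections.namedtuple code.
_TEMPLATE = '''\
class {typename}(tuple):
    __slots__ = ()
    _fields = {fields!r}

    def __new__(cls, {args}):
        return tuple.__new__(cls, ({args},))

    def __repr__(self):
        return "{typename}({repr_fmt})" % tuple(self)

{field_defs}
'''


def make_record(typename, field_names):
    """Hypothetical helper: build class source text, exec it, keep the text."""
    fields = tuple(field_names)
    source = _TEMPLATE.format(
        typename=typename,
        fields=fields,
        args=", ".join(fields),
        repr_fmt=", ".join(f"{name}=%r" for name in fields),
        field_defs="\n".join(
            f"    {name} = property(lambda self: self[{index}])"
            for index, name in enumerate(fields)
        ),
    )
    namespace = {}
    exec(source, namespace)   # compiling and running the text is the per-class cost
    cls = namespace[typename]
    cls._source = source      # the string already exists, so keeping it is cheap
    return cls


Point = make_record("Point", ["x", "y"])
p = Point(1, 2)
```

The sketch shows why discarding the string after the exec saves little:
the expensive step is building and executing the source, which has
already happened by the time the string could be thrown away.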

Raymond, you keep repeating statements similar to "only a millisecond" and
"aggressive micro-optimizers who don't care about user needs" in your
comments on issues like this. That simply isn't true. These issues come up
in the first place *because of* users who need fast startup. Please don't
be so dismissive.

The reason people care about this has been stated many times. It isn't just
"a millisecond", it's 100s or 1000s of milliseconds in any application of
reasonable size where namedtuples were adopted as a design pattern in
various libraries.
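
One way to check the cumulative cost on a given interpreter is to time
the creation of many namedtuple classes. The following is a minimal
sketch; the count of 200 and the name time_namedtuple_creation are
arbitrary choices for illustration, and the result depends heavily on
the Python version and the machine.

```python
import time
from collections import namedtuple


def time_namedtuple_creation(count=200):
    """Return the seconds spent defining `count` small namedtuple classes."""
    start = time.perf_counter()
    for i in range(count):
        namedtuple(f"Record{i}", ["a", "b", "c"])
    return time.perf_counter() - start


elapsed = time_namedtuple_creation()
print(f"{elapsed * 1000:.1f} ms to define 200 namedtuple classes")
```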

Real world use cases for startup time mattering exist: interactive command
line tools are the most obvious one people keep citing. I'll toss another
where Python startup time has raised eyebrows at work: unittest startup and
completion time. When the bulk of a process's time is spent in startup
before hitting unittest.main(), people take notice and consider it a
problem. Developer productivity is reduced. The hacks individual developers
come up with to try to work around things like this are not pretty.

> If someone truly cares about the exec time for a particular named tuple,
> the _source option makes it trivially easy to just replace the generator
> call with the expanded code in that particular circumstance.
>

In real world applications you do not control the bulk of the code that has
chosen to use namedtuple. They're scattered through 100s to 1000s of other
transitive dependency libraries (not just the standard library), the
modification of each of which faces hurdles both technical and
non-technical in nature.

To me the desired resolution to this is clear: Optimize the default use
case of namedtuple and everybody wins. This isn't just about the stdlib's
namedtuple uses being fast; those are a small portion of all uses in any
application where startup time matters. This is about making Python better
for the world, i.e., what Antoine's original write-up suggested in his #3.

I get that namedtuple._source is a public API. We may need to keep it. If
so, that just means revisiting lazily generating it as a property -
issue19640.
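
As a rough sketch of what lazy generation could look like, a non-data
descriptor can build the source text on first access and cache it
afterwards. This is not the patch from the issue: LazySource and
render_source are hypothetical names, and the rendered text is a
stand-in for the real template output.

```python
class LazySource:
    """Non-data descriptor that builds text on first access, then caches it."""

    def __init__(self, build):
        self._build = build   # callable taking the owner class, returning str
        self._cache = {}      # one cached string per owner class

    def __get__(self, instance, owner):
        if owner not in self._cache:
            self._cache[owner] = self._build(owner)
        return self._cache[owner]


calls = []  # records each time the text is actually rendered


def render_source(cls):
    """Hypothetical renderer; a stand-in for filling in the real template."""
    calls.append(cls.__name__)
    return f"class {cls.__name__}(tuple):\n    # fields: {', '.join(cls._fields)}\n"


class Point(tuple):
    __slots__ = ()
    _fields = ("x", "y")
    _source = LazySource(render_source)


calls_before_access = list(calls)   # nothing rendered at class creation
first = Point._source               # rendered here, on first access
second = Point._source              # served from the cache
```

With this shape, code that never reads _source never pays for rendering
it, while the attribute stays available for the users who do rely on it.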

-gps

PS - Good call on the naming hindsight! A trailing underscore would've been
nice. Oh well, too late for that.


>
> Raymond
>
>
> P.S. I'm fully supportive of Victor's efforts to build-out structseq to
> make it sufficiently expressive to do more of what collections.namedtuple()
> does.  That is a perfectly reasonable path to optimization. We've wanted
> that for a long time and no one has had the spare clock cycles to make it
> come true.
>
>