Trivial performance questions

Geoff Gerrietts geoff at gerrietts.net
Fri Oct 17 15:33:46 EDT 2003


Quoting Peter Hansen (peter at engcorp.com):
> 
> Not unless you add Alex' constraint that the two alternatives under
> consideration are equally readable.  Otherwise the less readable one
> is always going to cost you more at maintenance time.

Yes to your first sentence, not so sure about the second. The implication
is that the code will always be touched, and my contention is that if you
don't pay at least trivial attention to writing something optimal --
which includes avoiding geometric algorithms -- then you're significantly
increasing the amount of maintenance work necessary.

Example: pulling out list.sort(lambda x, y: cmp(x[0],y[0])) and
putting in an abstract transform_sort is /only responsible/. The
list.sort(callable) idiom might be more readable to a novice -- it has
been to the novices I've worked with -- but its performance
implications on nontrivial lists are astonishing.
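
A minimal sketch of the two idioms, written for current Python 3 rather
than the Python of this post. The body of transform_sort is an
assumption (the post names it without showing it), and cmp() is
redefined here only because Python 3 dropped the builtin. The point is
that the comparison callable runs on every comparison, while the
transform computes each key once.

```python
from functools import cmp_to_key
from operator import itemgetter


def cmp(a, b):
    """Stand-in for the old builtin cmp(), which Python 3 removed."""
    return (a > b) - (a < b)


def transform_sort(items, keyfunc):
    """Hypothetical decorate-sort-undecorate helper; returns a new list."""
    # Decorate: compute the key once per item. The index keeps the sort
    # stable and stops ties from ever comparing the items themselves.
    decorated = [(keyfunc(item), index, item)
                 for index, item in enumerate(items)]
    # Sort: plain tuple comparison, no Python-level callback per comparison.
    decorated.sort()
    # Undecorate: strip the key and index back off.
    return [item for _key, _index, item in decorated]


records = [(3, "c"), (1, "a"), (2, "b"), (1, "z")]

# Comparison-callable idiom: the lambda is called once per comparison.
by_cmp = sorted(records, key=cmp_to_key(lambda x, y: cmp(x[0], y[0])))

# Transform idiom: the key function is called once per item.
by_transform = transform_sort(records, itemgetter(0))

# Later Pythons build the same transform in as the key= argument.
by_key = sorted(records, key=itemgetter(0))
```

All three orderings agree; only the number of Python-level calls made
during the sort differs.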

> And I'd add my own constraint that you actually have to *need* the
> speed.  Otherwise even the "insignificant" increase in effort that
> it will cost you will not be paying for itself.

Capitalism has bred a real reliance on "good enough": when you hit
your payoff point, you don't go any farther. It's a useful metric to
apply, but a dangerous premise to base all your decisions on. "Good
enough" needs to be critically evaluated for both the short term and
the long term.

A half-million micro-optimizations may not pay for themselves
individually. But in the long term, when confronted with a total
system rewrite because the collected work can no longer perform
adequately, and standard optimization techniques have met with
diminishing returns, you're going to regret not having paid attention
the first time through, when you didn't hafta re-teach yourself what
the code is doing. The little bits where you're just /paying
attention/ to the performance implications of what you're doing
aggregate over time to reduce the maintenance overhead.

>   http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

It's an interesting formulation but it stinks of propaganda to me.
When generic catchphrases are re-interpreted by almost every viewer,
it's a pretty fair bet they're not precise enough to be really useful.
The discussion on this page makes me think of Biblical scholars
debating the meaning of ambiguous passages.

> Making it right means making it readable too.  Optimization should
> always come later, and not at all if you don't actually need it.

I won't disagree with that.

> My group has invested almost thirty person-years writing Python code in
> the last few years.  To the best of my ability to recall, only two of
> the tasks we've worked on in that time was directly related to 
> performance concerns and the resulting optimization for speed.  Given 
> that the combined optimization efforts consumed perhaps a few weeks
> of our time, we spend something like 0.4% of our time focusing on 
> performance.  This seems to me a healthy amount.

My group has invested probably something like 15 person-years writing
Python code in the last few years. We have probably put about one of
those person-years into trying to account for performance bottlenecks.
Management is presently of the opinion that a drastic rewrite is the
only way to resolve the remaining issues. Perhaps the most distinct
difference between your group and mine is that many of our developers
are fairly novice, and prone to select solutions that are not
well-informed about performance issues and algorithm complexity. On
the other hand, maybe our code is just more heavily used?

> (Curiously enough, when we coded more in C, I suspect we spent a 
> substantially larger amount of time caught up in performance issues.
> This change is due merely to greater experience, not because of 
> the change in language, though the two are related.)

Yes. Younger engineers tend to emphasize performance too much, because
it's a huge nebulous area that they don't understand, and which may
well bite them in the ass HARD. Older engineers can automatically
navigate through the most dangerous fields of landmines, and tend to
underemphasize performance, because the most important
aspects are habit and the less important aspects can be safely
ignored.

At first blush, I thought "maybe there's an equilibrium that needs to
be found". But I don't think so now. I think it's important for
younger (intermediate?) developers to be obsessed with performance, so
they can learn the dangers of bad algorithms, how to recognize them,
how to avoid them. And it's worth building good habits where you
choose an optimal idiom rather than a slower one.
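
One small illustration of that kind of habit, offered as a sketch and
not as anything from the code discussed in this thread: the function
names are hypothetical, and the faster version assumes the items are
hashable. Both functions return the same result, but the first rescans
a list for every element.

```python
def common_items_slow(first, second):
    """Items of first that also appear in second, by scanning a list."""
    # "x in second" walks the list each time, so the worst case is
    # len(first) * len(second) comparisons.
    return [x for x in first if x in second]


def common_items_fast(first, second):
    """Same result, using a set for the membership test."""
    # Building the set costs one pass over second; each lookup after
    # that takes roughly constant time on average.
    second_set = set(second)
    return [x for x in first if x in second_set]


sample_first = [1, 2, 3, 4]
sample_second = [4, 2, 9]
slow_result = common_items_slow(sample_first, sample_second)
fast_result = common_items_fast(sample_first, sample_second)
```

The second form is no harder to read than the first, which is the sort
of choice that costs nothing at writing time.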

You can disagree, but I've done a lot of reading and thinking on the
matter, in part because my experience and my beliefs have been at odds
in the past. Consequently, you're going to hafta try harder than
invoking the divine authority of Kent Beck (or even Knuth!) to
persuade me. Still, I can yet be persuaded; my mind is quite
tractable.

--G.

-- 
Geoff Gerrietts             "I don't think it's immoral to want to  
<geoff at gerrietts net>     make money."      -- Guido van Rossum




