[Tutor] How to find optimisations for code
Steven D'Aprano
steve at pearwood.info
Sat Oct 20 02:05:16 EDT 2018
On Fri, Oct 19, 2018 at 09:12:54AM -0700, Pat Martin wrote:
> Sorry my first email didn't have a subject line
>
> TLDR; How do you figure out if code is inefficient (if it isn't necessarily
> obvious) and how do you find a more efficient solution?
You know it is inefficient if it takes longer to run than it ought to.
The tricky questions are:
- if you don't *measure* the time, how do you know how long it takes?
- how do you know how long it ought to take? other than wishful thinking?
- how do you find a more efficient solution?
That's comes from experience, practice, and thinking about the problem
you are trying to solve. The last one is especially tricky, because it
may involve coming up with a solution to a problem that has never been
solved before. That requires creativity and insight.
But before you do any of that, remember the two critical rules of
optimization, from the classic 1975 book "Principles of Program Design"
by Michael A. Jackson:
Rule 1: Don’t do it.
Rule 2 (for experts only): Don’t do it yet.
Donald Knuth wrote:
"We should forget about small efficiencies, say about 97% of the
time: premature optimization is the root of all evil. Yet we
should not pass up our opportunities in that critical 3%."
And to quote W.A. Wulf:
"More computing sins are committed in the name of efficiency
(without necessarily achieving it) than for any other single
reason — including blind stupidity."
> I use code wars sometimes to get some practice with Python, there was a
> challenge to compare two strings and if string1 had enough characters to be
> rearranged to make string2 return True, otherwise return False.
Think about the problem to be solved. How do we know that one string
could be rearranged to the other? Each string much have:
- the same individual letters, e.g. "binary" and "brainy";
- the same total number of letters;
- the same number of *each* letter.
We don't need to check that the individual letters are the same, because
checking the counts will suffice. If they are not the same, one string
will have (let's say) two A's while the other will have none, and the
counts will be different.
If the two strings have different length, we know they can't be anagrams
of each other:
if len(string1) != len(string2):
return False
Only then do we need to compare the counts of each letter:
for c in string1:
if string1.count(c) != string2.count(c):
return False
Can that be improved? Perhaps -- for short strings, that ought to be
pretty speedy, but for very long strings it has to inspect every letter
over and over again...
for the first letter in string1:
count how many times it appears:
- compare it to the first letter;
- compare it to the second letter;
- compare it to the third letter; etc
for the second letter in string1:
count how many times it appears:
- compare it to the first letter;
- compare it to the second letter;
- compare it to the third letter; etc
etc.
So you can see, each letter gets inspected over and over again. If the
strings are long, that does a lot of work. True, it is done at the speed
of optimized C code (the string.count() method is pretty fast), but we
can do better by using the Counter, which only needs to inspect each
letter once:
if collections.Counter(string1) != collections.Counter(string2):
return False
Can we do better? Maybe... building the counters only needs to inspect
each letter once. But Counters have a bit of overhead, they're not
trivial code. There's a conceptually simpler way which *might* (or might
not!) be faster in practice:
if sorted(string1) != sorted(string2):
return False
Think about why that test works.
If you have truly huge strings, then the Counter will be better, for
sure. But for moderate-sized, or small, strings, I wouldn't want to
guess which is faster. Even though technically sorted does more work, it
probably has less overhead. It might very well win for some cases.
> Sorry about the long email, not sure if there is even an answer to this
> question except maybe write more code and get more experience?
Indeed. Nothing beats experience.
But you can also stand on the shoulders of those who came before you, in
the words of Sir Isaac Newton, one of the greatest mathematicians and
physicists of any time:
"If I have seen further, it is because I have stood on the
shoulders of giants"
which was a not-at-all veiled dig against his rival and equal genius,
Sir Robert Hooke, who was short and bent. (Newton was a genius, but he
was not a nice guy, and he had many rivalries and feuds. In fairness, so
did Hooke.)
You should start here:
https://www.joelonsoftware.com/2001/12/11/back-to-basics/
and think about how that relates to your original algorithm.
As far as programming in Python goes, some general advice:
- whenever you can do something by calling a built-in function
or method, it will probably be faster than doing it in Python
code you wrote yourself;
- so if you want fast code, try to find a way to use built-ins
whenever possible;
- looking up values in a set or dict is optimized to be very fast;
- appending to the end of a list is fast;
- but inserting at the beginning is slow;
- "fast" and "slow" is relative: don't worry about "slow" operations
until you know that they actually make a difference. (Remember the
quote from W.A. Wulf.)
- and watch out for those Shlemiel the painter’s algorithms!
--
Steve
More information about the Tutor
mailing list