[Tutor] How to find optimisations for code

Steven D'Aprano steve at pearwood.info
Sat Oct 20 02:05:16 EDT 2018


On Fri, Oct 19, 2018 at 09:12:54AM -0700, Pat Martin wrote:
> Sorry my first email didn't have a subject line
> 
> TLDR; How do you figure out if code is inefficient (if it isn't necessarily
> obvious) and how do you find a more efficient solution?

You know it is inefficient if it takes longer to run than it ought to.

The tricky questions are:

- if you don't *measure* the time, how do you know how long it takes?

- how do you know how long it ought to take? other than wishful thinking?

- how do you find a more efficient solution?

That's comes from experience, practice, and thinking about the problem 
you are trying to solve. The last one is especially tricky, because it 
may involve coming up with a solution to a problem that has never been 
solved before. That requires creativity and insight.

But before you do any of that, remember the two critical rules of 
optimization, from the classic 1975 book "Principles of Program Design" 
by Michael A. Jackson:

    Rule 1: Don’t do it.
    Rule 2 (for experts only): Don’t do it yet.


Donald Knuth wrote:

    "We should forget about small efficiencies, say about 97% of the
    time: premature optimization is the root of all evil. Yet we
    should not pass up our opportunities in that critical 3%."


And to quote W.A. Wulf:

    "More computing sins are committed in the name of efficiency
    (without necessarily achieving it) than for any other single
    reason — including blind stupidity."



> I use code wars sometimes to get some practice with Python, there was a
> challenge to compare two strings and if string1 had enough characters to be
> rearranged to make string2 return True, otherwise return False.

Think about the problem to be solved. How do we know that one string 
could be rearranged to the other? Each string much have:

- the same individual letters, e.g. "binary" and "brainy";
- the same total number of letters;
- the same number of *each* letter.

We don't need to check that the individual letters are the same, because 
checking the counts will suffice. If they are not the same, one string 
will have (let's say) two A's while the other will have none, and the 
counts will be different.

If the two strings have different length, we know they can't be anagrams 
of each other:

    if len(string1) != len(string2):
        return False

Only then do we need to compare the counts of each letter:

    for c in string1:
        if string1.count(c) != string2.count(c):
            return False

Can that be improved? Perhaps -- for short strings, that ought to be 
pretty speedy, but for very long strings it has to inspect every letter 
over and over again...

for the first letter in string1:
    count how many times it appears:
    - compare it to the first letter;
    - compare it to the second letter;
    - compare it to the third letter; etc
for the second letter in string1:
    count how many times it appears:
    - compare it to the first letter;
    - compare it to the second letter;
    - compare it to the third letter; etc
etc.

So you can see, each letter gets inspected over and over again. If the 
strings are long, that does a lot of work. True, it is done at the speed 
of optimized C code (the string.count() method is pretty fast), but we 
can do better by using the Counter, which only needs to inspect each 
letter once:

    if collections.Counter(string1) != collections.Counter(string2):
        return False

Can we do better? Maybe... building the counters only needs to inspect 
each letter once. But Counters have a bit of overhead, they're not 
trivial code. There's a conceptually simpler way which *might* (or might 
not!) be faster in practice:

    if sorted(string1) != sorted(string2):
        return False

Think about why that test works.

If you have truly huge strings, then the Counter will be better, for 
sure. But for moderate-sized, or small, strings, I wouldn't want to 
guess which is faster. Even though technically sorted does more work, it 
probably has less overhead. It might very well win for some cases.


> Sorry about the long email, not sure if there is even an answer to this
> question except maybe write more code and get more experience?

Indeed. Nothing beats experience.

But you can also stand on the shoulders of those who came before you, in 
the words of Sir Isaac Newton, one of the greatest mathematicians and 
physicists of any time:

    "If I have seen further, it is because I have stood on the 
    shoulders of giants"

which was a not-at-all veiled dig against his rival and equal genius, 
Sir Robert Hooke, who was short and bent. (Newton was a genius, but he 
was not a nice guy, and he had many rivalries and feuds. In fairness, so 
did Hooke.)

You should start here:

https://www.joelonsoftware.com/2001/12/11/back-to-basics/

and think about how that relates to your original algorithm.

As far as programming in Python goes, some general advice:

- whenever you can do something by calling a built-in function 
  or method, it will probably be faster than doing it in Python
  code you wrote yourself;

- so if you want fast code, try to find a way to use built-ins
  whenever possible;

- looking up values in a set or dict is optimized to be very fast;

- appending to the end of a list is fast;

- but inserting at the beginning is slow;

- "fast" and "slow" is relative: don't worry about "slow" operations 
  until you know that they actually make a difference. (Remember the
  quote from W.A. Wulf.)

- and watch out for those Shlemiel the painter’s algorithms!


-- 
Steve


More information about the Tutor mailing list