[Tutor] How to find optimisations for code

Fri Oct 19 21:26:16 EDT 2018

On 19Oct2018 23:07, Alan Gauld <alan.gauld at yahoo.co.uk> wrote:
>On 19/10/18 17:12, Pat Martin wrote:
>> TLDR; How do you figure out if code is inefficient (if it isn't 
>> necessarily obvious) and how do you find a more efficient solution?
>
>Others have addressed most of the issues but I'd just add the caveat
>that you can spend forever trying to make your code "more efficient".
>Usually, in the real world, you don't need to.
>
>So how do you figure out if your code is inefficient?
>Run it and if it doesn't work fast enough for your purpose then
>try to speed it up. If it does then move onto the next project.
>
>> how do I go about finding a more efficient solution?
>
>Google, a good reference book and experience.

Particularly, get some knowledge of the cost of various algorithms and 
data structures. There are outright inefficient things to do, but many 
others have costs and benefits: things fast in time but costly in space, 
and vice versa, and things efficient for some kinds of data but known to 
be bad for others.

Have a quick read of this:

  https://en.wikipedia.org/wiki/Big_O_notation

which talks about a common notation for costs. Skip the formal section 
with the math and go to the Orders of Common Functions section.

  https://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions

You'll find people talk about big O notation a lot with algorithms and 
their efficiency. For example, a dict _lookup_ is O(1): constant time 
per lookup.  But that doesn't mean always use a dict, because there's a 
cost to creating one in the first place. What you use it for matters.

>But always, always, measure because there has been more
>time wasted on unnecessary optimisations than on any other
>feature of programming.

I just want to add a little nuance to all of this, because it to me it 
reads a little like (a) if it works and you get things done in a timely 
manner it is efficient and (b) if it isn't fast enough you can always 
make it faster. Alan hasn't actually said that, but one could get that 
impression.

To my mind:

Alan's point about "fast enough" is perfectly valid: in the real world, 
if your problem has been solved then further work on efficiency might 
cost more to implement than you gain after doing it.

His other point about measurement is also very important. It is possible 
to guess incorrectly about where performance problems lie, particularly 
with higher level languages because their internals has costs not 
apparent to the eye (because you can't see the internals). So from an 
engineering point of view, it is important to measure where a 
problematic programme spends its time and devote effort to those parts 
consuming the most time.

The standard example is the tight loop:

  for a in sequence1:
    for b in sequence2:
      do thing A
  do thing B

The cost of "do thing A" is more important than the cost of "do thing B" 
because likely A runs many many times more often than B. So even if B 
has some known inefficiency you can see, which will take some work to 
improve, effort spent on A will probably be more useful (if that's 
feasible).

With a complex programme this isn't always obvious; in the example above 
the 2 pieces of code are side by side.

The point about measurement is that profiling tools are useful here: 
when you _don't_ have an obvious primary target to improve but do need 
to improve efficiency, a profiling tool can measure where a programme 
spends its time in aggregate. That can point the way to code more 
beneficial to inspect.

It at least gets you objective information about where your programme 
spends its time. It is limited by the data you give your programme: toy 
example input data are not as good as real world data.

Finally, some things are as efficient as they get. You _can't_ always 
make things even more efficient.

Cheers,
Cameron Simpson <cs at cskk.id.au>