Strange behavior

Thu Aug 16 13:40:43 EDT 2012

On Thu, 16 Aug 2012 13:18:59 +0200, Virgil Stokes wrote:

> On 15-Aug-2012 02:19, Steven D'Aprano wrote:
>> On Tue, 14 Aug 2012 21:40:10 +0200, Virgil Stokes wrote:
>>
>>> You might find the following useful:
>>>
>>> def testFunc(startingList):
>>>       xOnlyList = []; j = -1
>>>       for xl in startingList:
>>>           if (xl[0] == 'x'):
>> That's going to fail in the starting list contains an empty string. Use
>> xl.startswith('x') instead.
>
> Yes, but this was by design (tacitly assumed that startingList was both
> a list and non-empty).

As Peter already pointed out, I said it would fail if the list contains 
an empty string, not if the list was empty.

>>>               xOnlyList.append(xl)
>>>           else:
>>>               j += 1
>>>               startingList[j] = xl
>>
>> Very cunning, but I have to say that your algorithm fails the "is this
>> obviously correct without needing to study it?" test. Sometimes that is
>> unavoidable, but for something like this, there are simpler ways to
>> solve the same problem.
>
> Sorry, but I do not sure what you mean here.

In a perfect world, you should be able to look at a piece of code, read 
it once, and see whether or not it is correct. That is what I mean by 
"obviously correct". For example, if I have a function that takes an 
argument, doubles it, and prints the result:

def f1(x):
    print(2*x)

that is obviously correct. Whereas this is not:

def f2(x):
    y = (x + 5)**2 - (x + 4)**2
    sys.stdout.write(str(y - 9) + '\n')

because you have to study it to see whether or not it works correctly.

Not all programs are simple enough to be obviously correct. Sometimes you 
have no choice but to write something which requires cleverness to get 
the right result. But this is not one of those cases. You should almost 
always prefer simple code over clever code, because the greatest expense 
in programming (time, effort and money) is to make code correct.

Most code does not need to be fast. But all code needs to be correct.

[...]
> This can meet the requirement that startingList is modified in place via
> the call to this function (see the attached code).

Good grief! See, that's exactly the sort of thing I'm talking about. 
Without *detailed* study of your attached code, how can I possibly know 
what it does or whether it does it correctly?

Your timing code calculates the mean using a recursive algorithm. Why 
don't you calculate the mean the standard way: add the numbers and divide 
by the total? What benefit do you gain from a more complicated algorithm 
when a simple one will do the job just as well?

You have spent a lot of effort creating a complicated, non-obvious piece 
of timing code, with different random seeds for each run, and complicated 
ways of calculating timing statistics... but unfortunately the most 
important part of any timing test, the actually *timing*, is not done 
correctly. Consequently, your code is not correct.

With an average time of a fraction of a second, none of those timing 
results are trustworthy, because they are vulnerable to interference from 
other processes, the operating system, and other random noise. You spend 
a lot of time processing the timing results, but it is Garbage In, 
Garbage Out -- the results are not trustworthy, and if they are correct, 
it is only by accident.

Later in your post, you run some tests, and are surprised by the result:

> Why is algorithm-2A slower than algorithm-2?

It isn't slower. It is physically impossible, since 2A does *less* work 
than 2. This demonstrates that you are actually taking a noisy 
measurement: the values you get have random noise, and you don't make any 
effort to minimise that noise. Hence GIGO.

The right way to test small code snippets is with the timeit module. It 
is carefully written to overcome as much random noise as possible. But 
even there, the authors of the timeit module are very clear that you 
should not try to calculate means, let alone higher order statistics like 
standard deviation. The only statistic which is trustworthy is to run as 
many trials as you can afford, and select the minimum value.

So here is my timing code, which is much shorter and simpler and doesn't 
try to do too much. You do need to understand the timeit.Timer class:

timeit.Timer creates a timer object; timer.repeat does the actual timing. 
The specific arguments to them are not vital to understand, but you can 
read the documentation if you wish to find out what they mean.

First, I define the two functions. I compare similar functions that have 
the same effect. Neither modifies the input argument in place. Copy and 
paste the following block into an interactive interpreter:

# Start block

def f1(startingList):
    return ([x for x in startingList if x[0] == 'x'],
            [x for x in startingList if x[0] != 'x'])

# Note that the above function is INCORRECT, it will fail if a string is
# empty; nevertheless I will use it for timing purposes anyway.

def f2(startingList):
    words_without_x = []
    words_with_x = []
    for word in startingList:
        if word.startswith('x'):
            words_with_x.append(word)
        else:
            words_without_x.append(word)
    return (words_with_x, words_without_x)

# Set up some test data.  There's no point being too clever about this.
# Keep it simple.

import random
data = ['aa', 'bb', 'cb', 'xa', 'xb', 'xc']*1000000
random.shuffle(data)

# Set up two timers.
from timeit import Timer
setup = "from __main__ import data, f1, f2"
t1 = Timer("a, b = f1(data)", setup)
t2 = Timer("a, b = f2(data)", setup)

# and run the timers
best1 = min(t1.repeat(number=1, repeat=10))
best2 = min(t2.repeat(number=1, repeat=10))

# End block

On my computer, here are the results. Yours may differ.

best1: 3.5199968814849854
best2: 3.515479803085327

No significant difference. And that is to be expected: the bulk of the 
time is spent building up two lists of three million items each.

So let's run it again with less data:

data = data[:10000]
best1 = min(t1.repeat(number=200, repeat=10))/200
best2 = min(t2.repeat(number=200, repeat=10))/200

which gives results:

best1: 0.0037816047668457033
best2: 0.005841898918151856

The double list comp solution is faster, but it's also incorrect -- it 
fails if there is an empty string in the list. What happens if we replace 
it with a version that doesn't have the empty string bug?

def f1(startingList):
    return ([x for x in startingList if x.startswith('x')],
            [x for x in startingList if not x.startswith('x')])

best1 = min(t1.repeat(number=200, repeat=10))/200
best2 = min(t2.repeat(number=200, repeat=10))/200

which gives these results:

best1: 0.008604295253753662
best2: 0.005863149166107178

So there's the first lesson: it's easy to be fast if you don't mind 
writing buggy code.

Can we do better? Try this:

def f3(startingList):
    words_with_x = []
    words_without_x = []
    append_with = words_with_x.append
    append_without = words_without_x.append
    for word in iter(startingList):
        if word[:1] == 'x':
            append_with(word)
        else:
            append_without(word)
    return (words_with_x, words_without_x)

t3 = Timer('a, b = f3(data)', 'from __main__ import f3, data')
best3 = min(t3.repeat(number=200, repeat=10))/200

And the result:

best3: 0.0033271098136901855

which is even faster than your original version.

Or is it? No, I can't conclude that. The difference between the original 
f1 function (0.00378s) and my f3 function (0.00332s) is too small to be 
sure it is real from just ten trials of each. A better statistician than 
me could probably estimate the number of trials needed to be confident 
that one is better than the other.

But then, with a difference that small, who cares? In the real world, a 
difference that small is lost in the noise. Because of the noise, 
probably 50% of the time the slower code will finish first.

[...]
> Suppose words was not a list --- you have tacitly assumed that words is
> a list.

Actually, no I have not. I have assumed it is an iterable object, such as 
a list, a tuple, or an iterator. So what? You have done the same thing. 
Doing an isinstance type check at the beginning of both functions will 
just slow them both down by the same amount.

-- 
Steven