Confused compare function :)

Sat Dec 8 02:01:08 EST 2012

On 12/7/2012 5:16 PM, Steven D'Aprano wrote:
> On Thu, 06 Dec 2012 23:14:17 +1100, Chris Angelico wrote:
>
>> Setting up the try/except is a constant time cost,
>
> It's not just constant time, it's constant time and *cheap*. Doing
> nothing inside a try block takes about twice as long as doing nothing:
>
> [steve at ando ~]$ python2.7 -m timeit "try: pass
>> except: pass"
> 10000000 loops, best of 3: 0.062 usec per loop
>
> [steve at ando ~]$ python2.7 -m timeit "pass"
> 10000000 loops, best of 3: 0.0317 usec per loop
>
>
>> while the duplicated
>> search for k inside the dictionary might depend on various other
>> factors.
>
> It depends on the type, size and even the history of the dict, as well as
> the number, type and values of the keys. Assuming a built-in dict, we can
> say that in the absence of many collisions, key lookup can be amortized
> over many lookups as constant time.
>
>
>> In the specific case of a Python dictionary, the membership
>> check is fairly cheap (assuming you're not the subject of a hash
>> collision attack - Py3.3 makes that a safe assumption),
>
> Don't be so sure -- the hash randomization algorithm for Python 3.3 is
> trivially beaten by an attacker.
>
> http://bugs.python.org/issue14621#msg173455
>
> but in general, yes, key lookup in dicts is fast. But not as fast as
> setting up a try block.
>
> Keep in mind too that the "Look Before You Leap" strategy is
> fundamentally unsound if you are using threads:
>
> # in main thread:
> if key in mydict:  # returns True
>      x = mydict[key]  # fails with KeyError
>
> How could this happen? In the fraction of a second between checking
> whether the key exists and actually looking up the key, another thread
> could delete it! This is a classic race condition, also known as a Time
> Of Check To Time Of Use bug.

I generally agree with everything Steven has said here and in previous 
responses and add the following.

There are two reasons to not execute a block of code.

1. It could and would run, but we do not want it to run because a) we do 
not want an answer, even if correct; b) it would return a wrong answer 
(which of course we do not want); or c) it would run forever and never 
give any answer. To not run code, for any of these reasons, requires an 
if statement.

2. It will not run but will raise an exception instead. In this case, we 
can always use try-except. Sometimes we can detect that it would not run 
before running it, and can use an if statement instead. (But as Steven 
points out, this is sometimes trickier than it might seem.) However, 
even if we can reliably detect that code would either run or raise an 
exception, this often or even usually requires doing redundant calculation.

For example, 'key in mydict' must hash the key, mod the hash according 
to the size of the dict, find the corresponding slot in the dict, and do 
an equality comparison with the existing key in the dict. If not equal, 
repeat according to the collision algorithm for inserting keys.

In other words, 'key in mydict' does everything done by 'mydict[key]' 
except to actually fetch the value when the right slot is found or raise 
an exception if there is no right slot.

So why ever use a redundant condition check? A. esthetics. B. 
practicality. Unfortunately, catching exceptions may be and often is as 
slow as the redundant check and even multiple redundant checks.

-- 
Terry Jan Reedy