isFloat: Without Exception-Handling

Sat Oct 5 04:32:15 EDT 2002

Chad Netzer wrote:

> On Friday 04 October 2002 19:31, Jeff Epler wrote:
>> On Fri, Oct 04, 2002 at 06:05:19PM -0700, Chad Netzer wrote:
> 
>> > def isFloat(s):
>> >     try: return float(s) or True
>> >     except (ValueError, TypeError), e: return False
>>
>> You mean
>> def isFloat(s):
>>     try: return (float(s), True)[1]
>>     except (ValueError, TypeError), e: return False
> 
> Is there any instance when these will be different?  I assume float() will

In any case in which s gets coerced into a non-zero float X, the
former version will return X, the latter will return True.  This
may or may not matter (e.g., it does if what you do with the result
is printing it, doesn't if all you do is test the result directly
in an if or while statement, etc, etc) but is certainly a difference
nonetheless.
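
To make the difference concrete, here is a small sketch. It is written
in Python 3 syntax (an assumption; the snippets quoted above are
Python 2), and the names is_float_or and is_float_tuple are made up
for illustration only.

```python
# Python 3 rendering of the two quoted versions (names are illustrative).
def is_float_or(s):
    try:
        return float(s) or True      # returns the float itself when non-zero
    except (ValueError, TypeError):
        return False

def is_float_tuple(s):
    try:
        return (float(s), True)[1]   # always returns True on success
    except (ValueError, TypeError):
        return False

for value in ('2.5', '0.0', 'foo'):
    print(repr(value), is_float_or(value), is_float_tuple(value))
# '2.5' 2.5 True
# '0.0' True True
# 'foo' False False
```

Both results behave the same in an if or while test; they differ only
when the returned object itself is printed, stored or compared.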

> always return type< float >, and so not overload the 'and'.  In which

There is no 'and' in either version, and there is no such thing as
"overloading the 'and'" anywhere in Python.  Short-circuiting operators
are not subject to overload.
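
A minimal sketch of that point, in Python 3 syntax (where the
truth-value hook is spelled __bool__; in Python 2 it is __nonzero__).
The class Loud is hypothetical and exists only to show which hooks
get consulted.

```python
class Loud:
    """Hypothetical class used only to show which hooks Python consults."""
    def __init__(self, value):
        self.value = value

    def __bool__(self):
        # 'and' / 'or' only ask for the truth value of the left operand.
        return bool(self.value)

    def __or__(self, other):
        # Called for the '|' operator, never for the 'or' keyword.
        return 'custom |'

a = Loud(0)
print(a or 'fallback')   # falsy left operand, so 'or' yields the right operand
print(a | 'fallback')    # '|' dispatches to __or__
```

A class can influence what 'or' does only indirectly, through its
truth value; which operand comes back is decided by the language.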

> case, my version doesn't have to construct a tuple (so possibly it's
> faster).

Yep:

import time

def isFloat1(s):
    try: return float(s) or True
    except (ValueError, TypeError), e: return False

def isFloat2(s):
    try: return (float(s), True)[1]
    except (ValueError, TypeError), e: return False

lotsa = xrange(1000*1000)

def timit(func):
    start = time.clock()
    map(func, lotsa)
    stend = time.clock()
    return '%.2f %s' % (stend-start, func.__name__)

for i in range(3):
    for func in (isFloat1, isFloat2):
        print timit(func)

[alex at lancelot ba]$ python -O me.py
1.77 isFloat1
2.11 isFloat2
1.75 isFloat1
2.13 isFloat2
1.78 isFloat1
2.11 isFloat2

This is with Python 2.3, but the ratio's similar in uniformly-slower
Python 2.2:

[alex at lancelot ba]$ python2.2 -O me.py
1.96 isFloat1
2.26 isFloat2
1.96 isFloat1
2.27 isFloat2
1.95 isFloat1
2.26 isFloat2

Version 1 saves about three tenths of a second per million
calls (with float(s) != 0), so about 300 nanoseconds per call
(if you can come up with a case where this saving will matter
I'll be really interested!).

The difference is a bit less (but the same order of magnitude)
when lotsa is a million 0's:

[alex at lancelot ba]$ python2.2 -O me.py
1.89 isFloat1
2.08 isFloat2
1.87 isFloat1
2.06 isFloat2
1.88 isFloat1
2.04 isFloat2
[alex at lancelot ba]$ python2.3 -O me.py
1.78 isFloat1
2.08 isFloat2
1.81 isFloat1
2.07 isFloat2
1.81 isFloat1
2.07 isFloat2
[alex at lancelot ba]$

> Just curious; for the specific case and'ing with a float, I can't think
> where mine might go wrong (not off the top of my head, anyway).

Depends on the function's specs (there's no "and"'ing in the function,
of course).  If the function is specified to return True or False,
rather than to return just any value it wants as long as the value
is (not-uppercased) true or false, it "goes wrong" (breaks specs)
for any s such that float(s) != 0.

Here's a benchmark that seems more representative to me:

import time

def isFloat1(s):
    try: return float(s) or True
    except (ValueError, TypeError), e: return False

def isFloat2(s):
    try: return (float(s), True)[1]
    except (ValueError, TypeError), e: return False

def isFloat3(s):
    try: float(s)
    except (ValueError, TypeError): return False
    else: return True

sampvals = 0.0, 1.0, '0.0', '1.0', 'foo', [1,2]
L = len(sampvals)
lotsa = [ sampvals[i%L] for i in xrange(1000*1000) ]

def timit(func):
    start = time.clock()
    map(func, lotsa)
    stend = time.clock()
    return '%.2f %s' % (stend-start, func.__name__)

for i in range(2):
    for func in (isFloat1, isFloat2, isFloat3):
        print timit(func)

Here we test a mix of values that may succeed or fail for
different reasons, giving 0 and non-0 floats as a result.
Given the failures, we have very different timing overall:
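
For reference, a small sketch of what each sample value does when
passed to float().  It is in Python 3 syntax (an assumption, since the
benchmark itself is Python 2), and the helper name outcome is made up
for illustration.

```python
sampvals = 0.0, 1.0, '0.0', '1.0', 'foo', [1, 2]

def outcome(s):
    """Describe what float(s) does for one sample (illustrative helper)."""
    try:
        result = float(s)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__    # which of the two exceptions was raised
    else:
        return repr(result)

for s in sampvals:
    print(repr(s), '->', outcome(s))
# 0.0 -> 0.0
# 1.0 -> 1.0
# '0.0' -> 0.0
# '1.0' -> 1.0
# 'foo' -> ValueError
# [1, 2] -> TypeError
```

So two of every six calls in this mix go through the except clause,
one for each exception type.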

[alex at lancelot ba]$ python2.3 -O me.py
7.20 isFloat1
7.41 isFloat2
7.06 isFloat3
7.13 isFloat1
7.40 isFloat2
7.04 isFloat3
[alex at lancelot ba]$ python2.2 -O me.py
9.35 isFloat1
9.70 isFloat2
9.41 isFloat3
9.46 isFloat1
9.64 isFloat2
9.46 isFloat3
[alex at lancelot ba]$

Here, the differences between the 1st and 2nd versions,
though all in all similar to what they were before
(210-270 nanosec/call on 2.3, 180-350 on 2.2), are more
clearly swamped by the overall increase in time.  You
can also see that I snuck in my own favorite version,
which happens to gain 90-140 nanosec/call on 2.3 and
lose 0-60 on 2.2, relative to your version... but it's not
for speed that I prefer it: it's for elegance and clarity.
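
The per-call figures follow directly from the timings listed above;
the sketch below just redoes that arithmetic.  It is in Python 3
syntax, the numbers are copied from the two mixed-value runs, and the
helper name ns_per_call is made up for illustration.

```python
# Seconds per million calls for (isFloat1, isFloat2, isFloat3),
# copied from the mixed-value runs above.
runs = {
    '2.3': [(7.20, 7.41, 7.06), (7.13, 7.40, 7.04)],
    '2.2': [(9.35, 9.70, 9.41), (9.46, 9.64, 9.46)],
}

def ns_per_call(delta_seconds, calls=1000 * 1000):
    """Convert a difference in total seconds into nanoseconds per call."""
    return round(delta_seconds / calls * 1e9)

for version, rows in runs.items():
    for t1, t2, t3 in rows:
        print(version,
              'v2 - v1:', ns_per_call(t2 - t1), 'ns;',
              'v1 - v3:', ns_per_call(t1 - t3), 'ns')
```

A negative "v1 - v3" value means the third version was the slower one
in that run.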

By having JUST the float(s) in the try clause, we make
it crystal-clear that we DON'T care about the result
(we're _obviously_ ignoring it, rather than sort of
ignoring it in subtler ways:-).  The try/except/else
structure shows that what we care about is success or
failure, i.e., whether or not one of the two exceptions
we test for gets raised.  And we DON'T pick up and then
ignore the exception value, since we ONLY care about the
_type_ of exception.

I consider the symmetry and clarity paramount, and I also
think that, more generally, try/except/else is underused.
It's a clear, robust and effective construct; it saves you
in some cases from accidentally catching exceptions you
didn't really expect; and, as this example shows, in a few
circumstances it may even serendipitously give a tiny
performance plus... not that the latter really matters,
but it seems to help convince some:-).
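
A sketch of the "accidentally catching exceptions" case, in Python 3
syntax.  The functions handle, parse_wide and parse_narrow are
hypothetical, invented purely to show the effect; handle stands in for
follow-up work that happens to have a bug of its own.

```python
def handle(value):
    """Stand-in for follow-up work with a bug of its own (hypothetical)."""
    raise ValueError('bug in handle(), unrelated to parsing')

def parse_wide(s):
    # Everything sits in the try clause, so the bug in handle() is
    # silently reported as "s is not a float".
    try:
        value = float(s)
        handle(value)
    except ValueError:
        return 'not a float'
    return 'ok'

def parse_narrow(s):
    # Only float(s) is guarded; the follow-up work runs in the else clause,
    # where its exceptions propagate instead of being mistaken for bad input.
    try:
        value = float(s)
    except ValueError:
        return 'not a float'
    else:
        handle(value)
        return 'ok'

print(parse_wide('1.5'))          # the real bug is hidden
try:
    parse_narrow('1.5')
except ValueError as exc:
    print('propagated:', exc)     # the real bug surfaces
```

With the narrow version, '1.5' is never misreported as bad input; the
unrelated ValueError reaches the caller, where it can be noticed.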

Lesson to retain: measure, measure, measure.  Once
everybody starts measuring rather than guessing
(that will be the day...) it will be time to start
discussing the finer details of measurement (that
Tim Peters explains SO excruciatingly well in his
introduction to Chapter 17 in the Python Cookbook,
O'Reilly's edition) -- and once everybody's clear
on THAT too, we can get back to pointing out that
it doesn't really matter (mostly:-).
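
For anyone measuring on a current interpreter, here is one possible
sketch using the standard timeit module in place of the hand-rolled
timit() above (time.clock() no longer exists as of Python 3.8).  It is
in Python 3 syntax; the helper name best_time and the repeat/number
values are arbitrary choices for illustration, not a recommendation.

```python
import timeit

def is_float(s):
    try:
        float(s)
    except (ValueError, TypeError):
        return False
    else:
        return True

sampvals = (0.0, 1.0, '0.0', '1.0', 'foo', [1, 2])

def best_time(func, repeat=3, number=10000):
    """Best-of-`repeat` time for calling func on every sample `number` times."""
    def run():
        for s in sampvals:
            func(s)
    # timeit.repeat returns one total per repetition; the minimum is the
    # figure least disturbed by whatever else the machine was doing.
    return min(timeit.repeat(run, repeat=repeat, number=number))

print('%.4f s %s' % (best_time(is_float), is_float.__name__))
```

The absolute numbers will of course differ from the 2002 figures
above; only comparisons made on the same machine and interpreter
mean anything.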

Alex