[Tutor] Limitation of int() in converting strings

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Dec 17 15:36:04 CET 2012


On 17 December 2012 08:55, Alan Gauld <alan.gauld at btinternet.com> wrote:
>
> On 17/12/12 04:19, boB Stepp wrote:
>
>> It is apparent that int() does not like strings with floating-point
>> formats. None of my books (as far as my flipping can tell) or the
>> below built-in help clarify this:
>> ...
>>
>> Of course if I type int(float('10.0')) I get the desired 10 .
>
>
> as indeed will
>
> int(10.0)
>
> So you are right, there is an inconsistency between how int() converts
> floating point numbers and how it converts strings. Even stranger since
> the underlying atoi() C function appears to handle float strings quite
> happily...

The atoi() function, like many of the older C functions, has a major
flaw: it indicates an error by returning zero, even though zero is
also a valid return value. As far as I can tell it doesn't even set an
error code on failure. As a result it is not safe to use without some
additional checking either before or after the call.

Python's int() function and C's atoi() function also accept and ignore
whitespace around the number in the string:

>>> int('   123   ')
123
>>> int('\t\n   \n   123  \n ')
123
>>> int('\t\n   \n   123  \n 456')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '123  \n 456'

In C, atoi() would have allowed that last example and given 123 as the result.
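
To make the difference concrete, here is a rough Python imitation of the
atoi() behaviour described above. It is only a sketch: atoi_like is a
made-up name, and the regular expression is my approximation of "skip
leading whitespace, read an optional sign and digits, ignore the rest",
not the C library's actual code.

```python
import re

# Hypothetical helper: approximates C's atoi() for illustration only.
# Leading whitespace, optional sign, then as many digits as possible.
_ATOI_PREFIX = re.compile(r'\s*([+-]?\d+)', re.ASCII)

def atoi_like(text):
    """Return the leading decimal integer in text, or 0 if there is none."""
    match = _ATOI_PREFIX.match(text)
    if match is None:
        return 0  # indistinguishable from a genuine '0'
    return int(match.group(1))

print(atoi_like('\t\n   \n   123  \n 456'))  # 123, trailing text ignored
print(atoi_like('10.0'))                      # 10, stops at the '.'
print(atoi_like('spam'))                      # 0, an error
print(atoi_like('0'))                         # 0, not an error
```

The last two calls show the problem: the caller cannot tell a failed
conversion from a real zero, whereas int() raises ValueError.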

Also, are you sure that atoi() is used in CPython? The int() function
accepts an optional base argument and can process non-decimal strings:

>>> int('0xff', 16)
255
>>> int('0o377', 8)
255
>>> int('0b11111111', 2)
255
>>> int('11111111', 2)
255
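
As a small addition to the examples above: passing 0 as the base asks
int() to work out the base from the prefix in the string, and an
explicit base also works without any prefix. The snippet assumes
Python 3 and is only meant to illustrate the base argument.

```python
# Base 0 means "infer the base from the literal's prefix".
for text in ('0xff', '0o377', '0b11111111', '255'):
    print(text, '->', int(text, 0))  # each line ends with 255

# With an explicit base the prefix is optional.
print(int('ff', 16))  # 255

# Without a base argument the prefix is not understood.
try:
    int('0xff')
except ValueError as exc:
    print('ValueError:', exc)
```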


>> So, I am guessing that to convert strings to integers with int() that
>> the string must already be of integer format? What is the rationale
>> for setting up int() in this manner?

I think it's unfortunate that Python's int() function combines two
distinct behaviours in this way. In different situations int() is used
to:
1) Coerce an object of some type other than int into an int without
changing the value of the integer that the object represents.
2) Round an object with a non-integer value to an integer value.
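
A short illustration of the two behaviours, assuming Python 3. Note
that the "rounding" in 2) is truncation towards zero:

```python
# Behaviour 1: the value is already an integer, only the type changes.
print(int('10'))    # 10
print(int(10.0))    # 10

# Behaviour 2: the fractional part is silently discarded.
print(int(10.7))    # 10
print(int(-10.7))   # -10

# Strings only get behaviour 1; a float-formatted string is rejected.
try:
    int('10.7')
except ValueError as exc:
    print('ValueError:', exc)
```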

There are situations where behaviour 1) is required but behaviour 2)
is definitely not wanted. The inability to do this safely in Python
resulted in PEP 357 [1], which adds an __index__ method for objects
that represent integers but are not of type int. Unfortunately, this
was intended for slicing and doesn't help when converting floats and
strings to int.
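
Here is a minimal sketch of what __index__ does and does not give you.
The Position class is invented purely for illustration;
operator.index() is the standard-library way to invoke the hook.

```python
import operator

class Position:
    """Hypothetical integer-like type, used only for this example."""
    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Lossless conversion to a true int, as PEP 357 intends.
        return self.value

print(operator.index(Position(2)))      # 2
print(['a', 'b', 'c'][Position(2)])     # c

# Floats and strings do not define __index__, so they are rejected
# even when they hold an integer value.
for bad in (10.0, '10'):
    try:
        operator.index(bad)
    except TypeError:
        print(type(bad).__name__, 'rejected')
```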

I have often found myself writing awkward functions to prevent a
rounding error from occurring when coercing an object with int().
Here's one:

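def make_int(obj):
    '''Coerce str, float and int to int without rounding error
    Accepts strings like '4.0' but not '4.1'
    '''
    fnum = float('%s' % obj)
    inum = int(fnum)
    # Raise instead of assert: asserts are skipped under "python -O"
    if inum != fnum:
        raise ValueError('not an integer value: %r' % (obj,))
    return inum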
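
One caveat with make_int is that it goes through float, so integers
too large for a float to represent exactly (beyond 2**53) can come back
changed. A possible alternative, sketched here under the assumption
that fractions.Fraction is acceptable for the job, avoids the float
step. The name make_int_exact is made up for this example.

```python
from fractions import Fraction

def make_int_exact(obj):
    """Coerce str, float and int to int, refusing non-integer values.

    Fraction parses strings such as '4.0' exactly, so large integers
    given as strings are not rounded through a float.
    """
    frac = Fraction(obj)
    if frac.denominator != 1:
        raise ValueError('not an integer value: %r' % (obj,))
    return frac.numerator

print(make_int_exact('4.0'))                # 4
print(make_int_exact(4.0))                  # 4
print(make_int_exact('9007199254740993'))   # 9007199254740993
try:
    make_int_exact('4.1')
except ValueError as exc:
    print('ValueError:', exc)
```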


References:
[1] http://www.python.org/dev/peps/pep-0357/


Oscar

