How Best to Coerce Python Objects to Integers?

Steve D'Aprano steve+python at pearwood.info
Tue Jan 3 20:10:01 EST 2017


On Wed, 4 Jan 2017 11:22 am, Erik wrote:

> On 03/01/17 22:47, Chris Angelico wrote:
>> On Wed, Jan 4, 2017 at 9:42 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>> Aside from calling "except Exception" a "naked except"
>>
>> If you read the comments, you'll see that he originally had an actual
>> bare except clause, but then improved the code somewhat in response to
>> a recommendation that SystemExit etc not be caught.
> 
> But, as stated at the top of the article, his brief was: "The strings
> come from a file that a human has typed in, so even though most of the
> values are good, a few will have errors ('25C') that int() will reject.".

Right. And from there he starts worrying about the case where the inputs
aren't strings at all, or they're weird exotic objects with nasty __int__
methods. That's overkill and can only hide programming errors.
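
To illustrate that last point, here is a minimal sketch. The function names
are invented for the example and are not taken from the article. Passing
None by mistake is a programming error: the broad handler swallows it,
while a handler limited to ValueError lets the TypeError surface.

```python
def int_broad(value):
    # Catches everything derived from Exception, including TypeError.
    try:
        return int(value)
    except Exception:
        return None


def int_narrow(value):
    # Catches only the error that a typo such as '25C' produces.
    try:
        return int(value)
    except ValueError:
        return None


print(int_broad('25C'))   # typo in the data: rejected quietly
print(int_narrow('25C'))  # same result for the expected failure
print(int_broad(None))    # a bug in the caller, silently hidden

try:
    int_narrow(None)
except TypeError as err:
    # The narrow version lets the caller's mistake show up.
    print('bug exposed:', err)
```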


> What he *should* have done is just validated his input strings before
> presenting the string to int() - i.e., process the input with knowledge
> that is specific to the problem domain before calling the
> general-purpose function.

That's the Look Before You Leap solution. But in this case, given the
scenario described (a text file with a few typos), the best way is to ask
for forgiveness rather than permission:

def int_or_else(value):
    try:
        return int(value)
    except ValueError:
        # Not a valid integer (e.g. '25C'): fall through and return None.
        pass


Why is this better? In the given scenario, errors are rare. Most values are
good, with only a few typos, so it is wasteful to parse the string twice,
once to validate it and once to generate the int. Besides, there's probably
no Python code you can write which will validate an int as fast as the
int() function itself.
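
For comparison, a sketch of the two styles side by side. The regular
expression is an assumption about what "valid" means here (optional sign,
then ASCII digits); int() itself accepts more than that, for example
surrounding whitespace, which is one more reason hand-written validation
tends to drift away from what int() actually does.

```python
import re

# Assumed definition of a well-formed integer: optional sign, then digits.
_INT_RE = re.compile(r'[+-]?[0-9]+')


def int_lbyl(value):
    # Look Before You Leap: scan the string to validate it,
    # then scan it a second time inside int().
    if _INT_RE.fullmatch(value):
        return int(value)
    return None


def int_eafp(value):
    # Ask forgiveness: a single pass in the common (good) case.
    try:
        return int(value)
    except ValueError:
        return None


data = ['17', '-4', '25C', '300']
print([int_lbyl(s) for s in data])   # [17, -4, None, 300]
print([int_eafp(s) for s in data])   # [17, -4, None, 300]
```

Actual timings depend on the Python version and on how often the data is
bad, so measure with timeit on representative input rather than trusting
a rule of thumb.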


[...]
> Instead, he tried to patch around int() rejecting the strings. And then
> decided that he'd patch around int() rejecting things that weren't even
> strings even though that's not what the function has (apparently) been
> specified to receive.

Indeed.


Another thought: if he is receiving human-generated input, there is an
argument to be made for accepting "look-alikes" -- e.g. maybe the data was
entered by Aunt Tilly, who was a typist in the 1960s and can't break the
habit of typing l or I for 1, and O for 0.
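
A rough sketch of how that might look, assuming the only substitutions
worth accepting are the three mentioned above. The table and the function
name are hypothetical.

```python
# Assumed look-alike table: letters a typist might use in place of digits.
LOOKALIKES = str.maketrans({'l': '1', 'I': '1', 'O': '0'})


def int_lenient(value):
    # Try the value exactly as typed first.
    try:
        return int(value)
    except ValueError:
        pass
    # Only if int() rejects it, translate the look-alikes and retry.
    try:
        return int(value.translate(LOOKALIKES))
    except ValueError:
        return None


print(int_lenient('1O5'))   # 105
print(int_lenient('l2'))    # 12
print(int_lenient('25C'))   # None
```

Whether quietly repairing data like this is a good idea depends on the
application; the fallback only runs for values int() has already rejected,
so well-formed input is unaffected.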




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



