[Python-Dev] string.ato? and Unicode

M.-A. Lemburg mal@lemburg.com
Mon, 03 Apr 2000 00:11:02 +0200


Mark Hammond wrote:
> 
> Is this an oversight, or by design?
> 
> >>> string.atoi(u"1")
> ...
> TypeError: argument 1: expected string, unicode found

Probably an oversight... and it may well not be the only
one: there are many explicit string checks in the code
which might need to be fixed for Unicode support.

As for string.ato? I'm not sure: these functions are
obsoleted by int(), float() and long().
 
> It appears easy to support Unicode - there is already an explicit
> StringType check in these functions, and it simply delegates to
> int(), which already _does_ work for Unicode

Right. I fixed the above three APIs to support Unicode.
 
> A patch would leave the following behaviour:
> >>> string.atoi(u"1")
> 1
> >>> string.atoi(u"1", 16)
> ...
> TypeError: can't convert non-string with explicit base
> 
> IMO, this is better than what we have now.  I'll put together a
> patch if one is wanted...

BTW, the code in string.py for atoi() et al. looks really
complicated:

"""
def atoi(*args):

    """atoi(s [,base]) -> int

    Return the integer represented by the string s in the given
    base, which defaults to 10.  The string s must consist of one
    or more digits, possibly preceded by a sign.  If base is 0, it
    is chosen from the leading characters of s, 0 for octal, 0x or
    0X for hexadecimal.  If base is 16, a preceding 0x or 0X is
    accepted.

    """
    try:
        s = args[0]
    except IndexError:
        raise TypeError('function requires at least 1 argument: %d given' %
                        len(args))
    # Don't catch type error resulting from too many arguments to int().  The
    # error message isn't compatible but the error type is, and this function
    # is complicated enough already.
    if type(s) == _StringType:
        return _apply(_int, args)
    else:
        raise TypeError('argument 1: expected string, %s found' %
                        type(s).__name__)
"""

Why not simply...

def atoi(s, base=10):
    return int(s, base)

ditto for atol() and atof()... ?! This would not only give us better
performance, but also Unicode support for free. (I'll fix int()
and long() to accept Unicode when using an explicit base too.)
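
For illustration, here is a rough sketch of what the three simplified
wrappers could look like. It is written against Python 3 as an
assumption: there every str is already Unicode and long() no longer
exists, so atol() just reuses int(). Treat it as a sketch of the idea,
not as the actual contents of string.py.

```python
# Sketch only: simplified wrappers in the spirit of the proposal above.
# Assumption: Python 3, where str is Unicode and int() covers what
# long() used to do.

def atoi(s, base=10):
    """atoi(s [,base]) -> int"""
    # int() rejects non-string arguments when a base is passed, so the
    # TypeError for bad input comes from int() instead of a manual check.
    return int(s, base)


def atol(s, base=10):
    """atol(s [,base]) -> int (there is no separate long type here)"""
    return int(s, base)


def atof(s):
    """atof(s) -> float"""
    # float() takes no base argument, so neither does this wrapper.
    return float(s)
```

With wrappers like these, atoi("1") and atoi("1", 16) both go through
the same code path, and passing a non-string such as atoi(1) still
raises TypeError, because int() refuses to convert a non-string when
given an explicit base.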

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/