[Python-Dev] int(string) (was: DRAFT: python-dev Summary for 2005-09-01 through 2005-09-16)

Scott David Daniels Scott.Daniels at Acm.Org
Thu Nov 10 03:41:59 CET 2005


Tim Peters wrote:
> ...
> Someone want a finite project that would _really_ help their Uncle
> Timmy in his slow-motion crusade to get Python on the list of "solved
> it!" languages for each problem on that magnificent site?
...
> Turns out it's _not_ input speed that's the problem here, and not even
> mainly the speed of integer mod:  the bulk of the time is spent in
> int(string).... 

OK, I got an idea about how to do this fast.  I started with Python
code, and I now have C code that should beat int(string) always while
getting a lot of speed making long values.  The working tables can be
used to do the reverse transformation (int or long to string in some
base) with a little finesse, but I haven't done that yet in C.

The code is pretty sprawly now (a lot left in for instrumentation and
testing pieces), but can eventually get smaller.  I gave myself time
to do this as a birthday present to myself.  It may take a while to
build a patch, but perhaps you can let me know how much speedup you
get using this code.  if you build this module, I'd suggest using
"from to_int import chomp" to get a function that works like int
(producing a long when needed and so on).

> If you can even track all the levels of C function calls that ends up 
 > invoking <wink>, you find yourself in PyOS_strtoul(), which is a
 > nifty all-purpose routine that accepts inputs in bases 2 thru 36, can
> auto-detect base, and does platform-independent overflow checking at
> the cost of a division per digit.  All those are features, but it
> makes for sloooow conversion.
OK, this code doesn't deal with unicode at all.  The key observations
are:
   A) to figure out the base, you pretty much need to get to the first
      digit; getting to the first non-zero digit is not that much worse.
   B) If you know the length of a string of digits (starting at the
      first non-zero digit) and the base, you know approximately how
      bits the result will have.  You can do a single allocation if
      you are building a long.  You can tell if you need to test for
      overflow in building an int; there is one length per base where
      you must.

So the question becomes, is it worth taking two passes at the digits?
Well, it sure looks like it to me, but I haven't timed one or two-
character integers.  I do longs in "megadigits" -- the largest set
of digits that fits safely in SHIFT bits, so they have no need for
overflow checks.

For further excitement, you can use a similar technique to go from
the number of bits to the string length.  That should make for a
fast convert int/long to string in any of 36 (or more, really) bases.

I pass all of your mentioned test cases (including the one from a
later message).  I'm pretty much out of time for this project at
the moment, but encouraging words would help me steal some time
to finish.  For anyone wanting to look at the code, or try it
themselves:

Installer:
    http://members.dsl-only.net/~daniels/dist/to_int-0.10.win32-py2.4.exe
Just the 2.4 dll:
    http://members.dsl-only.net/~daniels/dist/to_int-0.10.win32.zip
Sources:
    http://members.dsl-only.net/~daniels/dist/to_int-0.10.zip


--Scott David Daniels
Scott.Daniels at Acm.Org



More information about the Python-Dev mailing list