How to get decimal form of largest known prime?

Tim Peters tim.one at comcast.net
Sat Jun 12 16:12:06 EDT 2004


[Grégoire Dooms]
> If speed is the matter, go for C:
>
> with the GNU MP library use
> void mpz_pow_ui (mpz_t rop, mpz_t base, unsigned long int exp) and
> char * mpz_get_str (char *str, int base, mpz_t op)

There are some GMP wrappers for Python.  Using Marc-Andre Lemburg's mxNumber
wrapper (which is maximally convenient for me on Windows, since it includes
a precompiled-for-Windows GMP), the OP's task is spelled like so:

>>> from mx.Number import *
>>> Integer(2)**24036583 - 1

That does call mpz_pow_ui() and mpz_get_str() (with base=10) under the
covers.

Since I typed that in, the process has accumulated 59 CPU minutes, with no
output yet.

Curiously,

>>> x = Integer(2)**24036583 - 1

consumes 112 CPU seconds on its own, while the straight Python

>>> x = 2**24036583 - 1

consumes only 2 CPU seconds -- so Python is 50+ times faster than this GMP
for the computational part.  Python's Karatsuba multiplication gets enormous
benefit out of special-casing zeroes in this particular case (all the
squarings when computing the power see an input whose least-significant half
is entirely zero bits, so huge mounds of needless work get skipped).

The fastest pure-Python solution I've found consumed about 20 CPU minutes to
do the whole job, so is at least 3X as fast as the GMP here (which is still
running as I type this, over an hour of CPU time and climbing).  For that, I
wrote my own multiplication and exponentiation routines in Python, using
digits in base 10**5000.  Computing in a form of decimal from the start
makes "conversion to decimal" a speedy operation.

> I did a small contest about the decimal repr of 2**1000000 a a couple
> years ago. A few languages were compared on a time-to-result basis.
> Python won with 7 minutes (10 sec devel, 6:50 comput.) and C + GMP was
> second: 15 min devel(including reading the doc) + a few sec of comput. I
> bet you can expect a 30-100 fold speedup using C + GMP compared to python
> -c 'print 2**24036583 - 1'

On the box I'm using here, the straight Python 2.3.4:

>>> 2**1000000

consumed 80 CPU seconds (which includes decimal display), and the mxNumber
version:

>>> from mx.Number import *
>>> Integer(2)**1000000

consumed 60 CPU seconds.  I expect Python 2.3.4 is faster at this stuff than
the version you used (in particular, recent Pythons use base 10000
internally when converting to decimal, instead of base 10; it's still
quadratic-time, but the constant factor was slashed).  The box is a 3.2GHz
P4, which is probably also faster than the box you used.  It could also be
that the version of GMP shipped with mxNumber isn't optimally compiled for
this box, and/or getting old.

Lessons to take home <wink>:

+ The speed of bigint operations can be data-dependent in implementation-
  specific ways (Python's native pow() ran circles around GMP's for this
  particular data).

+ That changes over time (recent Pythons are much faster at some of these
  tasks than the one you used).

+ A better algorithm is always the better answer (computing in decimal
  from the start allows pure Python to run circles around GMP computing
  in binary then forced to do a non-trivial 24-million bit base
  conversion).






More information about the Python-list mailing list