How to make Python run as fast (or faster) than Julia

Mon Feb 26 08:42:14 EST 2018

On 2/26/18 7:13 AM, bartc wrote:
> On 26/02/2018 11:40, Chris Angelico wrote:
>> On Mon, Feb 26, 2018 at 10:13 PM, bartc <bc at freeuk.com> wrote:
>>> Below is the first draft of a Python port of a program to do with 
>>> random
>>> numbers. (Ported from my language, which in turned ported it from a C
>>> program by George Marsaglia, the random number guy.)
>>>
>>> However, running it quickly exhausts the memory in my machine. The 
>>> reason is
>>> that Python unhelpfully widens all results to bignums as needed. The 
>>> code
>>> relies on calculations being modulo 2^64.
>>>
>>> Note that restricting integer ops to 64 bits probably still won't 
>>> work, as I
>>> believe the numbers need to be unsigned.
>>
>> No, it's because the original implementation assumed integer
>> wrap-around (at least, I think that's what's happening; I haven't
>> analyzed the code in great detail). That means all your integer
>> operations are doing two jobs: the one they claim to, and then a
>> masking to 64 bits signed. That's two abstract operations that happen,
>> due to the nature of the CPU, to work efficiently together. If you
>> don't implement both halves of that in your Python port, you have
>> failed at porting. What if you were porting a program from a 72-bit
>> chip that assumed Binary Coded Decimal? Would you complain that C's
>> integers are badly behaved?
>>
>> And that's without even asking whether a C program that assumes
>> integer wrap-around counts as portable. At least with Python, you have
>> a guarantee that integer operations are going to behave the same way
>> on all compliant implementations of the language.
>
>
> A C version is given below. (One I may have messed around with, which 
> I'm not sure works properly. For an original, google for Marsaglia and 
> KISS64 or SUPRKISS64.)
>
> Most integers are unsigned, which have well-defined overflow in C 
> (they just wrap as expected). In C, a mixed signed/unsigned op is 
> performed as unsigned.
>
> -----------------------------
>
> /*   SUPRKISS64.c, period 5*2^1320480*(2^64-1)   */
> #include <stdio.h>
> #include <stdlib.h>
> #include "timer.c"
>  static signed long long Q[20632],
>                  carry=36243678541LL,
>                  xcng=12367890123456LL,
>                  xs=521288629546311LL,
>                  indx=20632;
>
>  #define CNG ( xcng=6906969069LL*xcng+123 )
>  #define XS  ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
>  #define SUPR ( indx<20632 ? Q[indx++] : refill() )
>  #define KISS SUPR+CNG+XS
>
>  signed long long refill(void) {
>   int i; signed long long z,h;
>   for(i=0;i<20632;i++) {
>     h = (carry&1);
>     z = ((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
>     carry = (Q[i]>>23)+(Q[i]>>25)+(z>>63);
>     Q[i] = ~((z<<1)+h);
>   }
>   indx=1;
>   return (Q[0]);
>  }
>
>  int main() {
>   int i; signed long long x;
>
>   for(i=0;i<20632;i++) Q[i]=CNG+XS;
>
>   for(i=0;i<1000000000;i++) x=KISS;
>
>   printf("Does x=4013566000157423768\n     x=%llu.\n",x);
> }
>

With proper 64-bit masking (def only64(x): return x & 
0xFFFFFFFFFFFFFFFF), the Python version produces the correct answer 
using a reasonable amount of memory. Well, once you notice that the 
Python code had N=1e5, and the C code had N=1e9 :)   If you want to 
experiment, with N=1e5, the final number should be 5255210926702073855.

Also, I note that you said, "Most integers are unsigned", but the C code 
has them all declared as signed?  It doesn't seem to have mattered to 
your result, but I'm not an expert on C portability guarantees.

--Ned.