ANN: mxNumber -- Experimental Number Types, Version 0.2.0

Tim Peters tim.one at home.com
Sat Apr 28 02:36:55 EDT 2001


[Brian Kelley]
> ...
> This is actually defined from the original api although I am
> probably opening up the sea of silly arguments.

I'm afraid you can't help it <wink/sigh>.

> Function: unsigned long int mpz_popcount (mpz_t op)
>     For non-negative numbers, return the population count of op.
> For negative numbers, return the largest possible value (MAX_ULONG).

That's fine for C, but makes no sense in a Python interface; i.e., wtf is
MAX_ULONG in Python terms?  Python doesn't even have an unsigned integral
type.

So that's where the silly arguments start.  Just pick *something*.  For
example, sys.maxint is closest in spirit to MAX_ULONG, but shares the defect
of the GMP definition that it's ambiguous whether it means "infinity" or "a
whole lot but nevertheless finite" in this context.  -1 would make more sense
for Python, and is not ambiguous; GMP doesn't have that choice, though, since
it returns an unsigned result.
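
To make the -1 convention concrete, here is a minimal sketch in modern Python. The helper name popcount is hypothetical (it is not part of mxNumber or GMP), and it takes a plain Python int as a stand-in for an mx.Number.Integer:

```python
def popcount(n):
    """Return the number of 1 bits in n for n >= 0, and -1 for n < 0.

    Hypothetical helper: -1 stands in for GMP's "largest possible value",
    since a negative number conceptually has infinitely many 1 bits.
    """
    if n < 0:
        return -1  # unambiguous sentinel; no valid count is negative
    return bin(n).count("1")  # count the '1' characters in the binary form
```

Because every legitimate count is >= 0, a caller can test the result with a simple `< 0` check and never confuse the sentinel with a real answer.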

> more good stuff at
> http://www.swox.com/gmp/manual/gmp_6.html#SEC30

Right, they have lots of good stuff.  The functions aren't all well-defined
in Python terms, though, and sometimes not even in C terms; e.g.,

    Function: unsigned long int
              mpz_scan1 (mpz_t op, unsigned long int starting_bit)
    Scan op, starting with bit starting_bit, towards more significant
    bits, until the first set bit is found.  Return the index of the
    found bit.

The docs there really don't define what "starting_bit" or "index" mean
(perhaps 0-based, with index i being bit 2**i?  i.e., starting with 0 "from
the right"?).  Then what do you think mpz_scan1(0, 0) returns?  That is,
there are no 1 bits in 0 for scan1 to find.  I can guess that they return
MAX_ULONG again in such cases, but they don't say so, and as above -1 is
probably a better result for Python to return.
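
One way to pin those semantics down for a Python interface is sketched below. It assumes the guessed reading above (0-based indices, index i meaning the bit of weight 2**i) and returns -1 when no set bit is found; the function is an illustration on plain non-negative Python ints, not GMP's or mxNumber's actual behavior:

```python
def scan1(n, starting_bit=0):
    """Index of the first 1 bit of n at or above starting_bit, else -1.

    Assumptions (not taken from the GMP docs): indices are 0-based,
    index i refers to the bit of weight 2**i, and n is non-negative.
    """
    if n < 0 or starting_bit < 0:
        raise ValueError("n and starting_bit must be non-negative")
    shifted = n >> starting_bit      # drop the bits below starting_bit
    if shifted == 0:
        return -1                    # no set bit left to find
    lowest = shifted & -shifted      # isolate the lowest set bit
    return starting_bit + lowest.bit_length() - 1
```

Under these assumptions scan1(0, 0) returns -1 rather than some platform-dependent maximum.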

> This is more what I meant:
>
> >>i = mx.Number.Integer("100101011101010")
> >>pickle.dumps(i,0)
> "cmx.Number\n_I\np0\n(S'10101010101010'\np1\ntp2\nRp3\n."
>
> The string S'10101010101010' is a fairly wasteful encoding for a
> bit vector.

Sure.  Is it actually a problem for you in practice, or is it just something
that offends because it's provably less than optimal?  Note that text-mode
pickles are *meant* to be easily human-readable too, and there's no clearer
way to "encode" the decimal integer 100101011101010 than as the string
"100101011101010" -- Python does the same for its own native long (unbounded
int) pickles.  A mild compromise would be to use a hex string instead (still
easily readable, encodes 4 bits per byte instead of ~3.3, and should be very
much faster for pickle<->internal conversions of very long ints).
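
The density figures can be checked with a short sketch in modern Python. The helper names encode_hex and decode_hex are made up for illustration and say nothing about how mxNumber actually pickles its integers:

```python
def encode_hex(n):
    """Render an int as a hex string (hypothetical pickle payload)."""
    return format(n, "x")

def decode_hex(s):
    """Inverse of encode_hex."""
    return int(s, 16)

n = (1 << 4000) - 1            # a 4000-bit integer, all bits set
dec_text = str(n)              # decimal text, as text-mode pickles use
hex_text = encode_hex(n)       # the hex compromise

bits = n.bit_length()
dec_density = bits / len(dec_text)   # roughly 3.3 bits per character
hex_density = bits / len(hex_text)   # exactly 4 bits per character here
```

The hex form is shorter by a factor of about log2(10)/4, and converting it needs only shifts rather than repeated division by a power of 10.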
