Interesting list Validity (True/False)

Mon May 14 14:41:21 EDT 2007

On May 13, 8:24 am, Steven D'Aprano
<s... at REMOVE.THIS.cybersource.com.au> wrote:
> On Sat, 12 May 2007 21:50:12 -0700, mensana... at aol.com wrote:

I intended to reply to this yesterday, but circumstances
(see timeit results) prevented it.

> >> > Actually, it's this statement that's non-sensical.
>
> >> > <quote>
> >> > "if arg==True" tests whether the object known as arg is equal to the
> >> > object known as True.
> >> > </quote>
>
> >> Not at all, it makes perfect sense. X == Y always tests whether the
> >> argument X is equal to the object Y regardless of what X and Y are.
>
> > Except for the exceptions, that's why the statement is wrong.
>
> But there are no exceptions.

<quote emphasis added>
Sec 2.2.3:
Objects of different types, *--->except<---* different numeric types
and different string types, never compare equal;
</quote>

> X == Y tests for equality. If it returns
> True, then the objects are equal by definition. That's what equal means in
> Python.
>
> One can abuse the technology to give nonsensical results:
>
> class EqualToEverything(object):
>     def __eq__(self, other):
>         return True
>
> >>> x = EqualToEverything()
> >>> x == 1.0
> True
> >>> x == [2.9, "hello world"]
>
> True
>
> but that's no different from any language that allows you to override
> operators.
>
> >> > None of these four examples are "equal" to any other.
>
> >> That's actually wrong, as you show further down.
>
> > No, it's not, as I show further down.
>
> But you show no such thing.
>
> Or, to put it another way:
>
> Did! Did not! Did! Did not! Did! Did not! ...
>
> >> >>>> a = 1
> >> >>>> b = (1,)
> >> >>>> c = [1]
> >> >>>> d = gmpy.mpz(1)
>
> [snip]
>
> >> >>>> a==d
> >> > True
>
> >> See, a and d are equal.
>
> > No, they are not "equal".
>
> Of course they are. It says so right there: "a equals d" is true.

Ok, but they are an exception to the rule "different types compare
False".

>
> > Ints and mpzs should NEVER
> > be used together in loops, even though it's legal.
>
> Why ever not? If you need an mpz value in order to do something, and no
> other data type will do, what would you suggest? Just give up and say
> "Don't do this, because it is Bad, m'kay?"

It's not the mpzs you shouldn't use, its the ints. I also stessed
"in loops". Replacing an integer literal with a variable still
requires a coercion, so it doesn't matter if n + 1 occurs outside
a loop.

>
> > The ints
> > ALWAYS have to be coerced to mpzs to perform arithmetic
> > and this takes time...LOTS of it.
>
> Really? Just how much time?

Can't say, had to abort the following.
Returns the count of n/2 and 3n+1 operations [1531812, 854697].

import gmpy

def collatz(a):
    ONE = gmpy.mpz(1)
    TWO = gmpy.mpz(2)
    TWE = gmpy.mpz(3)
    a = gmpy.mpz(a)
    t = 0
    u = 0
    done = 0
    while done==0:
        f = gmpy.scan1(a,0)
        if f>0:
            a = a >> f
            u += f
        else:
            if a==1:
                done = 1
            else:
                a = a*TWE + ONE
                t += 1
    return [u,t]

def collatz2(a):
    t = 0
    u = 0
    done = 0
    while done==0:
        f = gmpy.scan1(a,0)
        if f>0:
            a = a >> f
            u += f
        else:
            if a==1:
                done = 1
            else:
                a = a*3 + 1
                t += 1
    return [u,t]

def test():
    collatz(2**177149-1)

def test2():
    collatz2(2**177149-1)

if __name__=='__main__':
    from timeit import Timer
    t = Timer("a = test()", "from __main__ import test")
    u = Timer("b = test2()", "from __main__ import test2")
    print t.timeit(10)
    print u.timeit(10)

##    723.430377542
##    *ABORTED after 20 hours*

>
> timeit.Timer("x == y", "import gmpy; x = 1; y = gmpy.mpz(1)").repeat()
> timeit.Timer("x == y", "x = 1; y = 1").repeat()
>
> I don't have gmpy installed here,

Good Lord! How do you solve a Linear Congruence? :-)

> so I can't time it, but I look forward
> to seeing the results, if you would be so kind.

I had a lot of trouble with this, but I think I finally got a
handle on it. I had to abort the previous test after 20+ hours
and abort a second test (once I figured out to do your example)
on another machine after 14+ hours. I had forgotten just how
significant the difference is.

import timeit

##    t = timeit.Timer("a == b", "a = 1; b = 1")
##    u = timeit.Timer("c == d", "import gmpy; c = 1; d =
gmpy.mpz(1)")
##    t.repeat()
##    [0.22317417437132372, 0.22519314605627253, 0.22474588250741367]
##    u.repeat()
##    [0.59943819675405763, 0.5962260566636246, 0.60122920650529466]

Unfortunately, this is not a very useful test, since mpz
coercion appears to vary ny the size of the number involved.
Although changing t to

##    t = timeit.Timer("a == b", "a = 2**177149-1; b = 2**177149-1")

still produces tractable results
##    t.repeat()
##    [36.323597552202841, 34.727026758987506, 34.574566320579862]

the same can't be said for mpz coercion:

##    u = timeit.Timer("c == d", "import gmpy; c = 2**177149-1; d =
gmpy.mpz(2**177149-1)")
##    u.repeat()
##    *ABORTED after 14 hours*

So I changed it to (using yet a third machine)

for i in xrange(8):
    e = 2*i*100
    n = 2**e-1
    r = 'a = %d; b = %d' % (n,n)
    s = 'import gmpy; a = %d; b = gmpy.mpz(%d)' % (n,n)
    print 'For 2**e-1',e
    t = timeit.Timer("a == b",r)
    u = timeit.Timer("a == b",s)
    print t.repeat()
    print u.repeat()
    print

which clearly shows the growth rate of the mpz coercion.

##    int==int vs. int==mpz
##
##    For 2**e-1 0
##    [0.054264941118974445, 0.054553378257723141,
0.054355515455681791]
##    [0.16161957500399435, 0.16188363643198839, 0.16197491752897064]
##
##    For 2**e-1 200
##    [0.093393746299376912, 0.093660961833065492,
0.092977494572419439]
##    [1.0425794607193544, 1.0436544844503342, 1.0451038279715417]
##
##    For 2**e-1 400
##    [0.10496130299527184, 0.10528292779203152, 0.10497603593951155]
##    [2.2687503839249636, 2.2685411490493506, 2.2691453463783233]
##
##    For 2**e-1 600
##    [0.11724617625774236, 0.11701867087715279, 0.11747874550051129]
##    [3.616420796797021, 3.617562537946073, 3.6152373342355801]
##
##    For 2**e-1 800
##    [0.13156379733273482, 0.1310266632832402, 0.13168082630802047]
##    [5.2398534562645089, 5.2389728893525458, 5.2353889230364388]
##
##    For 2**e-1 1000
##    [0.153719968797283, 0.15383679852633492, 0.15352625633217798]
##    [6.967458038928207, 6.9640038947002409, 6.9675019294931388]
##
##    For 2**e-1 1200
##    [0.16716219584402126, 0.16743472335786436, 0.16782637005291434]
##    [11.603391791430532, 11.601063020084396, 11.603106936964878]
##
##    For 2**e-1 1400
##    [0.179120966908215, 0.17908259508838853, 0.17934175430681876]
##    [14.753954507946347, 14.755623642634944, 14.756064585859164]

And, just for laughs, I compared mpzs to mpzs,

    s = 'import gmpy; a = gmpy.mpz(%d); b = gmpy.mpz(%d)' % (n,n)

which ended up faster than comparing ints to ints.

##    int==int vs. mpz==mpz
##
##    For 2**e-1 0
##    [0.054301433257206225, 0.054502401293220933,
0.054274144039999611]
##    [0.12487657446828507, 0.099130500653189346,
0.094799646619862565]
##
##    For 2**e-1 200
##    [0.10013419046813476, 0.10156139134030695, 0.10151083166511599]
##    [0.091683807483012414, 0.091326269489948375,
0.091261281378934411]
##
##    For 2**e-1 400
##    [0.10716937998703036, 0.10704403530042028, 0.10705119312788414]
##    [0.099165500324245093, 0.097540568227742153,
0.10131808159697742]
##
##    For 2**e-1 600
##    [0.12060785142996777, 0.11720683828159517, 0.11800506010281886]
##    [0.11328210449149934, 0.1146064679843235, 0.11307050873582014]
##
##    For 2**e-1 800
##    [0.12996358680839437, 0.13021352430898236, 0.12973684081916526]
##    [0.12344120825932059, 0.11454960385710677, 0.12339954699673861]
##
##    For 2**e-1 1000
##    [0.15328649918703752, 0.15362917265815135, 0.15313422618208516]
##    [0.12753811336359666, 0.12534907002753748, 0.12588097104350471]
##
##    For 2**e-1 1200
##    [0.16756264696760326, 0.16747118166182684, 0.167885034915086]
##    [0.12162660501311073, 0.13368267591470051, 0.13387503876843265]
##
##    For 2**e-1 1400
##    [0.17867761017283623, 0.17829534684824377, 0.17826312158720281]
##    [0.13718761665773815, 0.13779106963280441, 0.13708166276632738]

>
> Even if it is terribly slow, that's just an implementation detail. What
> happens when Python 2.7 comes out (or Python 3.0 or Python 99.78) and
> coercion from int to mpz is lightning fast? Would you then say "Well,
> int(1) and mpz(1) used to be unequal, but now they are equal?".

Are you saying I should be unconcerned about implementation details?
That it's silly of me to be concerned about implementation side
effects
due to mis-matched types?

>
> Me, I'd say they always were equal, but previously it used to be slow to
> coerce one to the other.

So, when you're giving advice to the OP you don't feel any need to
point
this out? That's all I'm trying to do, supply some "yes, but you
should
be aware of..." commentary.

>
> > The absolute stupidest
> > thing you can do (assuming n is an mpz) is:
>
> > while n >1:
> >     if n % 2 == 0:
> >         n = n/2
> >     else:
> >         n = 3*n + 1
>
> Oh, I can think of much stupider things to do.
>
> while len([math.sin(random.random()) for i in range(n)[:]][:]) > 1:
>     if len( "+" * \
>     int(len([math.cos(time.time()) for i in \
>     range(1000, n+1000)[:]][:])/2.0)) == 0:
>         n = len([math.pi**100/i for i in range(n) if i % 2 == 1][:])
>     else:
>         s = '+'
>         for i in range(n - 1):
>             s += '+'
>         s += s[:] + ''.join(reversed(s[:]))
>         s += s[:].replace('+', '-')[0:1]
>         n = s[:].count('+') + s[:].count('-')
>
> > You should ALWAYS do:
>
> > ZED = gmpy.mpz(0)
> > ONE = gmpy.mpz(1)
> > TWO = gmpy.mpz(2)
> > TWE = gmpy.mpz(3)
>
> > while n >ONE:
> >     if n % TWO == ZED:
> >         n = n/TWO
> >     else:
> >         n = TWE*n + ONE
>
> > This way, no coercion is performed.
>
> I know that algorithm, but I don't remember what it is called...

The Collatz Conjecture. If true, it means the while loop
terminates for any n.

>
> In any case, what you describe is a local optimization. Its probably a
> good optimization, but in no way, shape or form does it imply that mpz(1)
> is not equal to 1.

It's a different type. It is an exception to the "different types
compare
False" rule. That exception is not without cost, the type mis-match
causes coercion.

>
> >> > And yet a==d returns True. So why doesn't b==c
> >> > also return True, they both have a 1 at index position 0?
>
> >> Why should they return true just because the contents are the same?
>
> > Why should the int 1 return True when compared to mpz(1)?
>
> Because they both represent the same mathematical number, where as a list
> containing 1 and a tuple containing 1 are different containers. Even if
> the contents are the same, lists aren't equal to tuples.
>
> > a = [1]
> > b = [1]
>
> > returns True for a==b?
>
> That's because both are the same kind of container, and they both have the
> same contents.
>
> > After all, it returns false if b is [2],
> > so it looks at the content in this case. So for numerics,
> > it's the value that matters, not the type. And this creates
> > a false sense of "equality" when a==d returns True.
>
> There's nothing false about it. Ask any mathematician, does 1 equal 1.0,
> and they will say "of course".

And if you ask any mathematician, he'll say that (1,) is equal to [1].
That's the difference between a mathematician and a programmer.
A programmer will say "of course not, the int has to be coered."

>
> >> A bag
> >> of shoes is not the same as a box of shoes, even if they are the same
> >> shoes.
>
> > Exactly. For the very reason I show above. The fact that the int
> > has the same shoes as the mpz doesn't mean the int should be
> > used, it has to be coerced.
>

> Ints are not containers. An int doesn't contain values, an int is the
> value.
>
> Numeric values are automatically coerced because that's more practical.
> That's a design decision, and it works well.

And I'm not saying it shouldn't be that way. But when I wrote my
Collatz Functions library, I wasn't aware of the performance issues
when doing millions of loop cycles with numbers having millions
of digits. I only found that out later. Would I have gotten a
proper answer on this newgroup had I asked here? Sure doesn't look
like it.

BTW, in reviewing my Collatz Functions library, I noticed a coercion
I had overlooked, so as a result of this discussion, my library is
now slightly faster. So some good comes out of this argument after
all.

>
> As for gmpy.mpz, since equality tests are completely under the control of
> the class author, the gmpy authors obviously wanted mpz values to compare
> equal with ints.

And they chose to do a silent coercion rather than raise a type
exception.
It says right in the gmpy documentation that this coercion will be
performed.
What it DOESN'T say is what the implications of this silent coercion
are.

>
> >> Since both lists and tuples are containers, neither are strings or
> >> numeric types, so the earlier rule applies: they are different types, so
> >> they can't be equal.
>
> > But you can't trust a==d returning True to mean a and d are
> > "equal".
>
> What does it mean then?

It means they are mathematically equivalent, which is not the same as
being programatically equivalent. Mathematical equivalency is what
most
people want most of the time. Not all of the people all of the time,
however. For example, I can calculate my Hailstone Function
parameters
using either a list or a tuple:

>>> import collatz_functions as cf
>>> print cf.calc_xyz([1,2])
(mpz(8), mpz(9), mpz(5))
>>> print cf.calc_xyz((1,2))
(mpz(8), mpz(9), mpz(5))

But [1,2]==(1,2) yields False, so although they are not equal,
they ARE interchangeable in this application because they are
mathematically equivalent.

>
> > To say the comparison means the two objects are
> > equal is misleading, in other words, wrong. It only takes one
> > turd to spoil the whole punchbowl.
>
> >> gmpy.mpz(1) on the other hand, is both a numeric type and a custom class.
> >> It is free to define equal any way that makes sense, and it treats itself
> >> as a numeric type and therefore says that it is equal to 1, just like 1.0
> >> and 1+0j are equal to 1.
>
> > They are equal in the mathematical sense, but not otherwise.
>
> Since they are mathematical values, what other sense is meaningful?
>
> > And to think that makes no difference is to be naive.
>
> I never said that there was no efficiency differences. Comparing X with Y
> might take 0.02ms or it could take 2ms depending on how much work needs
> to be done. I just don't understand why you think that has a bearing on
> whether they are equal or not.

The bearing it has matters when you're writing a function library that
you want to execute efficiently.

>
> --
> Steven.