An oddity in list comparison and element assignment

Fri Jun 2 15:40:42 EDT 2006

Perhaps a little background to my original post will defuse some of the
controversy.  While working during an airline flight, I ran into an
unexpected outcome from using the * replication operator to initialize
an array of lists.  When I modified a single element of the array an
entire column changed.  Having no reference books or internet access
available, I tried to understand what was going on by creating some
small arrays on the command line to see if there was a difference
between explicit initialization and initialization with range() and the
* operator.

The arrays looked identical when printed and a == b returned True. Yet
the arrays were clearly not equivalent because mutating the
corresponding elements produced different outcomes.  I put the problem
aside until the next day when I looked at it some more and and created
the example script I posted.  Just as I was about to hit the Send
button, I realized that the * operator must have been creating
references instead of copies.  And then I appended the now much debated
opinion that == should have detected the difference.

(As an aside, may I point out that Python In A Nutshell states on page
46 "The result of S*n or n*S is the concatenation of n copies of S". It
might be well to put a warning in a future edition that this is not
strictly the case.)

My viewpoint is that of a working professional software consultant.
I'm essentially a pragmatist with no strong 'religious' views about
languages and methodologies.  As I noted in an earlier reply, I don't
realistically expect Python to change the behavior of the == operator.
I do think that a problem arose when it was adopted from C and extended
to allow comparison of containers.  In C, you can use it to compare
integers, floats, and pointers and everyone understands that p==q does
not imply *p == *q.  Moreover, compilers issue warnings about
comparisons between different types.

Basically, I'm looking for simple diagnostic tools that make it easy to
understand what's really going on when code produces an unexpected
result.  A 'strengthened equivalence' operator, to use your terminology
would have been useful to me.

As to constructing pseudocode for such an operator, I've appended a
working script below.  The counterexamples and questions from Slawomir,
Maric, and Jim were really useful in sharpening my thinking about the
matter.  I'm sure there are many ways to break it.  For example, tuples
have no index method, so one would have to be written. Still, I hope it
will serve to move the discussion beyond terms like 'crazy' and
'handwaving' and 'ill-founded'.  I haven't used such perjoratives in
any of my posts and would appreciate the same courtesy.

Cheers,
Mike

'''
StrongEquality -- a first cut at the definition proposed by M. Ellis.
Author: Michael F. Ellis, Ellis & Grant, Inc.
'''

def indices(item,seq):
   '''Utility function that returns a list of indices where item occurs
in seq'''
   result=[]
   for i in xrange(len(seq)):
       try:
          result.append(i+seq[i:].index(item))
       except ValueError:
          return result

def StrongEquality(a,b):
     '''True if a and b are numerically and "structurally" equal'''
     if a is b: return True
     if a != b: return False
     ## At this point we know a and b have the same length and
     ## evaluate numerically equivalent.  We now need to figure out
     ## whether there are any references to identical objects in
non-corresponding
     ## positions of a & b (per Slawomir's example). We also need to
inspect
     ## a and b for non-matching patterns of identical references (per
my example)
     ida=[] ; idb=[]
     for i in xrange(len(a)):
         if a[i] is b[i]:
             continue
         if isinstance(a[i], (int, float, str)) and isinstance(b[i],
(int, float, str)):
             continue        ## we already know they're numerically
equal

         ida.append(id(a[i]))
         idb.append(id(b[i]))
         ## We know that ida[n] is not idb[n] for all n because we
omitted all
         ## cases where a is b.  Therefore Slawomir's example is
detected if
         ## any id appears in both lists.
         for n in ida:
             if n in idb: return False
         ## Next we test for my example.  I'm sure this can be coded
more
         ## more elegantly ...
         for j in xrange(len(ida)):
              if indices(ida[j],ida) != indices(idb[j],idb): return
False
         ## Lastly, recurse ...
         if not StrongEquality(a[i],b[i]): return False

     return True

if __name__=='__main__':
   ## Rudimentary test cases
   assert StrongEquality(1,1)
   assert not StrongEquality(0,1)

   ## Slawomir's example
   x, y, z = [1],[1],[1]
   a, b = [x,y], [y,z]
   c, d = [[1],[1]], [[1],[1]]
   assert StrongEquality(c,d)
   assert a == b
   assert not StrongEquality(a,b)

   ## My example
   a =[[[1,2],[1,2]],[[1,2],[1,2]]]
   b = [[range(1,3)]*2]*2
   assert a==b
   assert not StrongEquality(a,b)

   print "All tests ok."

Alex Martelli wrote:
> <michael.f.ellis at gmail.com> wrote:
>
> > Hi Alex,
> > With all due respect to your well-deserved standing in the Python
> > community, I'm not convinced that equality shouldn't imply invariance
> > under identical operations.
>
> So, why aren't you satisfying my request?  Provide a simple concrete
> definition of what your idea of equality WOULD behave like.  I notice
> that your lack of response stands out like a sore thumb -- all you're
> providing is a set of constraints you desire and a collection of
> illfounded analogies and handwaving.  Traditional mathematics does not
> support the concept of "change", nor the distinction between equality
> and identity; the "real world" has no way to define what modifications
> are "identical" except by their effects (if the results differ, either
> the original equality was ill-posited or the modifications were not
> "identical").  But the real world DOES have the concept of "performing
> exactly the same sequence of operational steps", and, by THAT definition
> of "equal modifications", then your assertion:
>
> > make identical modifications to the engines of two identical
> > automobiles, I expect the difference in performance to be identical.
>
> is ill-founded -- or, rather, your *expectation* may be ill-founded.
>
> Take two systems of any significant complexity that are similar enough
> to be called "identical" by ALL observers (because trying to ascertain
> the differences, if any, would inevitably perturb the systems
> irretrievably by Heisenberg's effect -- i.e., there are no OBSERVABLE
> differences, which by Occam's Razor requires you to posit the systems
> are equal, because you cannot prove otherwise -- and entities must not
> be multiplied beyond necessity, so supposing that "observably equal"
> systems are indeed equal is Occam-compliant).
>
> Now, perform "identical" (ditto) modifications: in the real world, due
> to quantum effects, there WILL be sub-observable differences in what
> you're doing to the first one and to the second one.  If the systems are
> unstable to start with, they may well amplify those differences to
> observable proportions -- and there you are: the effect of the "equal"
> change on "equal" system may easily become observably unequal.
> Philosophically, you may classify this as an "observation" of both
> systems, which reasoning backwards lead you to posit that either the
> systems were NOT equal to start with or the modifications weren't...
> that is, IF you also posit determinism, which, as well we know, is an
> unwarrantedly strong hypothesis for systems in which the differences at
> quantum level matter.  Feel free to follow Einstein (and diverse
> light-years away from the last few decades of physics) in positing that
> there MUST exist "hidden variables" (unobservable except maybe in
> destructive, irreversible ways) explaining the difference -- I'll stick
> with the mainstream of physics and claim your expectation was badly
> founded to start with.
>
> I can debate epistemology with the best, but this is not really the
> proper forum for this -- starting with the crucial distinction, what it
> means, in mathematics OR in the real world, to state that two systems
> are "equal but NOT identical"?  In the end, such debates tend to prove
> rather futile and unproductive, however.
>
> In the world of programming languages, we cut through the chase by
> requesting *operational* (Brouwer-ian, mathematically speaking)
> definitions.  Provide the *operational* definition of how you WANT
> equality checking to work, contrast it with my simple two-lines one, and
> THEN we can have a meaningful debate of which one is the correct one to
> use in the core of a programming language that has the (blessing and
> curse of) mutable data objects...
> 
> 
> Alex