Python Mystery Theatre -- Episode 3: Extend this

Raymond Hettinger vze4rx4y at verizon.net
Wed Jul 23 15:02:41 EDT 2003


[Chris Reedy] <
> I tried these out myself after making my guesses. Here's what I found:

[Raymond]
> > def real(z):
> >     return hasattr(z, 'real') and z.real or z
> > def imag(z):
> >     return hasattr(z, 'imag') and z.imag or 0
> > for z in 3+4j, 5, 0+6j, 7.0, 8+0j, 9L:
> >     print real(z), imag(z)
[Chris]
> I spotted the error in this one (I've been bitten by this more than
> once) if z.real or z.imag is false, I get z or 0. That results is two
> problems:
>
> For 0+6j, the answer is 6j and 6.0, definitely not what was expected.
> For 8+0j, the answer is 8.0 and 0, not 8.0 and 0.0. This is close to
> what was expected and, if a bug at all, would be a rather subtle one.
>
> I would certainly expect that Raymond's been bitten by this one. I would
> guess that every experienced Python programmers been bitten by some form
> of this one.

Yes, this snippet came directly from an early version of my matrix algebra
tool that has been in production for several years.  It is essentially the
class mistake with and/or.  The code has been replaced with a try:
except AttributeError:.    The point of the code was to allow complex operators
to be applied to a matrix of mixed types and only coerce to complex when
necessary.


> > ACT II ---------------------
> > uniq = {}
> > for k in (10, 'ten', 10+0j, u'ten', 'Ten', 't'+'en', 10.0):
> >     uniq(k) = 1
> > print len(uniq)
>
> I had to experiment with this one. I guessed 3, which turned out to be
> right. I was a little surprised that 'ten' and u'ten' hash the same.
> However, after thinking about that, I decided I was happy with that result.
>
> I was more surprised that 10 and 10.0 came out the same. One of these
> days I'll take the time to look up the hash algorithm for floats to see
> how this was achieved.

The lesson is that objects that compare equal are expected to hash equal.
Since 10 == 10+0j == 10.0, they will hash to the same value.
Since 'ten' == u'ten' == 't'+'en', they will hash to the same value.
Since 'Ten' does not compare equal to the others (case-sensitive), it is
distinct.



> > ACT III ---------------------
> > s = 'abracadabra'
> > for i in [3, 2, 1, 0]:
> >         print s[i:], s[-i:]
>
> Got this one the first time. -0 == 0, so s[-0:] == s[0:] == s. I'll bet
> that this is one that burned Raymond.

Yes.  This came-up in an early version of random.shuffle.  I was
looping backwards over an array and was choosing samples from
the range [0:-i].  The error did not surface in testing because an
alternate selection method is used in cases that include i==0.  The
aspiring bug was caught by Tim during a code review.

Things like this are particularly hard to spot when your comprehensive
test code runs flawlessly.


>
> > ACT IV ---------------------
> > pets = list('cat ')
> > pets += 'dog '
> > pets.extend('fish ')
> > print pets + 'python'
>
> I guessed ['c', 'a', 't', ' ', 'd', 'o', 'g', ' ', 'f', 'i', 's', 'h', '
> ', 'p', 'y', 't', 'h', 'o', 'n']. I tried it to discover that the print
> statement at the end throws. I decided that not concatenating strings to
>   lists doesn't bother me since I would believe that this is an error
> more often than not and pets + list('python') would hopefully make the
> intent clear.

That is in-fact the reason for it throwing an exception.

> On the other hand, if I saw pets = list('cat') in code I
> was reviewing I suspect that I would want to assure myself that ['c',
> 'a', 't'] was what was intended and not ['cat'].

That is a proper reaction to a code smell.

> I also much bothered by pets += 'dog ' working but pets + 'python' not.
> What's the justification for the checking in one place and not the other?

Because, as you pointed out, the second form is often an error
but the first is usually intended.  Also, I think the += behavior
got set in stone before the difference between the two was noticed.


> > INTERMISSION (with output included, oh yeah! ------------
> >
> >>>>import operator
> >>>>1 - reduce(operator.add, [1e-7]* 10**7)
> >
> > 2.4983004554002264e-010
>
> Round off error. Welcome to the world of floating point arithmetic!

Representation error amplified by the instantaneous relays of the
computer (phrase from StarTrek).

The important point is that errors in the last bit can
accumulate rather than cancel.

> I predicted that the
> result of both print statements would be:
>
> 5 ['shirt', 'bananas', 'nuts', 'socks', 'pretzels']
>
> When I tried it I got:
>
> 2 ['shirt', 'bananas', 'nuts', 'socks', 'pretzels']
> 3 ['shirt', 'bananas', 'nuts', 'socks', 'pretzels']
>
> After thinking about it (and reviewing the Reference Manual), I realized
> that self.contents += items does an in place update of Bin.contents, but
> the augmented assignment self.numitems += 1, is the same as
> self.numitems = self.numitems+1 (since there is no in place update for
> integers or immutable objects in general), which has the result of
> creating the attribute numitems on the laundry and groceries instances.

An in-place add is really a read and a write.  The read finds the name in
the enclosing scope and the write puts the result in the local scope.  That
is why the numitems count works.  But for mutables, the in-place add
modifies the existing copy which happens to be shared by both instances.



> > ACT VI -----------------------------------------
> > print "What is the technical name for this algorithm or transformation?"
> > a = range(0, 100, 10)
> > import random
> > random.shuffle(a)
> > print "Input:", a
> > swaps = 0
> > for i in range(len(a)):
> >     for j in range(i+1, len(a)):
> >     if a[i] > a[j]:
> >         i, j, a[i], a[j], swaps = j, i, a[j], a[i], swaps+1
> > print "Output:", a
> > print "Workload;", swaps

> The assignment i,j, ... = j,i, ... . Kept me guessing. Why do that?

I gave to much a clue here.


> At this point I convinced myself that this was an interchange sort gone
> wrong and would just produce garbage. When I tried it, I discovered that
>   the correct technical name for this program is the identity
> transformation. That is, nothing happened.

Yes.  Identity is real answer.  Selection sort was the bait answer.

>
> Reexamining the code, I realized that flipping the two indices has the
> effect of assigning the two array elements back to themselves, resulting
> in no effect.
>
> About this point in time, I also realized that the sort should work if
> you take away the spurious assignments to i and j. That is,
>
> a[i], a[j], swaps = a[j], a[i], swaps
. . .
> What's the real lesson here?


Unfortunately, the example was too simple in that the i, j interchange
was superfluous.  I should have used the indices in a subsequent
line, so that folks would be guided to the more interesting fix:

    a[i], a[j], swaps, i, j = a[j], a[i], swaps+1, j, i

The point of the exercise was to note that the order of arguments
on the left-hand side matters -- they are assigned left to right.
The right hand side gets evaluated first (in left-to-right order),
the results are saved in a tuple, then the left hand assignments
are made one by one.  If one of the assignments changes an
index or key, the subsequent assignments are affected.


> OK, which three did Raymond personnally stumble over. I'll guess I, III,
> and VI. Why VI? Well, because V, while subtle in some ways, doesn't look
> to me like one you'd run into until you knew what you were doing, at
> which point: why this missing __init__ method. VI looks like something
> you might do by accident while trying to write a sort routine.

You guessed exactly right; however, number VI is a contrived
example.  The actual code was a heap sort demo derived from
from logic assertions and program transformations.  Multiple
assignment is frequently used in the type of design so that
invariants are maintained at every step.


>    Chris


Great job.  Nice analysis.
Also, I enjoyed the playful exploratory style
which captured the spirit of what the mystery
series is all about.


Raymond Hettinger






More information about the Python-list mailing list