[Tutor] Test Question

Dave Angel davea at davea.name
Mon Jul 1 13:01:11 CEST 2013


On 07/01/2013 05:58 AM, John Steedman wrote:
> Good morning all,
>
> A question that I am unsure about.  I THINK I have the basics, but I am not
> sure and remain curious.
>
> 1. What does this mean?
>>>> if my_object in my_sequence:
> ...

We can be sure what 'if' and 'in' mean, but not the other two items. 
They're just names, and the behavior of the test will depend on the 
types of the objects the names are bound to.  By calling it my_sequence, 
you're implying that the object is not only a collection, but an ordered 
one.  So if we trust the names, this will iterate through the sequence, 
testing each item in the sequence against my_object for "==" and stop 
when a match is found.  If one is found the if clause will execute, and 
if the sequence is exhausted without finding one, the else clause (or 
equivalent) will execute.
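
A rough sketch of what that amounts to when my_sequence really is a
list or tuple. The helper name contains() is made up for illustration;
it is not how the interpreter implements 'in', only an approximation of
the observable behavior (the built-in containers also accept an item
that is the very same object, before trying "==").

```python
def contains(sequence, target):
    """Approximate `target in sequence` for a list or tuple."""
    for item in sequence:
        # Built-in containers check identity first, then equality.
        if item is target or item == target:
            return True      # stop at the first match
    return False             # sequence exhausted without a match


my_sequence = [3, 1, 4, 1, 5]

print(contains(my_sequence, 4))   # True
print(contains(my_sequence, 9))   # False
print(4 in my_sequence)           # True, same answer as the helper
```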


>
> 2. What can go wrong with this? What should a code review pick up on?

The main thing that can go wrong is that the objects might not match the 
names used.  For example, if the name my_sequence is bound to an int, 
you'll get a runtime exception.
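
For instance, a small sketch with the name bound to an int. The values
are arbitrary; the point is only that the failure shows up at run time,
as a TypeError.

```python
my_object = 3
my_sequence = 42      # misleading name: this is an int, not a sequence

error = None
try:
    found = my_object in my_sequence
except TypeError as exc:
    # An int has no __contains__ and cannot be iterated over.
    error = exc
    print("TypeError:", exc)
```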

Second, if the items look similar (e.g. floating point, but not limited 
to that) but aren't actually equal, you could get a surprise.  For 
example, if my_object is a byte string and one of the items in 
my_sequence is a unicode string representing exactly the same thing, 
Python 2 will frequently consider them the same, and Python 3 will know 
they're different.
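
A short sketch of both surprises, assuming Python 3. The tolerance
check with math.isclose() is one possible workaround for floats, not
the only one.

```python
import math

total = 0.1 + 0.2                 # not exactly 0.3 in binary floating point
float_hit = total in [0.3]
print(float_hit)                  # False

# Compare with a tolerance instead of "==".
close_hit = any(math.isclose(total, x) for x in [0.3])
print(close_hit)                  # True

# In Python 3 a bytes object never equals a str, even with the same text.
bytes_hit = b"abc" in ["abc"]
print(bytes_hit)                  # False
```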

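Third, my_object might be something that doesn't equal anything else, 
such as a floating point NaN.  Two NaNs are not equal, and a NaN is not 
even equal to itself.  (The built-in containers check identity before 
equality, so the very same NaN object will still be found in a list, 
but a different NaN object will not.)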
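
A small sketch of the NaN case. One wrinkle worth knowing: the built-in
containers accept an item that is the same object before they try "==",
so the result depends on whether it is the very same NaN object.

```python
import math

nan = float("nan")
other_nan = float("nan")          # a second, distinct NaN object

print(nan == nan)                 # False: NaN never equals itself
print(nan in [nan])               # True: same object, identity shortcut
print(other_nan in [nan])         # False: different object, "==" fails

# A more explicit test for "is there a NaN in here?"
has_nan = any(math.isnan(x) for x in [1.0, nan])
print(has_nan)                    # True
```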

By a different definition of 'wrong', if the sequence is quite large and 
all its items are hashable, it may have been faster to pass a dict 
instead of a sequence.  And if my_sequence is a dict, it'll probably be 
faster, but the name is a misleading one.
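
A sketch of the hashed alternative. I use a set here rather than a dict,
since only membership matters; that substitution is mine. The sizes and
names are arbitrary, and no timings are claimed, only that the list is
scanned item by item while the set and dict use a hash lookup.

```python
my_sequence = list(range(100000))
lookup = set(my_sequence)         # one-time cost to build, items must be hashable

in_list = 99999 in my_sequence    # walks the list until it finds a match
in_set = 99999 in lookup          # hash lookup
print(in_list, in_set)            # True True

# With a dict, 'in' tests the keys, not the values.
prices = {"apple": 1.5}
print("apple" in prices)          # True
print(1.5 in prices)              # False
```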

>
> I believe that "my_sequence" might be either a container class or a
> sequence type. An effective __hash__ function would be required for each
> "my_object".

"in" doesn't care if there's a __hash__ function.  It first looks for a 
__contains__() method on the collection.  If the collection is a dict, 
the dict will enforce whatever constraints it needs.  If the collection 
is a list, no hash is needed, but the __contains__() method will probably 
be slower.  An arbitrary sequence may not have a __contains__() method 
at all, in which case 'in' falls back to iterating over it.
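
A sketch of those cases. The three class names are hypothetical, made
up for illustration. Each one supports 'in' by a different route, and
the last two lines show the dict enforcing its own constraint (hashable
keys) where a list does not care.

```python
class WithContains:
    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):     # 'in' calls this directly
        return item in self._items


class IterOnly:
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):               # no __contains__: 'in' iterates
        return iter(self._items)


class GetItemOnly:
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):     # old-style protocol: 0, 1, 2, ...
        return self._items[index]     # IndexError ends the search


results = [2 in cls([1, 2, 3]) for cls in (WithContains, IterOnly, GetItemOnly)]
print(results)                        # [True, True, True]

print([1, 2] in [[1, 2], [3]])        # True: a list needs no hash

dict_error = None
try:
    [1, 2] in {"a": 1}
except TypeError as exc:              # a list is unhashable
    dict_error = exc
    print("TypeError:", exc)
```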

> I THINK you'd need to avoid using floating point variables
> that might round incorrectly.
>

One of the issues already covered.

> Are there other issues?
>

Those are all I can think of off the top of my head.


-- 
DaveA

