[Tutor] Problem with if statements and else statements

Peter Otten __peter__ at web.de
Tue May 30 06:06:47 EDT 2017


Cameron Simpson wrote:

> On 29May2017 01:17, Alan Gauld <alan.gauld at yahoo.co.uk> wrote:
>>On 29/05/17 00:12, Alex Kleider wrote:
>>> Would
>>>>>> if Month in {'January', '1'}:
>>>
>>> be even better?  (regarding efficiency perhaps? Trivial point, I know,
>>> but just wondering.)
>>
>>If in doubt try it out and profile/time it.
>>
>>But I don't think it will make much difference since ultimately
>>it still has to test each value (although a hashing algorithm
>>may be involved that works on partial matches...) But if in
>>doubt...
> 
> Hi Alex,
> 
> As written it should be a bit slower: to construct a set each member get
> tested for presence. The cost is in making the set, not in searching it.

No, CPython is a bit smarter than that:

>>> dis.dis('if m in {"1", "January"}: pass')
  1           0 LOAD_NAME                0 (m)
              2 LOAD_CONST               3 (frozenset({'1', 'January'}))
              4 COMPARE_OP               6 (in)
              6 POP_JUMP_IF_FALSE        8
        >>    8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

However, there seems to be limit:

>>> def check(n):
...     f = io.StringIO()
...     dis.dis("if m in {%s}: pass" % ",".join(map(str, range(n))), file=f)
...     return f.getvalue().count("LOAD_CONST")
... 
>>> check(123)
2
>>> check(124)
125

This is python 3.6; for 3.5 and 3.4 I found a maximum length of 80 using the 
same method. Further tests indicate that you cannot have two sets of length 
80.

> _However_, supposing your program were doing this a lot. You might well
> have a global (or, better, long lived shared object) containing a set that
> has already been constructed. Then:
> 
>   if Month in the_set:
> 
> is very fast; constant time. 

Overall, using globals may still be a good idea. Even if there's often no 
direct performance reward (looking up a global is even a tad slower)
you gain predictability.

> Whereas as you would expect, checking a list
> is linear with the size of the list.
> 
> So, using a list:
> 
>   seen = []
>   for item in some_item_generator():
>     if item in seen:
>       continue
>     seen.append(item)
>     ... do stuff with item, which is new ...
> 
> The cost of the "if" goes up linearly as you add more items.
> 
> Using a set:
> 
>   seen = {}
>   for item in some_item_generator():
>     if item in seen:
>       continue
>     seen.add(item)
>     ... do stuff with item, which is new ...
> 
> The second version will be much more effiient as the "seen" set grows; the
> lookup time on the set is essentially O(1) (constant time).
> 
> But for an ad hoc 2 element list as in your original example the
> difference will be pretty small; making the 2 element set _should_ be
> slightly more expensive, and isn't the common idiom (==> less readable).
> Personally I use:
> 
>   if value in ('a', 'b', 'c'):
> 
> BTW, in Python we tend to use named like "Fred" for classes (or
> factories), and "fred" for regular variables. And "FRED" for things that
> would be constants in other languages. Eg:
> 
>   MAX_THINGS = 16
> 
>   class Foo:
>     ....
> 
>   def FooBah(x):
>     return Foo(x, style="bah")
> 
>   for fred in ....:
> 
> Cheers,
> Cameron Simpson <cs at zip.com.au>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor




More information about the Tutor mailing list