Which type should be used when testing static structure membership

Wed Nov 18 08:42:32 EST 2015

On Thu, Nov 19, 2015 at 12:30 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Wed, 18 Nov 2015 11:40 pm, Chris Angelico wrote:
>> All the questions of performance should be
>> secondary to code clarity, though;
>
> "All"? Surely not.

The OP's example was checking if a string was equal to either of two
strings. Even if that's in a tight loop, the performance difference
between the various options is negligible.

The "all" is a little misleading (of course there are times when you
warp your code for the sake of performance), but I was talking about
this example, where it's basically coming down to microbenchmarks.

>> so I would say the choices are: Set
>> literal if available, else tuple. Forget the performance.
>
> It seems rather strange to argue that we should ignore performance when the
> whole reason for using sets in the first place is for performance.

They do perform well, but that's not the main point - not when you're
working with just two strings. Of course, when you can get performance
AND readability, it's perfect. That doesn't happen with Py2 sets, but
it does with Python 3 (below, "python" is Py2 and "python3" is Py3):

rosuav@sikorsky:~$ python -m timeit -s "x='asdf'" "x in {'asdf','qwer'}"
10000000 loops, best of 3: 0.12 usec per loop
rosuav@sikorsky:~$ python -m timeit -s "x='asdf'" "x in ('asdf','qwer')"
10000000 loops, best of 3: 0.0344 usec per loop
rosuav@sikorsky:~$ python -m timeit -s "x='asdf'" "x=='asdf' or x=='qwer'"
10000000 loops, best of 3: 0.0392 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x='asdf'" "x in {'asdf','qwer'}"
10000000 loops, best of 3: 0.0356 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x='asdf'" "x in ('asdf','qwer')"
10000000 loops, best of 3: 0.0342 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x='asdf'" "x=='asdf' or x=='qwer'"
10000000 loops, best of 3: 0.0418 usec per loop
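
The same comparison can be run from inside Python with the timeit
module. This is only a rough sketch: the names SETUP, CANDIDATES and
time_candidates are made up for illustration, and the numbers it
prints will vary with the machine and the interpreter version.

```python
import timeit

# Same setup and statements as the command-line runs above.
SETUP = "x = 'asdf'"
CANDIDATES = {
    "set literal": "x in {'asdf', 'qwer'}",
    "tuple literal": "x in ('asdf', 'qwer')",
    "chained ==": "x == 'asdf' or x == 'qwer'",
}


def time_candidates(number=1000000, repeat=3):
    """Return the best per-loop time in microseconds for each candidate."""
    results = {}
    for label, stmt in CANDIDATES.items():
        # timeit.repeat returns one total time (in seconds) per repetition.
        totals = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
        results[label] = min(totals) / number * 1e6
    return results


if __name__ == "__main__":
    for label, usec in sorted(time_candidates().items()):
        print("%-14s %.4f usec per loop" % (label, usec))
```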

No set construction in Py3 - the optimizer sees that the set literal
is only used for a membership test, so it doesn't need mutability,
and it uses a constant frozenset instead. (Both versions do this with
list->tuple.) Despite the performance hit from using a set in Py2,
though, I would still advocate its use (assuming you don't need to
support 2.6 or older, which lack set literals), because it accurately
represents the *concept* of "is this any one of these".
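
One way to check the constant-folding claim is to look at the
constants stored in the compiled code object. The sketch below
assumes CPython 3; the helper name membership_constants is invented
for this example, and other implementations or versions may compile
the expression differently.

```python
import dis


def membership_constants(source):
    """Compile an expression; return the container constants it embeds."""
    code = compile(source, "<example>", "eval")
    # Keep only frozenset/tuple constants; plain strings are not of interest.
    return [c for c in code.co_consts if isinstance(c, (frozenset, tuple))]


# A set literal used only for "in" should show up as a frozenset constant.
set_consts = membership_constants("x in {'asdf', 'qwer'}")

# A list literal used only for "in" should show up as a tuple constant.
list_consts = membership_constants("x in ['asdf', 'qwer']")

print(set_consts)
print(list_consts)

# The disassembly shows a single constant load rather than a set build.
dis.dis(compile("x in {'asdf', 'qwer'}", "<example>", "eval"))
```

If the optimization applies, the disassembly loads one constant and
has no instruction that builds a set at run time.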

ChrisA