Why operations between dict views return a set and not a frozenset?

Chris Angelico rosuav at gmail.com
Wed Jan 5 08:14:10 EST 2022


On Thu, Jan 6, 2022 at 12:05 AM Marco Sulla
<Marco.Sulla.Python at gmail.com> wrote:
>
> On Wed, 5 Jan 2022 at 00:54, Chris Angelico <rosuav at gmail.com> wrote:
> > That's because a tuple is the correct data type when returning two
> > distinct items. It's not a list that has two elements in it; it's a
> > tuple of (key, value). Immutability is irrelevant.
>
> Immutability is irrelevant; speed is not. A tuple is faster than a list
> and more compact. Also, frozenset is faster than set. Indeed, CPython
> internally optimises
>
> for x in {1, 2, 3}
>
> by transforming the set into a frozenset for speed. That's why
> tuple is usually preferred. I expected the same for frozenset.

That's an entirely invisible optimization, but it's more than just
"frozenset is faster than set". It's that a frozenset or tuple can be
stored among a function's constants, built once at compile time instead
of being rebuilt on every call, which is a massive difference.
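A quick way to see the constant-storage point for yourself. This is a sketch that leans on a CPython implementation detail (the exact bytecode varies between versions); the function names are made up for illustration. A set literal of constants used as a for-loop iterable ends up as a ready-made frozenset in the code object's constants, while a set passed in at runtime obviously can't:

```python
import dis

def loop_over_set_literal():
    # Set literal of constants as the loop iterable: CPython can fold it
    # into a frozenset stored in the function's constants.
    for x in {1, 2, 3}:
        pass

def loop_over_set_variable(stuff):
    # The set only exists at runtime, so there is nothing to fold.
    for x in stuff:
        pass

def frozenset_constants(func):
    """Return the frozenset objects stored among func's code constants."""
    return [c for c in func.__code__.co_consts if isinstance(c, frozenset)]

print(frozenset_constants(loop_over_set_literal))
print(frozenset_constants(loop_over_set_variable))
dis.dis(loop_over_set_literal)  # look for a LOAD_CONST of the frozenset
```

The win comes from skipping the rebuild on every call, not from frozenset iterating any faster than set.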

In fact, the two data types are virtually identical in performance once created:

rosuav at sikorsky:~$ python3 -m timeit -s "stuff = {1,2,3}" "for x in stuff: pass"
5000000 loops, best of 5: 46.2 nsec per loop
rosuav at sikorsky:~$ python3 -m timeit -s "stuff = frozenset({1,2,3})" "for x in stuff: pass"
5000000 loops, best of 5: 46.7 nsec per loop
rosuav at sikorsky:~$ python3 -m timeit -s "stuff = set(range(10000))" "for x in stuff: pass"
5000 loops, best of 5: 82.1 usec per loop
rosuav at sikorsky:~$ python3 -m timeit -s "stuff = frozenset(range(10000))" "for x in stuff: pass"
5000 loops, best of 5: 81.3 usec per loop
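The same comparison can be run from inside Python with the timeit module. This is a rough sketch (the helper name and loop counts are arbitrary choices); absolute numbers will differ by machine and Python version, but set and frozenset should come out roughly level:

```python
import timeit

def time_iteration(container_factory, size, number=200):
    """Best-of-5 seconds per loop to iterate over a container of `size` ints."""
    stuff = container_factory(range(size))
    timer = timeit.Timer("for x in stuff: pass", globals={"stuff": stuff})
    # repeat() returns the total time for `number` loops, once per repetition.
    return min(timer.repeat(repeat=5, number=number)) / number

for size in (3, 10_000):
    t_set = time_iteration(set, size)
    t_frozen = time_iteration(frozenset, size)
    print(f"size={size}: set {t_set * 1e9:.1f} ns, frozenset {t_frozen * 1e9:.1f} ns")
```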

Mutability is irrelevant, and so is the speed of the data type. Having
set operations on keys views return frozensets wouldn't improve
anything.
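For reference, this is the behaviour the thread is about: set operators on dict keys views return a new, ordinary set. A caller who really needs something hashable can convert explicitly; the dictionaries and the `cache` below are purely illustrative:

```python
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 20, "c": 30, "d": 40}

# Set operators on keys views build and return a brand-new, mutable set.
common = d1.keys() & d2.keys()
only_d1 = d1.keys() - d2.keys()
either = d1.keys() | d2.keys()
print(type(common).__name__, sorted(common))  # set ['b', 'c']

# If an immutable, hashable result is genuinely needed (say, as a dict key),
# convert explicitly. This is an O(n) copy, not a speedup.
frozen_common = frozenset(common)
cache = {frozen_common: "shared keys"}
```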

> > Got any examples of variable-length sequences?
>
> function positional args are tuples, for example.
>
> > Usually a tuple is a
> > structure, not just a sequence.
>
> ....eh? Are you talking about the underlying C code?

No, I'm talking about purpose. In general, a list contains a sequence
of things whose order matters but from which items can be removed - you
can remove one item from your shopping list and the rest still mean what
they did. A tuple has a specific set of things in a specific order,
like Cartesian coordinates. You can't just remove the x coordinate
from an (x,y,z) tuple without fundamentally changing what it is.
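To make the list-versus-tuple distinction concrete, here's a small sketch. `Point` is a hypothetical NamedTuple used only for illustration; slicing it hands back a plain tuple, because the remaining values no longer form a point:

```python
from typing import NamedTuple

# A list: order matters, but removing an item leaves the rest meaningful.
shopping = ["eggs", "milk", "bread"]
shopping.remove("milk")
print(shopping)  # ['eggs', 'bread']

# A tuple as a fixed structure: each position has its own meaning.
class Point(NamedTuple):
    x: float
    y: float
    z: float

p = Point(1.0, 2.0, 3.0)
# Dropping x gives a plain 2-tuple, not "a Point without its x":
# the positions now mean something different.
rest = p[1:]
print(type(rest).__name__, rest)  # tuple (2.0, 3.0)
```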

Function positional arguments aren't interchangeable, so it makes
sense to have them as a tuple. Removing the first argument would
redefine what all the others mean, so a tuple is correct - it's not
just a list that's been made immutable for performance's sake.
(Function *keyword* arguments, on the other hand, are different; as
long as the mapping from keys to values is maintained, you can remove
some of them and pass the rest on, without fundamentally changing
their meaning.)
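A sketch of that difference in practice. `outer` and `inner` are hypothetical functions: positional arguments arrive as a tuple and are forwarded untouched, while one keyword argument is dropped from the dict and the rest are passed on with their meaning intact:

```python
def inner(a, b, *, verbose=False, scale=1):
    """Hypothetical helper that just echoes what it received."""
    return (a, b, verbose, scale)

def outer(*args, **kwargs):
    # Positional arguments are collected into a tuple: each one's meaning
    # depends on its position, so dropping args[0] would shift the rest.
    print(type(args).__name__, args)
    # Keyword arguments are collected into a dict: removing one key leaves
    # the remaining name -> value pairs meaning exactly what they did.
    kwargs.pop("verbose", None)
    return inner(*args, **kwargs)

result = outer(1, 2, verbose=True, scale=10)
print(result)  # (1, 2, False, 10)
```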

Do you have any examples of actually variable-length sequences that
are tuples for speed?

Measure before claiming a speed difference.

ChrisA
