Why the list creates in two different ways? Does it cause by the mutability of its elements? Where the Python document explains it?

Avi Gross avigross at verizon.net
Tue Jun 15 22:43:14 EDT 2021


Greg,

My point was not to ASK what python does as much as to ask why it matters to
anyone which way it does it. Using less space at absolutely no real expense
is generally a plus. Having a compiler work too hard, or even ask the code
to work too hard, is often a minus.

If I initialized the tuples by calling f(5) and g(5) and so on, the compiler
might not even be easily able to figure out that they all return the same
thing. So should it make a run-time addition to the code so that after
calculating the second, it should look around and if it matches the first,
combine them? Again, I have seen languages where the implementation is to
have exactly one copy of each unique string of characters. That can be
useful but if it is done by creating some data structure and searching
through it even when we have millions of unique strings, ...
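As a small illustration of the "one copy of each unique string" idea, here is a sketch using Python's sys.intern(). The helper functions f and g are hypothetical stand-ins for code the compiler cannot see through; they build equal strings by different routes at run time. Whether the two raw results happen to be the same object is an implementation detail, so the sketch makes no claim about that and only shows that interning maps equal strings to one shared object.

```python
import sys


def f(n):
    # Hypothetical helper: builds the string at run time by concatenation.
    return "item-" + str(n)


def g(n):
    # Hypothetical helper: builds an equal string by a different route.
    return "-".join(["item", str(n)])


a = f(5)
b = g(5)

# Equal by value; identity of the raw results is not something to rely on.
print(a == b)

# sys.intern() looks the string up in the interpreter's intern table and
# returns the single shared copy, at the cost of that lookup.
a_shared = sys.intern(a)
b_shared = sys.intern(b)
print(a_shared is b_shared)
```

The lookup in the intern table is exactly the kind of run-time cost I mean: you pay it on every call in exchange for sharing.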

Heck, one excellent reason to use "constants" in your code (i.e. something
immutable) is to allow such optimizations. In some programs, garbage
collection is very active as things get copied and modified and parts
released. So it can be worth some effort to mark some items as
copy-on-change as they are often abandoned without changes. The common
paradigm I often see is you bring in a structure like a data.frame with
dozens of columns and then make a new variable containing a subset, perhaps
in a different order, and perhaps then add another column made by some
calculation like it being equal to the items in the column called "distance"
divided by the column entries called "time" to make a speed column. You
might graph the result and then make other such structures and graphs all
without doing anything but reading some columns.

If you then decided to just take a subset of the rows, or update even a
single item in one column, sure, you then take a copy but only of the
vectors that changed. 

Does python need something like this? I doubt it. Languages with such lazy
features can do very powerful things but then need even more operators to
force certain evaluations to be done in certain ways that leave other parts
unevaluated. R, for amusement, has an operator called !! (named bang bang)
for some such purposes and another operator !!! (obviously called The Big
Bang) for others. But it has radical differences in philosophy as compared
to something like python and each has a role and places it is easier to
write some kinds of programs than others.

The original question here was why there are different results. I think the
answer to my question is that it does not matter for most purposes. I can
think of one that MAYBE does.

If an application has some kind of ceiling, such as calculating how much
memory is left before deciding if a mail message is too big to send, and it
sometimes runs on machines with different amounts of such resources, you can
end up with a message being shunted along the way towards the destination
(we are talking ages ago) and suddenly hitting a machine where it is tossed
into junkmail as too big. So if your compiler/interpreter is one that allows
you to use less memory in circumstances like we are discussing, you may not
get what you think. Imagine a program that every time it creates a data
structure, adds some measure like sizeof() the new item and halts if the
size reached a gigabyte or perhaps charges extra because you used more of a
resource. The programmer choosing the list versus tuple alternative, would
get different behavior in such a hypothetical scenario. 
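To make the hypothetical concrete, here is a rough sketch of such an accounting scheme. The function charged_bytes is invented for illustration; it charges each distinct object once, using id() to detect sharing and sys.getsizeof() as the size measure. The three lists are always distinct objects, while the number of distinct tuples depends on what the implementation chooses to merge, so the sketch prints the counts rather than assuming them.

```python
import sys


def charged_bytes(items):
    """Hypothetical accounting: charge each distinct object once, by identity."""
    seen = set()
    total = 0
    for item in items:
        if id(item) not in seen:
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total


as_lists = [[5, 2], [5, 2], [5, 2]]
as_tuples = [(5, 2), (5, 2), (5, 2)]

# Each list display creates a new mutable object, so this is always 3.
distinct_lists = len({id(x) for x in as_lists})

# Equal constant tuples may or may not be merged by the implementation.
distinct_tuples = len({id(x) for x in as_tuples})

print(distinct_lists, charged_bytes(as_lists))
print(distinct_tuples, charged_bytes(as_tuples))
```

A program that halts or bills on the charged total could therefore behave differently for the list and tuple versions of what looks like the same data.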


-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Greg Ewing
Sent: Tuesday, June 15, 2021 9:00 PM
To: python-list at python.org
Subject: Re: Why the list creates in two different ways? Does it cause by
the mutability of its elements? Where the Python document explains it?

On 16/06/21 12:15 pm, Avi Gross wrote:
> May I ask if there are any PRACTICAL differences if multiple immutable 
> tuples share the same address or not?

Only if some piece of code relies on the identity of tuples by using 'is' or
id() on them.

There's rarely any need to write code like that, though.
Normally you should compare tuples using '==', not 'is'.

Exceptions to this usually involve something like caching where identity is
only used for optimisation purposes, and the end result doesn't depend on
it.
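A minimal sketch of identity used purely as an optimisation might look like this. The name fast_equal is made up for illustration, and the sketch assumes objects that compare equal to themselves (a bare float NaN would be an exception).

```python
def fast_equal(a, b):
    """Return True when a and b are equal, short-circuiting on identity."""
    # The identity test is only a cheap shortcut; when it fails,
    # the ordinary equality comparison decides the result.
    return a is b or a == b


x = 1
t1 = (x, 2)
t2 = (x, 2)

print(fast_equal(t1, t1))      # True, via the identity shortcut
print(fast_equal(t1, t2))      # True, via ==, merged or not
print(fast_equal(t1, (x, 3)))  # False
```

The answer is the same whether or not t1 and t2 happen to be the same object; only the amount of work differs.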

> I mean if I use a tuple in a set or as the key in a dictionary and a 
> second one comes along, will it result in two entries one time and 
> only one the other time?

Dicts and sets compare keys by equality, not identity, so there is no
problem here.

 >>> a = 1
 >>> b = 2
 >>> t1 = (a, b)
 >>> t2 = (a, b)
 >>> t1 is t2
False
 >>> d = {}
 >>> d[t1] = 'spam'
 >>> t2 in d
True
 >>> s = set()
 >>> s.add(t1)
 >>> t2 in s
True

> Some languages I use often have a lazy evaluation where things are not 
> even copied when doing some things and only copied if absolutely 
> needed

> So by that argument, you could have the copies of the list also be the
> same at first and since they are mutable, they might diverge later

Python is not one of those languages, though, and it won't do things like
that. (At least not on its own -- you can certainly implement such a
lazily-copied data structure if you want.)
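For what it's worth, a lazily-copied list could be sketched roughly as below. The class CowList and its method names are hypothetical, not anything in the standard library, and this is a bare-bones illustration rather than a complete sequence type.

```python
class CowList:
    """Hypothetical copy-on-write wrapper around a list."""

    def __init__(self, items, _shared=False):
        # A freshly built CowList owns a private list; a lazy copy does not.
        self._items = items if _shared else list(items)
        self._owns = not _shared

    def copy(self):
        # Share the underlying list instead of copying it. From now on
        # neither wrapper may modify that list in place.
        self._owns = False
        return CowList(self._items, _shared=True)

    def shares_storage_with(self, other):
        return self._items is other._items

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __setitem__(self, index, value):
        if not self._owns:
            # The real copy happens only on the first write.
            self._items = list(self._items)
            self._owns = True
        self._items[index] = value


original = CowList([1, 2, 3])
clone = original.copy()
print(original.shares_storage_with(clone))  # True: nothing copied yet

clone[0] = 99                               # first write triggers the copy
print(original.shares_storage_with(clone))  # False: they have diverged
print(original[0], clone[0])                # 1 99
```

This version errs on the side of safety: after copy(), the original also copies on its next write, even if the clone has already diverged.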

> Now if you really still want true copies, what ways might fool a compiler?

I've shown one way above that works in CPython, but a smarter implementation
might notice that t1 and t2 will always be equal and merge them.

> NoDup = [(5, 2), (6-1, 6/3), (12%7, 1/1 + 1/1)]

CPython merges the last two of these, but not the first:

 >>> NoDup = [(5, 2), (6-1, 6/3), (12%7, 1/1 + 1/1)]
 >>> [id(x) for x in NoDup]
[4387029696, 4386827968, 4386827968]

The reason is that '/' does float division in Python 3. If you use int
division instead, they all get merged:

 >>> NoDup = [(5, 2), (6-1, 6//3), (12%7, 1//1 + 1//1)]
 >>> [id(x) for x in NoDup]
[4387030272, 4387030272, 4387030272]

So you need to be trickier than that to fool it!

The bottom line is, don't write code that depends on the identities of
immutable objects.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


