[Tutor] How Python handles data (was guess-my-number programme)

Wayne Werner waynejwerner at gmail.com
Tue Sep 27 15:58:19 CEST 2011


On Sat, Sep 24, 2011 at 2:58 AM, Kĩnũthia Mũchane <
kinuthia.muchane at gmail.com> wrote:

> On 09/23/2011 11:28 PM, Wayne Werner wrote:
>
> <snip>
>  >>> tries = 1
> >>> tries
> 1
>
>  The variable 'tries' now contains the value 1 (In Python this is not
> technically true, but it's useful to describe it that way).
>
> Why?
>

Python is a true object-oriented language in that everything in Python is an
object. As an aside, one of the great things about Python is that you can
program using almost any paradigm and ignore the object-orientedness. In a
language such as C, variables refer to memory locations that you store
values in.

When you do something like this in C:

int x = 0;
int y = 0;

What you have actually done behind the scenes is allocate storage for two
ints (the C standard only guarantees that an int is at least 16 bits; on
most current platforms it is 4 bytes). Perhaps they are near each other, say
at addresses 0xab0fc8 and 0xab0fcc. And in each of these locations the value
0 is stored.
When you create a variable, memory is allocated, and you refer to that
location by the variable name, and that variable name always references that
address, at least until it goes out of scope. So if you did something like
this:

x = 4;
y = x;

Then x and y contain the same value, but they are two separate variables
stored at two different addresses.

In Python, things work differently because everything is an object and
names are references to objects. So if you do this:

x = 4
y = x

Then y and x are guaranteed to refer to the same object. You can test this
out by using the 'is' operator, which tells you if the variables reference
the same object:

>>> x = 4
>>> y = x
>>> x is y
True

Plain assignment never copies: 'y = x' simply attaches a second name to
whatever object x already names. What is /not/ guaranteed is what happens
when you write the literal twice ('x = 4' and then 'y = 4'). CPython happens
to cache small integers, so both names usually end up on the same object,
but that is an implementation detail done for optimization, both space and
speed, and possibly other reasons I'm not aware of. So instead of saying "I
stored the value 4 in x and copied that value into y", the correct statement
would be "I gave the value 4 the name of x, and then I also gave it the name
of y". The biggest reason that we use the first abstraction, storing a value
in a "container" (variable), is that this is the abstraction that you find
when talking about almost any other programming language. So whether out of
habit, or simply because it will be less confusing to folks who encounter
other languages, most Pythonistas will refer to

x = 4

as storing 4 in x.
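
To make the "names, not containers" idea concrete, here is a small sketch.
The variable names on the left of each check (same_object, same_identity and
so on) are mine and only there for illustration. It shows that 'y = x' puts
both names on one object, and that rebinding y afterwards does nothing to x:

```python
# Assignment binds a second name to the same object; nothing is copied.
x = 4
y = x
same_object = x is y            # True: y was bound to the object x names
same_identity = id(x) == id(y)  # id() reports an object's identity

# Rebinding y makes y name a different object; x is left alone.
y = y + 1
x_after = x
y_after = y

print(same_object, same_identity, x_after, y_after)
```

The built-in id() is just another way of asking the question that 'is'
answers: two names refer to the same object exactly when their ids match.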

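The same holds for other built-in types: 'y = x' always gives you the same
object, while two separately written literals may or may not turn out to be
the same object, depending on what the interpreter decides to cache: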

>>> x = 'hi'
>>> y = x
>>> y is x
True
>>> x = 3.14
>>> y = x
>>> x is y
True
>>> x = 'This is a super long sentence'
>>> y = x
>>> x is y
True
>>> y = 'This is a super long sentence'
>>> x is y
False
>>> x == y
True
>>> x = 'hi'
>>> y = 'hi'
>>> x is y
True
>>> y += '!'
>>> x is y
False
>>> x == y
False
>>> x
'hi'
>>> y
'hi!'

One thing that is important to note is that in each of these examples, the
data types are immutable. In C++ if you have a std::string and you append to
it, you are still modifying that same string object. In Python there's this
magical string space that contains all the possible strings in existence[1]
and when you "modify" a string using addition, what you're actually doing is
telling the interpreter that you want to point to the string that is the
result of addition, like 'hi' + '!'. Sometimes Python stores equal strings
as the same object, other times they're stored as different objects. And
since you can't change immutable objects in-place (e.g. 'hello'[0] = 'j'
raises a TypeError, as does ('hello',)[0] = 'goodbye'), it's just fine to
use the standard "store 4 in x" abstraction.
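
Here is a minimal sketch of what that looks like for strings. The names
x_unchanged, still_same and error_name are made up for this example. It
shows that '+=' on a string rebinds the name to a new object instead of
changing the old one, and that item assignment on a string is refused:

```python
x = 'hi'
y = x          # x and y name the same string object
y += '!'       # builds a new string and rebinds y; x is untouched

x_unchanged = (x == 'hi')
y_value = y
still_same = x is y   # False: the two names now refer to different objects

# Strings cannot be modified in place.
s = 'hello'
error_name = None
try:
    s[0] = 'j'
except TypeError as exc:
    error_name = type(exc).__name__

print(x_unchanged, y_value, still_same, error_name)
```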

But there are also mutable types such as lists and dictionaries that will
get you in trouble if you don't understand the difference.

Here is an example that has bitten every Python programmer who's been
programming long enough:

>>> names = ["Werner", "Lancelot", "Bedevere", "Idle", "Chapman"]
>>> def add_something(mylist):
...     knights = mylist
...     for x in range(len(knights)-1):
...         knights[x] = "Sir " + knights[x]
...     return knights
...
>>> knights = add_something(names)
>>> knights
['Sir Werner', 'Sir Lancelot', 'Sir Bedevere', 'Sir Idle', 'Chapman']
>>> names
['Sir Werner', 'Sir Lancelot', 'Sir Bedevere', 'Sir Idle', 'Chapman']

And the question is "wait, why did 'names' change??" Because we thought that
when we wrote 'knights = mylist' that we were storing a copy of mylist in
knights, just like when we ran 'y = x' we stored a copy of 4 in y. But
instead what we did was add the name "knights" to the list that was already
named by "mylist" and "names". So we can modify the list by using any of
those names.
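
One way around this (a sketch, not the only option) is to make the copy
explicitly. Below I use list(mylist); mylist[:] or copy.copy(mylist) would do
the same job. I kept the function name and the loop from the example above,
including the range that leaves the last entry alone:

```python
def add_something(mylist):
    # list(...) builds a new list object, so the caller's list is not touched.
    knights = list(mylist)
    for i in range(len(knights) - 1):  # same range as above: last item skipped
        knights[i] = "Sir " + knights[i]
    return knights

names = ["Werner", "Lancelot", "Bedevere", "Idle", "Chapman"]
knights = add_something(names)

print(knights)
print(names)
```

This is a shallow copy: the new list holds references to the same string
objects. That is enough here because strings are immutable. For a list of
lists you would want copy.deepcopy instead.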

Mostly it's nothing to worry about. You will have no problems writing
programs in Python if you don't understand the name concept when it comes to
immutable types, because the behavior ends up the same from the logic side
of things (as long as you compare with == and not 'is', anyway). When it
comes to mutable types you need to understand references or you'll have
problems. When it comes to numbers and strings (and tuples, AFAIK, as long
as they only contain immutable things), I don't know that there is any other
need to understand those concepts other than to satisfy your curiosity.
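
For completeness, a short sketch of the == versus 'is' distinction using
lists, where the result does not depend on any caching. The names a, b and c
are just for illustration:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list that happens to hold equal values
c = a           # a second name for the very same list

equal_values = (a == b)   # True: == compares contents
same_object = (a is b)    # False: two distinct list objects
alias = (c is a)          # True: c and a name one object

c.append(4)               # visible through a as well, but not through b
print(equal_values, same_object, alias, a, b)
```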

If I've made any mistakes, I'm sure others will quickly correct me.
HTH,
Wayne

[1] Not really. It just contains the strings that you have references to, or
that have not been garbage collected yet.