[Python-Dev] PEP 351, the freeze protocol

Noam Raphael noamraph at gmail.com
Mon Oct 31 20:27:30 CET 2005


Hello,

I have slept quite well, and talked about it with a few people, and I
still think I'm right.

About the users-changing-my-internal-data issue:

> Again, user semantics.  Tell your users not to modify entries, and/or
> you can make copies of objects you return.  If your users are too daft
> to read and/or follow directions, then they deserve to have their
> software not work.
...
> When someone complains that something doesn't work, I tell them to read
> the documentation.  If your users haven't been told to RTFM often enough
> to actually make it happen, then you need a RTFM-bat. Want to know how
> you make one?  You start wrapping the objects you return which segfaults
> the process if they change things. When they start asking, tell them it
> is documented quite clearly "do not to modify objects returned, or else".
> Then there's the other option, which I provide below.

I disagree. I read the manual when I don't know what something does.
If I can guess what it does (and this is where conventions are good),
I don't read the manual. And let's say I ought to read the complete
manual for every method that I use, and that I deserve a death
punishment (or a segmentation fault) if I don't. But let's say that I
wrote a software, without reading the manual, and it worked. I have
gone to work on other things, and suddenly a bug arises. When the poor
guy who needs to fix it goes over the code, everything looks
absolutely correct. Should he also read the complete manuals of every
library that I used, in order to fix that bug? And remember that in
this case, the object could have traveled between several places
(including functions in other libraries), before it was changed, and
the original data structure starts behaving weird.

You suggest two ways for solving the problem. The first is by copying
my mutable objects to immutable copies:

> Also from the sounds of it, you are storing both source and destination
> values in the same table...hrm, that sounds quite a bit like a
> spreadsheet.  How does every spreadsheet handle that again?  Oh yeah,
> they only ever store immutables (generally strings which are interpreted).
> But I suppose since you are (of course) storing mutable objects, you
> need to work a bit harder...so store mutables, and return immutable
> copies (which you can cache if you want, and invalidate when your
> application updates the results...like a wx.Grid update on changed).

This is basically ok. It's just that in my solution, for many objects
it's not necessary to make a complete copy just to prevent changing
the value: Making frozen copies of objects which can't reference
nonfrozen objects (sets, for example), costs O(1), thanks to the
copy-on-write.

Now, about the graph:

> So let me get this straight: You've got a graph.  You want to be able to
> change the graph, but you don't want your users to accidentally change
> the graph. Sounds to me like an API problem, not a freeze()/mutable problem.
> Want an API?
>
> class graph:
>    ...
>    def IterOutgoing(self, node):
>        ...
>    def IterIncoming(self, node):
>        ...
>    def IsAdjacent(self, node1, node2):
>        ...
>    def IterNodes(self):
>        ...
>    def AddEdge(self, f_node, t_node):
>        ...
>    def RemEdge(self, node1, node2):
>        ...
>    def AddNode(self):
>        ...
>
> If you are reasonable in your implementation, all of the above
> operations can be fast, and you will never have to worry about your
> users accidentally mucking about with your internal data structures:
> because you aren't exposing them.  If you are really paranoid, you can
> take the next step and implement it in Pyrex or C, so that only a
> malicous user can muck about with internal structures, at which point
> you stop supporting them.

This will work. It's simply... well, not very beautiful. I have to
invent a lot of names, and my users need to remember them all. If I
give them a frozen set, with all the vertices than a vertex points to
(which is an absolutely reasonable API), they will know how to deal
with it without learning a lot of method names, thanks to the fact
that they are already familiar with sets, and that a lot of thought
has gone into the set interface.

> > Now, about copy-on-write:
...
>
> What you have written here is fairly unintelligible, but thankfully you
> clarify yourself...pity it still doesn't work, I explain below.

I'm sorry if I am sometimes hard to understand. English is not my
mother tongue, and it degrades as the hour gets later - and sometimes,
things are hard to explain. If I don't explain myself, please say so
and I'll try again. This is an excellent example - I wrote about
callbacks, and went to sleep. Let me try to explain again how it
*does* work.

The frozen() function, and the __frozen__ protocol, would get another
optional argument - an object to be notified when the *nonfrozen*
object has changed. It may be called at most once - only on the first
change to the object, and only if the object which requested to be
notified is still alive. I now introduce a second protocol, which I
will call __changed__. Objects would be notified by calling their
__changed__ method.

You say that every frozen() call takes O(n), because it needs to
verify that objects hadn't changed since the last call:

> Oh hrm.  This invalidates x[0]'s cached frozen object, which would
> suggest that x's cached frozen object was also invalidated, even though
> Python objects tend to know nothing about the objects which point to
> them.  Well, that's a rub. In order to /validate/ that an object's
> cached object is valid, you must validate that the contents of your
> cached frozen object points to the cached frozen objects of your
> contents.  Or in code (for this two-level example)...
>
>    def frozen(x):
>        if x.frozen_cache:
>            for i,j in zip(x.contents, x.frozen_cache):
>                if i.frozen_cache is not j:
>                    x.frozen_cache = None
>                    x.frozen_cache = frozen(x)
>                    return x.frozen_cache
>        x.frozen_cache = tuple(map(frozen, x.contents))
>        return x.frozen_cache

This is not so. When a list creates its frozen copy, it gives itself
to all those frozen() calls. This means that it will be notified if
one of its members is changed. In that case, it has to do two simple
actions: 1. release the reference to its frozen copy, so that
subsequent freezes of the list would create a new frozen copy, and: 2.
notify about the change any object which froze the list and requested
notification.

This frees us of any validation code. If we haven't been notified
about a change, there was no change, and the frozen copy is valid.

In case you ask, the cost of notification is O(1), amortized. The
reason is that every frozen() call can cause at most one callback in
the future.

> Like I said before, it's not so easy.  In order to actually implement
> the system, every object in an object heirarchy would need to know about
> its parent, but such cannot work because...
>
>    a = range(10)
>    b = [a]
>    c = [a]
>    bf = frozen(b)
>    cf = frozen(c)
>    a[0] = 10       #oops!
>
> That last line is the killer.  Every object would necessarily need to
> know about all containers which point to it, and would necessarily need
> to notify them all that their contents had changed.  I personally don't
> see Python doing that any time soon.
>
This is not the case. Every object has to know only on the objects
which created frozen copies of it, and requested notifications.
Actually, the object itself doesn't have to store anything. I thought
about it, and you can create a module for handling those
change-callbacks. It would store only weak references to objects. It
would have two functions:

def register_reference(x, y):
    '''Register the fact that if object x changes, it means that
object y changes too.'''

def changed(x):
    '''Notify all objects that change with x that they are changed.'''

When an object is changed, this module would call the __changed__
method of all the objects that have a reference to it, and haven't
changed since the connection was created.

I hope I have clarified my idea. Tell me if you still think I'm wrong.

> Hope you sleep/slept well,
>  - Josiah
>

Thanks! indeed, a good sleep is something very important. Sleep well
too (when the time comes, of course),
Noam


More information about the Python-Dev mailing list