set and dict iteration

Aaron Brady castironpi at gmail.com
Thu Aug 23 12:49:41 EDT 2012


On Saturday, August 18, 2012 9:28:32 PM UTC-5, Aaron Brady wrote:
> On Saturday, August 18, 2012 5:14:05 PM UTC-5, MRAB wrote:
> 
> > On 18/08/2012 21:29, Aaron Brady wrote:
> 
> > > On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> 
> > >> On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady <castironpi at gmail.com> wrote:
> 
> > >> > Is there a problem with hacking on the Beta?
> 
> > >> Nope. Hack on the beta, then when the release arrives, rebase your
> 
> > >> work onto it. I doubt that anything of this nature will be changed
> 
> > >> between now and then.
> 
> > >> ChrisA
> 
> > > Thanks Chris, your post was encouraging.
> 
> > > I have a question about involving the 'tp_clear' field of the types.
> 
> > > http://docs.python.org/dev/c-api/typeobj.html#PyTypeObject.tp_clear
> 
> > > '''
> 
> > > ...The tuple type does not implement a tp_clear function, because it’s possible to prove that no reference cycle can be composed entirely of tuples.
> 
> > > '''
> 
> > > I didn't follow the reasoning in the proof; the premise is necessary but IMHO not obviously sufficient.  Nevertheless, the earlier diagram contains an overt homogeneous reference cycle.
> 
> > > Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png
> 
> > > In my estimate, the 'tp_traverse' and 'tp_clear' fields of the set doesn't need to visit the auxiliary collection; the same fields of the iterators don't need to visit the primary set or other iterators; and references in the linked list don't need to be included in the iterators' reference counts.
> 
> > > Can someone who is more familiar with the cycle detector and cycle breaker, help prove or disprove the above?
> 
> > In simple terms, when you create an immutable object it can contain
> 
> > only references to pre-existing objects, but in order to create a cycle
> 
> > you need to make an object refer to another which is created later, so
> 
> > it's not possible to create a cycle out of immutable objects.
> 
> > However, using Python's C API it _is_ possible to create such a cycle,
> 
> > by mutating an otherwise-immutable tuple (see PyTuple_SetItem and
> 
> > PyTuple_SET_ITEM).
> 
> Are there any precedents for storing uncounted references to PyObject's?
> 
> One apparent problematic case is creating an iterator to a set, then adding it to the set.  However the operation is a modification, and causes the iterator to be removed from the secondary list before the set is examined for collection.
> 
> Otherwise, the iterator keeps a counted reference to the set, but the set does not keep a counted reference to the iterator, so the iterator will always be freed first.  Therefore, the set's secondary list will be empty when the set is freed.
> 
> Concurrent addition and deletion of iterators should be disabled, and the iterators should remove themselves from the set's secondary list before they decrement their references to the set.
> 
> Please refresh the earlier diagram; counted references are distinguished separately.  Reposting: http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png

The patch for the above is only 40-60 lines.  However it introduces two new concepts.

The first is a "linked list", a classic dynamic data structure, first developed in 1955, cf. http://en.wikipedia.org/wiki/Linked_list .  Linked lists are absent in Python, including the standard library and CPython implementation, beyond the weak reference mechanism and garbage collector.  The "collections.deque" structure shares some of the linked list interface but uses arrays.

The second is "uncounted references".  The uncounted references are references to "set iterators" exclusively, exist only internally to "set" objects, and are invisible to the rest of the program.  The reason for the exception is that iterators are unique in the Python Data Model; iterators consist of a single immutable reference, unlike both immutable types such as strings and numbers, as well as container types.  Counted references could be used instead, but would be consistently wasted work for the garbage collector, though the benefit to programmers' peace of mind could be significant.

Please share your opinion!  Do you agree that the internal list resolves the inconsistency?  Do you agree with the strategy?  Do you agree that uncounted references are justified to introduce, or are counted references preferable?



More information about the Python-list mailing list