set and dict iteration

Aaron Brady castironpi at gmail.com
Mon Sep 3 20:24:02 EDT 2012


On Monday, September 3, 2012 3:28:28 PM UTC-5, Dave Angel wrote:
> On 09/03/2012 04:04 PM, Aaron Brady wrote:
> 
> > On Monday, September 3, 2012 2:30:24 PM UTC-5, Ian wrote:
> 
> >> On Sun, Sep 2, 2012 at 11:43 AM, Aaron Brady <castironpi at gmail.com> wrote:
> 
> >>
> 
> >>> We could use a Python long object for the version index to prevent overflow.  Combined with P. Rubin's idea to count the number of open iterators, most use cases still wouldn't exceed a single word comparison; we could reset the counter when there weren't any.
> 
> >>
> 
> >>
> 
> >> We could use a Python long; I just don't think the extra overhead is
> 
> >>
> 
> >> justified in a data structure that is already highly optimized for
> 
> >>
> 
> >> speed.  Incrementing and testing a C int is *much* faster than doing
> 
> >>
> 
> >> the same with a Python long.
> 
> > I think the technique would require two python longs and a bool in the set, and a python long in the iterator.
> 
> >
> 
> > One long counts the number of existing (open) iterators.  Another counts the version.  The bool keeps track of whether an iterator has been created since the last modification, in which case the next modification requires incrementing the version and resetting the flag.
> 
> 
> 
> I think you're over-engineering the problem.  it's a bug if an iterator
> 
> is used after some change is made to the set it's iterating over.  We
> 
> don't need to catch every possible instance of the bug, that's what
> 
> testing is for.  The point is to "probably" detect it, and for that, all
> 
> we need is a counter in the set and a counter in the open iterator. 
> 
> Whenever changing the set, increment its count.  And whenever iterating,
> 
> check the two counters.  if they don't agree, throw an exception, and
> 
> destroy the iterator.  i suppose that could be done with a flag, but it
> 
> could just as easily be done by zeroing the pointer to the set.
> 
> 
> 
> I'd figure a byte or two would be good enough for the counts, but a C
> 
> uint would be somewhat faster, and wouldn't wrap as quickly.
> 
> 
> 
> -- 
> 
> 
> 
> DaveA

Hi D. Angel,

The serial index constantly reminds me of upper limits.  I have the same problem with PHP arrays, though it's not a problem with the language itself.

The linked list doesn't have a counter, it invalidates iterators when a modification is made, therefore it's the "correct" structure in my interpretation.  But it does seem more precarious comparatively, IMHO.

Both strategies solve the problem I posed originally, they both involve trade-offs, and it's too late to include either in 3.3.0.



More information about the Python-list mailing list