[Python-Dev] Retrieve an arbitrary element from a set without removing it

Sat Oct 24 17:41:18 CEST 2009

Hi,

someone on this list mentioned that much of the s.get() time is spend on the 
name lookup for get(). That is indeed the case:

===================
from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          ",
         "g=s.get;\nfor i in xrange(1000): g()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(1000))")
    print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
==================

Time for for i in xrange(1000): iter(s).next()   :       0.448227
Time for for i in xrange(1000):
        for x in s:
                break:   0.141669
Time for for i in xrange(1000): s.add(s.pop())   :       0.348055
Time for for i in xrange(1000): s.get()          :       0.148580
Time for g=s.get;
for i in xrange(1000): g()          :    0.080563

So, now set.get() is indeed the fastest and preferable solution if you need 
massive amounts of retrieving elements from a set without removing them.

wr

Am Freitag, 23. Oktober 2009 22:53:24 schrieb Willi Richert:
> Hi,
> 
> surprised about the performance of for/break provided by Vitor, I did some
> more testing. It revealed that indeed  we can forget the get() (which was
> implemented as a stripped down pop()):
> 
> from timeit import *
> stats = ["for i in xrange(1000): iter(s).next()   ",
>          "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak            ",
>          "for i in xrange(1000): s.add(s.pop())   ",
>          "for i in xrange(1000): s.get()          "]
> 
> for stat in stats:
>     t = Timer(stat, setup="s=set(range(100))")
>     try:
>         print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
>     except:
>         t.print_exc()
> 
> 
> 
> $ ./test_get.py
> Time for for i in xrange(1000): iter(s).next()   :       0.433080
> Time for for i in xrange(1000):
>         for x in s:
>                 break                            :       0.148695
> Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
> Time for for i in xrange(1000): s.get()          :       0.146673
> 
> 
> In some tests, for/break was even slightly faster then get().
> 
> As always, intuition regarding performance bottlenecks is flawed ;-)
> 
> Anyway, thanks for all the helpful comments, especially to Stefan for the
> http://comments.gmane.org/gmane.comp.python.ideas/5606 link.
> 
> Regards,
> wr
> 
> Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:
> > Vitor Bosshard wrote:
> > > 2009/10/23 Willi Richert <w.richert at gmx.net>:
> > >> Hi,
> > >>
> > >> recently I wrote an algorithm, in which very often I had to get an
> > >> arbitrary element from a set without removing it.
> > >>
> > >> Three possibilities came to mind:
> > >>
> > >> 1.
> > >> x = some_set.pop()
> > >> some_set.add(x)
> > >>
> > >> 2.
> > >> for x in some_set:
> > >>        break
> > >>
> > >> 3.
> > >> x = iter(some_set).next()
> > >>
> > >>
> > >> Of course, the third should be the fastest. It nevertheless goes
> > >> through all the iterator creation stuff, which costs some time. I
> > >> wondered, why the builtin set does not provide a more direct and
> > >> efficient way for retrieving some element without removing it. Is
> > >> there any reason for this?
> > >>
> > >> I imagine something like
> > >>
> > >> x = some_set.get()
> > >
> > > I see this as being useful for frozensets as well, where you can't get
> > > an arbitrary element easily due to the obvious lack of .pop(). I ran
> > > into this recently, when I had a frozenset that I knew had 1 element
> > > (it was the difference between 2 other sets), but couldn't get to that
> > > element easily (get the pun?)
> >
> > So in my testing (2) was actually the fastest. I assumed because .next()
> > was a function call overhead, while:
> > for x in some_set:
> >   break
> >
> > Was evaluated inline. It probably still has to call PyObject_GetIter,
> > however it doesn't have to create a stack frame for it.
> >
> > This is what "timeit" tells me. All runs are of the form:
> > python -m timeit -s "s = set([10])" ...
> >
> > 0.101us	"for x in s: break; x"
> > 0.130us "for x in s: pass; x"
> > 0.234us -s "n = next; i = iter" "x = n(i(s)); x"
> > 0.248us "x = next(iter(s)); x"
> > 0.341us "x = iter(s).next(); x"
> >
> > So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
> > faster than (iter(s).next()).
> > I was pretty surprised that it was 30% faster than "for x in s: pass". I
> > assume it has something to do with a potential "else:" statement?
> >
> > Note that all of these are significantly < 1us. So this only matters if
> > it is something you are doing often.
> >
> > I don't know your specific timings, but I would guess that:
> >   for x in s: break
> >
> > Is actually going to be faster than your
> >   s.get()
> >
> > Primarily because s.get() requires an attribute lookup. I would think
> > your version might be faster for:
> >   stat2 = "g = s.get; for i in xrange(100): g()"
> >
> > However, that is still a function call, which may be treated differently
> > by the interpreter than the for:break loop. I certainly suggest you try
> > it and compare.
> >
> > John
> > =:->
>