Balanced trees
Daniel Stutzbach
stutzbach at google.com
Mon Mar 17 21:01:24 EDT 2014
On Mon, Mar 17, 2014 at 5:08 PM, Joshua Landau <joshua at landau.ws> wrote:
> Thanks. First, I want to state that there are two aspects to my
> claim. The first is that these benchmarks do not represent typical
> use-cases. I will not go too far into this, though, because it's
> mostly obvious.
>
I would love to include macro-benchmarks. I keep waiting for the PyPy
benchmark suite to get ported to Python 3...
> "Create from an iterator" gives me relatively different results when I
> run it (Python 3).
>
The graphs were originally created to compare vanilla Python with a Python
modified to use blist as the built-in list type. I think I used Python
3.1, but I'm not certain. As I recall, the built-in type has a few small
advantages over any third-party extension type, so that might be what
you're seeing. Alternatively, something may have changed between Python
versions.
> "Delete a slice" is fudged from its inclusion of multiplication, which
> is far faster on blists. I admit that it's not obvious how to fix
> this.
>
I could move the initialization into the timed part, similar to what I did
for sort (see below). That has downsides too, of course, but it might be
an improvement.
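As a sketch of what moving the initialization into the timed part might look like (my own example, not taken from the blist benchmark suite; the sizes and slice bounds are arbitrary):

```python
import timeit

n = 10**4
# Time list construction and slice deletion together, so no multiplication
# is needed to rebuild the list between iterations.
combined = timeit.timeit(
    f"lst = list(range({n})); del lst[{n // 4}:{3 * n // 4}]",
    number=200,
)
# Time construction alone, so its cost can be subtracted back out.
build_only = timeit.timeit(f"lst = list(range({n}))", number=200)
slice_delete_cost = combined - build_only
print(f"approx. slice-delete cost: {slice_delete_cost:.6f}s")
```

The subtraction is noisy for small n, which is one of the downsides mentioned above.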
> "First in, first out (FIFO)" should be "x.append(0); x.pop(0)".
>
Wow, I mangled that one badly.
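For reference, the corrected FIFO statement next to the LIFO one, using the built-in list (a minimal sketch; the blist timings would use `blist.blist` in the setup instead):

```python
import timeit

# FIFO: append at the tail, pop from the head. pop(0) shifts every
# remaining element, so this is O(n) per operation for a built-in list.
fifo = timeit.timeit("x.append(0); x.pop(0)",
                     setup="x = list(range(10**4))", number=1000)
# LIFO: append at the tail, pop from the tail, which is O(1).
lifo = timeit.timeit("x.append(0); x.pop(-1)",
                     setup="x = list(range(10**4))", number=1000)
print(f"FIFO: {fifo:.6f}s  LIFO: {lifo:.6f}s")
```

Because each statement appends and pops exactly once, the list keeps a stable length across iterations.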
> "Last in, first out (LIFO)" should use "pop()" over "pop(-1)",
> although I admit it shouldn't make a meaningful difference.
>
I like pop(-1) because it's explicit rather than implicit. I agree it
shouldn't make a meaningful difference.
> "Sort *" are really unfair because they put initialisation in the
> timed part
That's a limitation of timeit. The setup step is only executed once. If I
put the initialization there, every sort after the first one would be
sorting a pre-sorted list. If you compare the "Create from an iterator"
and "Sort a random list", you'll see that the initialization cost is
dwarfed by the sorting cost for n > 15 or so.
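A small sketch of the problem being described (my own illustration, not from the benchmark suite): with setup-only initialization, every loop after the first sorts already-sorted data.

```python
import timeit

# The setup runs once, so after the first iteration lst is already sorted
# and the remaining 99 sorts measure Timsort's best case, not a real sort.
misleading = timeit.timeit(
    "lst.sort()",
    setup="import random; lst = [random.random() for _ in range(10**4)]",
    number=100,
)
# Shuffling inside the timed statement sorts fresh data every iteration,
# at the cost of including the shuffle in the measurement.
honest = timeit.timeit(
    "random.shuffle(lst); lst.sort()",
    setup="import random; lst = [random.random() for _ in range(10**4)]",
    number=100,
)
print(f"setup-only: {misleading:.4f}s  shuffle-in-loop: {honest:.4f}s")
```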
> and all have keys.
If you use classes with __lt__ methods instead of keys, the cost is
dominated by the calls to __lt__. You're right that I should include both,
though.
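The two variants being discussed might look like this (a hypothetical `Point` class of my own, for illustration): sorting objects with `__lt__` invokes it once per comparison, roughly n log n times, while a key function is called only once per element.

```python
import timeit

setup = """
import random
class Point:
    def __init__(self, x):
        self.x = x
    def __lt__(self, other):
        return self.x < other.x
data = [Point(random.random()) for _ in range(10**3)]
"""
# Comparison-based: __lt__ is a Python-level call on every comparison.
with_lt = timeit.timeit("sorted(data)", setup=setup, number=100)
# Key-based: the lambda runs once per element, then comparisons are C-level.
with_key = timeit.timeit("sorted(data, key=lambda p: p.x)",
                         setup=setup, number=100)
print(f"__lt__: {with_lt:.4f}s  key: {with_key:.4f}s")
```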
> >>> python -m timeit -s "from random import choice; import blist; lst =
> blist.blist(range(10**0))" "choice(lst)"
> 1000000 loops, best of 3: 1.18 usec per loop
>
> >>> python -m timeit -s "from random import choice; import blist; lst =
> blist.blist(range(10**8))" "choice(lst)"
> 1000000 loops, best of 3: 1.56 usec per loop
>
> Lower size ranges are hidden by the function-call overhead.
> Perhaps this effect is to do with caching, in which case the limits of
> the cache should be explained more readily.
>
That's definitely a cache issue, which is always a risk with
micro-benchmarks. I see growth even for the built-in list:
gnusto:~$ python -m timeit -s "from random import choice; lst =
list(range(10**0))" "choice(lst)"
1000000 loops, best of 3: 0.349 usec per loop
gnusto:~$ python -m timeit -s "from random import choice; lst =
list(range(10**8))" "choice(lst)"
1000000 loops, best of 3: 0.634 usec per loop
I agree it's more interesting to pick items randomly instead of always
querying the same index. The overhead of choice() is kind of a problem,
though. Since I'm only plotting up to 10**5, I'd expect these to look more
or less flat.
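One possible way to keep the random access pattern while taking most of the choice() overhead out of the timed loop (an idea of mine, not something the suite currently does) is to precompute the random indices in the setup:

```python
import timeit

setup = """
import random, itertools
lst = list(range(10**5))
# Precompute random indices once; cycling through them is a cheap C-level
# call, unlike calling random.choice() inside the timed statement.
idx = itertools.cycle([random.randrange(len(lst)) for _ in range(1000)])
"""
t = timeit.timeit("lst[next(idx)]", setup=setup, number=100000)
print(f"{t:.6f}s for 100000 random-index lookups")
```

The 1000-index cycle is an arbitrary choice; it should be large enough to defeat branch-predictable access patterns without dominating setup time.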
Thanks for all of the feedback. I filed a bug with myself to improve the
metrics:
https://github.com/DanielStutzbach/blist/issues/64
--
Daniel Stutzbach