[Python-Dev] Optimizing list.sort() by checking type in advance

Elliot Gorokhovsky elliot.gorokhovsky at gmail.com
Mon Oct 10 22:15:25 EDT 2016


Right - sorry about that last bit!

I couldn't figure out how to use timeit/perf for this because the list has
to be reinitialized each time it gets sorted. So with time.time() I can
just wrap the sorting and cut the initialization out (and I could always
throw it in a for loop to do it as many times as necessary). But with
timeit/perf, as I understand it, you just give it a number of iterations and a
code snippet and it loops for you. So I don't see how I could get it to cut
out the initialization. There's an option to provide setup code, of course,
but I need to set up before each trial, not just before the loop.
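
For reference, the crude approach I'm describing looks roughly like this
(a minimal sketch, not the exact harness I ran; the list contents and
sizes are only placeholders):

    import random
    import time

    def crude_sort_benchmark(n, trials):
        """Rebuild the list before every trial; time only the sort."""
        total = 0.0
        for _ in range(trials):
            data = [random.random() for _ in range(n)]  # setup, not timed
            start = time.time()
            data.sort()                                 # only this is timed
            total += time.time() - start
        return total / trials

    # e.g. run the same call on a patched and an unpatched interpreter:
    # print(crude_sort_benchmark(10**7, trials=5))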

Regardless, one could always wrap the whole contribution in a length test
and disable it for lists with fewer than, say, 1000 elements. And though
for the above reason I couldn't figure out how to benchmark properly for
small lists, it's clear that for large lists this gives a real improvement
with very little change in the code: it really doesn't make sense to do
all this typechecking n log n times when we can just get it over with once
at the beginning. The cost is very, very small, as demonstrated by the
last benchmark (which is for a 1e7-element list, so the measurement is
probably more valid even with my crude technique), so heterogeneous lists
don't get slowed down perceptibly.
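
The real change lives in Objects/listobject.c, but the shape of the idea,
and why the overhead is O(n) rather than O(n log n), can be sketched in
pure Python (illustrative only -- the names here are made up, and the
pure-Python check below is far more expensive than the C-level check in
the patch):

    import random
    import time

    def check_cost_vs_sort_cost(n):
        """Compare the one-pass homogeneity check against the sort itself."""
        data = [random.random() for _ in range(n)]

        start = time.time()
        first_type = type(data[0])
        homogeneous = all(type(x) is first_type for x in data)  # O(n)
        check_time = time.time() - start

        start = time.time()
        data.sort()                                              # O(n log n)
        sort_time = time.time() - start

        print("homogeneous=%s check=%.3fs sort=%.3fs"
              % (homogeneous, check_time, sort_time))

    # check_cost_vs_sort_cost(10**7)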

On Mon, Oct 10, 2016 at 8:06 PM Guido van Rossum <guido at python.org> wrote:

> So maybe someone should explain to Elliot *why* his own benchmarks
> are not trustworthy, rather than just repeat "use perf or timeit".
> Actually, there are two things: (a) when something new comes along it
> *always* needs to prove beyond a shadow of a doubt that it is actually
> an improvement and not a timing artifact or a trick; (b) you can't
> time sorting 10 values *once* and get a useful result. You have to do
> it many times. And you have to make sure that creating a list of 10
> random values isn't taken as part of your test -- that's tricky since
> random() isn't all that fast; but it has to be done.
>
> Although Elliot had it coming when he used needlessly offensive
> language in his first post.
>
> On Mon, Oct 10, 2016 at 5:09 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:
> > On Mon, Oct 10, 2016 at 09:16:32PM +0000, Elliot Gorokhovsky wrote:
> >
> > >> Anyway, benchmarking technique aside, the point is that it works well
> >> for small lists (i.e. doesn't affect performance).
> >
> > You've been shown that there is something suspicious about your
> > benchmarking technique, something that suggests that the timing results
> > aren't trustworthy. Until you convince us that your timing results are
> > reliable and trustworthy, you shouldn't be drawing *any* conclusions
> > about your fastsort versus the standard sort.
> >
> >
> > --
> > Steve
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

