[issue21592] Make statistics.median run in linear time

Tim Peters report at bugs.python.org
Sat May 31 04:53:09 CEST 2014


Tim Peters added the comment:

I suggest this needs a measurable goal, like "minimize expected-case time" or "minimize worst-case time".  "Use a state of the art implementation" isn't a measurable goal - it's an abstract wish with no measurable merit on its own ;-)

Note that I wrote the "median-of-k" code referenced in the first post here (it was in reply to David Eppstein).  Even then it was hard to beat sort() + index.

It's harder now, and for an important reason:  Python's _current_ sort can exploit many kinds of partial order, and can often complete in linear time.

This makes picking "typical" input for timing crucial.  If you want to skew it to put sort() + index at maximum disadvantage, use only shuffled (random order) pairwise-unequal inputs.  But most streams of data do have some kinds of partial order (which is why the newer sort() implementation has been so successful), and "typical" is a much tougher thing to capture than "shuffled".

----------
nosy: +tim.peters

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21592>
_______________________________________


More information about the Python-bugs-list mailing list