Surprising (for me) benchmark results...
Daniel Berlin
dan at www.cgsoftware.com
Tue May 1 22:57:30 EDT 2001
On Tue, 1 May 2001, Tim Peters wrote:
> [E. Mark Ping]
> > In Perl and Java, the builtin sort, and in C++ std::sort. Why would
> > you expect this to be best in Python?
>
> [Michael Hudson]
> > 'cause Tim Peters wrote the code that sorts lists in Python,
> > basically.
>
> I think that's an excellent reason <wink> -- but for believing a Python sort
> written by C++, Java or Perl nerds would be slower than the one Python uses.
>
> Python's sort is highly non-obvious, and beat a dozen variations of
> quicksort, another dozen of mergesort, and a scattering of more esoteric
> ones.
It doesn't beat a very well engineered radix sort, or even multikey
quicksort (a merger of quicksort and radix sort, basically), for sorting
strings.
I tested it.
Though you are talking about the difference between 1 second and .25
second, on *very* large numbers of strings.
I.E. You'd need to be sorting an array of probably 8 million strings at
least, to see any significant difference, because 75% faster than "really
fast" is just "really really fast".
:)
> When you're just sorting strings, though, Python endures considerable
> work on each comparison to figure out all over from scratch again *how* to
> compare two strings. It's still an O(N log N) method overall, and the
> constant factor is boosted by the repeated "how to compare?" problem since
> the number of comparisons is of the same order.
>
> So while Python's sort wins by minimizing the total number of comparisons, it
> loses by doing more work per comparison. Especially since rich comparisons
> got added, "a comparison" in Python is an all-singing all-dancing entire OO
> subsystem of its own.
Right. For things where accesses to a particular portion aren't O(1), or
are even O(some large constant), i'm sure multikey quicksort and radix
sort don't do well at all. Strings just are almost always implemented in a
way where accessing a particular position is O(1), and very fast.
>
> > ...
> > Fair enough. I wonder where the speed is then? Might be in creating
> > the list/vector before sorting? The code in
> > Objects/fileobject.c:file_readlines seems quite clever, so that might
> > be it.
>
> Gets my vote! I bet the sort doesn't hurt it, though <wink>.
>
Same here.
More information about the Python-list
mailing list