Surprising (for me) benchmark results...

Daniel Berlin dan at www.cgsoftware.com
Tue May 1 22:57:30 EDT 2001


On Tue, 1 May 2001, Tim Peters wrote:

> [E. Mark Ping]
> > In Perl and Java, the builtin sort, and in C++ std::sort.  Why would
> > you expect this to be best in Python?
>
> [Michael Hudson]
> > 'cause Tim Peters wrote the code that sorts lists in Python,
> > basically.
>
> I think that's an excellent reason <wink> -- but for believing a Python sort
> written by C++, Java or Perl nerds would be slower than the one Python uses.
>
> Python's sort is highly non-obvious, and beat a dozen variations of
> quicksort, another dozen of mergesort, and a scattering of more esoteric
> ones.

It doesn't beat a very well engineered radix sort, or even multikey
quicksort (a merger of quicksort and radix sort, basically), for sorting
strings.
I tested it.
Though you are talking about the difference between 1 second and .25
second, on *very* large numbers of strings.
I.E. You'd need to be sorting an array of probably 8 million strings at
least, to see any significant difference, because 75% faster than "really
fast" is just "really really fast".
:)

> When you're just sorting strings, though, Python endures considerable
> work on each comparison to figure out all over from scratch again *how* to
> compare two strings.  It's still an O(N log N) method overall, and the
> constant factor is boosted by the repeated "how to compare?" problem since
> the number of comparisons is of the same order.
>
> So while Python's sort wins by minimizing the total number of comparisons, it
> loses by doing more work per comparison.  Especially since rich comparisons
> got added, "a comparison" in Python is an all-singing all-dancing entire OO
> subsystem of its own.

Right. For things where accesses to a particular portion aren't O(1), or
are even O(some large constant), i'm sure multikey quicksort and radix
sort don't do well at all. Strings just are almost always implemented in a
way where accessing a particular position is O(1), and very fast.
 >
> > ...
> > Fair enough.  I wonder where the speed is then?  Might be in creating
> > the list/vector before sorting?  The code in
> > Objects/fileobject.c:file_readlines seems quite clever, so that might
> > be it.
>
> Gets my vote!  I bet the sort doesn't hurt it, though <wink>.
>
Same here.





More information about the Python-list mailing list