Python's simplicity philosophy

Andrew Dalke adalke at mindspring.com
Wed Nov 19 18:55:44 EST 2003


Paul Rubin:
> It's also a barrier to anyone who implements list sort, not just
> listalike sort.  Of course that doesn't affect CPython or Jython
> users, who already have a built-in stable list sort.  But what about
> PyPy or maybe someone trying to reimplement Python from scratch in C?

My recent statement on this hasn't changed -- this will affect
very few people, and the code to do a simple (perhaps not optimally
efficient for Python) sort is available in many books and libraries, so
should have very little influence on this decision.

> I'm not a sorting whiz but my feeling is that if an application needs
> sort to be stable, that's already a sign that something isn't really
> right about it.

The standard example for stable sorts is in a spreadsheet-like
grid display where the ordering of the rows can be changed based
on the values in a given column (eg, by clicking on the column
header).  Rows with column entries with the same value should
keep the same relative order, because that's what people expect.

Why?  Because that lets the user define the sort order -- first
sort on tertiary key, then sort on secondary key, then sort on
primary key.

> The way to do that is make your comparison function
> look at every part of the record that you care about, not just select
> some single field and count on stability to take care of the rest.

There are two answers to that.  All non-stable sorts can be
turned into a stable sort by tacking on a final comparison based
on the original position.  My spreadsheet example doesn't require
having a stable sort because I create my own.  But then there's
the inelegance of creating a new list to store the original position,
and the extra dispatch needed to check that new field.

And if there isn't a stable sort, nor one induced by creating
an extra list, then what is the interface for creating the correct
comparison function?  It could keep track of all the column
clicks, to know which ones are used as primary, secondary,
etc. sorts keys... but at some point it ends up with a tie
which cannot be resolved (in which case there's no problem)
or resolved by assuming the datatype of a given column.

For example, suppose I click on the 2nd column in

alpha 1
beta  2
beta  3
gamma 2
delta 2

There are ties with the three values of 2.  The first column
can break the tie, but only by assuming that the first column
can be sorted alphabetically, to create

alpha 1
beta  2
delta 2
gamma 2
beta  3

I argue that the correct order should preserve the original
sort order, to create

alpha 1
beta  2
gamma 2
delta 2
beta  3

because the user's order is actually the alphabetical order of the
original greek alphabet, implicit in the original order of the data
file.  (The user could click on the column header to resort for
that field, which would likely be done alphabetically, but at that
point that's the same as telling the computer to treat those fields
as english words, so sorting alphabetically is fine.)

> If at all possible, design the comparison function so that there are
> never any ties.  To do otherwise has in my experience been a source of
> bugs that don't become evident til much later.

I agree with that.  But what to do when there are ties?  I believe
people expect a stable sort (even without knowing what 'stable'
means) so it's helpful that Python's native sort be stable.

This is obviously a hard decision to make -- Python's sort has
been around for 10 years before this new requirement -- but I
don't think it's a poor one, and I've only seen theoretical
arguments for otherwise.

                    Andrew
                    dalke at dalkescientific.com






More information about the Python-list mailing list