[Python-ideas] Sorted lists

Steven D'Aprano steve at pearwood.info
Mon Apr 8 06:25:30 EDT 2019


On Mon, Apr 08, 2019 at 02:55:54AM -0700, Nathaniel Smith wrote:

> Right, by "doesn't affect" I meant "cannot get any benefit, even if their
> code is modified".

Ah, sorry I misunderstood you.


> >     # This only makes sense if data is a sequence (list)
> >     # not an iterator.
> >     quartiles = quantiles(data, n=4)
> >     quintiles = quantiles(data, n=5)
> >     deciles = quantiles(data, n=10)
> >     percentiles = quantiles(data, n=100)
> >
> 
> If only we had some kind of API that could compute multiple quantiles at
> the same time...

You mean something like this?

    quartiles, quintiles, deciles, percentiles = quantiles(data, n=(4, 5, 10, 100))

Yuck :-)

I'd rather allow an already_sorted=True parameter :-)


> > I'm perfectly comfortable with allowing the caller to lie if they want.
> > Its their own foot they're shooting.
> >
> 
> An already_sorted=True argument would be an explicit opt in, and consenting
> adults would apply. But your message was very explicit that __issorted__
> can be set implicitly, though. For example, this would give garbage results:
> 
> # implicitly sets the sorted flag
> data.sort()
> # preserves the flag, because hey it's sorted by *some* key
> data.reverse()
> statistics.median(data)

It would certainly be putting more responsibility on the caller to 
ensure the sorted flag was correct.


> You can't use this in statistics.median because it would break
> compatibility. 

I would argue differently.


> Also, isn't the whole point of 'statistics' to be the
> simple, reliable module for folks who aren't that worried about speed? This
> seems like a massive footgun.

Perhaps. Perhaps not as massive as it seems.

The expected use-case would be something like this:


    data = get_data()
    a = median(data)
    data.sort()    
    b = median(data)  # Like a, but faster.


To be a problem, the caller would need to do something like:


    data = get_data()
    a = median(data)
    data.sort(reversed=True)  # or some weird key function
    b = median(data)  # Garbage result.


I can't see people shooting themselves in the foot in this way by 
accident very often. But fair enough, it is a risk.


> > (I wouldn't be so blasé about this if it were a function written in C
> > that could segfault if the list wasn't sorted.)
> >
> 
> Silently giving the wrong answer is way worse than a segfault.

It depends on what you're worried about, and who gets the blame for the 
wrong answer.

As I understand it, it is the position of the core-devs that *any* seg 
fault in the interpreter or stdlib is a serious bug that must be fixed 
(possibly excepting ctypes); but if Python code returns garbage when you 
give it garbage input, that may or may not be considered a bug.

In this case, passing a list with the flag set when it is not actually 
sorted correctly would be a "Garbage In, Garbage Out" error, just as if 
they had explicitly passed a already_sorted=True argument.

But I take your earlier point that the argument version is explicit and 
opt-in, while the flag is implicit and may not be opt-in.

Given all the points already raised, I think that an explicit SortedList 
might be more appropriate.

Thanks everyone for all the feedback.



-- 
Steven


More information about the Python-ideas mailing list