[Python-ideas] collections.sortedset proposal

Paul Colomiets paul at colomiets.name
Wed Dec 26 10:59:51 CET 2012


Hi,

On Wed, Dec 26, 2012 at 9:58 AM, Terry Reedy <tjreedy at udel.edu> wrote:
>> SortedSet (name borrowed from Redis) is a basically a mapping of
>> (unique) keys to scores, that allows fast slicing by ordinal number
>> and by score.
>
>
> Since a set, in general, is not a mapping, I do not understand what you
> mean. If you mean a mapping from sorted position to item, then I would call
> it a sortedlist.
>

Ok. My description is vague. Here is one from Redis documentation:

Redis Sorted Sets are, similarly to Redis Sets, non repeating
collections of Strings. The difference is that every member of a
Sorted Set is associated with score, that is used in order to take the
sorted set ordered, from the smallest to the greatest score. While
members are unique, scores may be repeated.

http://redis.io/topics/data-types

I was just so silly to suppose that everybody knows Redis data types.

>
>> There are plenty of use cases for the sorted sets:
>>
>> * Leaderboard for a game
>
>
> This looks like an auto-sorted list.
>

Yep. The crucial property is fast insertion and updates.

>
>> * Priority queue (that supports task deletion)
>
>
> This looks like something else.
>

Priority queue is basically an auto-sorted list too. No?

>
>> * Timer list (e.g. can be used for tulip, supports deletion too)
>> * Caches with TTL-based, LFU or LRU eviction (including
>> `functools.lru_cache`)
>
>
> These look like sorted lists.
>

Yup. But we can't call the data structure  SortedList, because
elements must be unique.

>
>> * Search databases with relevance scores
>> * Statistics (many use cases)
>
>
> These are rather vague.
>

Yes. Included just to give some overview.

>
>> * Replacement for `collections.Counter` with faster `most_common()`
>
>
> This looks like something else.
>

Why? If you have a list sorted by counter values, you can have
`most_common()` by slicing.

> The standard answer is to list on or submit to pypi and get community
> approval and adoption. Then a pep with a commitment to maintenance even
> while others interfere with your 'baby'. Long-time core committers sometimes
> get to cut the process short, but even Guido is starting his propose async
> module/package with a pep and publicly available code for 3.3.
>

It's on the PyPI now. I know the standard answer :) So you don't
understand what SortedSets are and what would be good name for data
structure, or do you think it's useless?

The crucial point of adoption, is that most of the time people don't
want to add additional dependency for simple tasks like priority
queue, even if it's faster or more featureful. And I think that
SortedSets in Redis have proved their usefulness as a data structure.

--
Paul



More information about the Python-ideas mailing list