[Python-ideas] Putting `blist` into collections module

Tue Sep 23 02:01:41 CEST 2014

On 22 September 2014 01:30, Grant Jenks <grant.jenks at gmail.com> wrote:
> I think blist.blist is an excellent data type with a lot of value. But the
> performance graphs can be a bit misleading if you think they apply to the
> sortedlist, sorteddict, and sortedset types. In those scenarios, I do not
> believe blist is the best choice.
>
> The SortedContainers module (https://pypi.python.org/pypi/sortedcontainers)
> provides SortedList, SortedDict, and SortedSet data types.
>
> Disclaimer: I am the author of the Python SortedContainers project. Feedback
> welcome.

How about positive feedback? I think your module is ridiculously awesome.

However, despite keeping these in the back of my mind for times I
might want to use them, I so rarely seem to. With the overhead of a
few small Python files being so low, and the need for these
collections being so little, I honestly can't recommend this for the
standard library. And if I can't recommend this, I definitely can't
recommend the blist alternatives I view as typically worse.

I think there is merit in considering a blist.blist as the standard
list type; the asymptotics are pretty awesome. We'd be part of a
language that's trying something I don't think any other language has
dared. But with the amount of one-directional iteration, it seems like
the risks are high. I would also hesitate to do anything that would
damage PyPy's claim to fame, seeing as it's faster than C++ in some
benchmarks¹ and that is the kind of crown you want to keep. But having
a second list type would mostly prevent the it from ever showing up
because people just don't test with multiple list types. Hooking into
speed.pypy.org and getting positive results would really invigorate
the debate.

¹ https://gist.github.com/Veedrac/d25148faf20669589993
This is actually a somewhat real-life scenario; I was helping someone
yesterday on their thesis where they were optimising some Python code
which had exactly the sort of characteristics that this benchmark
shows off. Getting close to the speed of C++ by just changing
interpreters is a much better option than rewriting in C++ and missing
all the optimisations you have to do for it to run fast!