[Python-ideas] Putting `blist` into collections module

Grant Jenks grant.jenks at gmail.com
Mon Sep 22 02:30:06 CEST 2014


Long time lurker, first time poster. I think there may be multiple
discussions happening here so I wanted to highlight a competing module.

I think blist.blist is an excellent data type with a lot of value. But the
performance graphs can be a bit misleading if you think they apply to the
sortedlist, sorteddict, and sortedset types. In those scenarios, I do not
believe blist is the best choice.

The SortedContainers module (https://pypi.python.org/pypi/sortedcontainers)
provides SortedList, SortedDict, and SortedSet data types. It is
implemented in pure-Python, has 100% coverage and hours of stress testing.
The API implemented is very close to blist's and a lot of effort has been
put into documentation (http://www.grantjenks.com/docs/sortedcontainers/).
Furthermore, the data types provided are often faster than their blist
counterparts.

Extensive performance comparisons against other implementations of sorted
list, sorted dict, and sorted set types are documented (
http://www.grantjenks.com/docs/sortedcontainers/performance.html) along
with a comparison of runtimes and load-factors (similar to blist,
sortedcontainers uses a modified B-tree but with a tunable node size.) For
SortedList, an analysis of use cases on Github has been made. These
describe five use cases with performance analysis (
http://www.grantjenks.com/docs/sortedcontainers/performance-workload.html)

Disclaimer: I am the author of the Python SortedContainers project.
Feedback welcome.

Grant Jenks


On Sun, Sep 21, 2014 at 3:04 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sun, 21 Sep 2014 05:18:38 +0000
> David Wilson <dw+python-ideas at hmmz.org> wrote:
> > On Sun, Sep 21, 2014 at 02:37:39PM +1000, Steven D'Aprano wrote:
> >
> > > > In 2007, this PEP was created that suggested integrating Daniel
> Stutzbach's
> > > > blist into Python: http://legacy.python.org/dev/peps/pep-3128/
> > > >
> > > > The PEP was rejected, but Raymond Hettinger made a note that "after
> a few
> > > > months, I intend to poll comp.lang.python for BList success stories.
> If
> > > > they exist, then I have no problem with inclusion in the collections
> > > > module."
> > >
> > > I have not used blist, but I have no objection to it becoming a
> > > collections type if the author agrees.
> >
> > It seems unsettling that we'd consider adding another special use
> > collection to the stdlib, when more widely applicable generalizations of
> > it are missing. In blist's case, it can (mostly) be trivially
> > reimplemented using an ordered map, which the standard library lacks.
>
> But can it be *efficiently* reimplemented using an ordered map?
> blist uses chunks of 128 pointers, which makes it almost as memory
> efficient as a standard list. A traditional ordered map (I suppose you
> mean some kind of balanced tree) would generally show much more
> overhead, if only because it has to store the keys explicitly. And also,
> you now have to do costly key comparisons every time you do a lookup
> ("costly" because they go through the object layer and its
> indirections, as opposed to simple integer arithmetic).
>
> > Coming from this perspective, I'd prefer that further additions were
> > limited to clean and far better understood structures. In this light,
> > could we perhaps instead discuss the merits of a collections.Tree,
> > collections.SortedDict or similar?
>
> This sounds like a false dilemma. We could have a collections.blist
> *and* a collections.Tree.
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140921/4f1a24a6/attachment.html>


More information about the Python-ideas mailing list