blist in standard library (was Re: Balanced trees)

Sat Mar 15 08:31:26 EDT 2014

On 15/03/2014 01:13, Joshua Landau wrote:
> On 8 March 2014 20:37, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>> I've found this link useful http://kmike.ru/python-data-structures/
>>
>> I also don't want all sorts of data structures added to the Python library.
>> I believe that there are advantages to leaving specialist data structures on
>> pypi or other sites, plus it means Python in a Nutshell can still fit in
>> your pocket and not a 40 ton articulated lorry, unlike the Java equivalent.
>
> The thing we really need is for the blist containers to become stdlib
> (but not to replace the current list implementation). The rejected PEP
> (http://legacy.python.org/dev/peps/pep-3128/) misses a few important
> points, largely in how the "log(n)" has a really large base:
> random.choice went from 1.2µs to 1.6µs from n=1 to n=10⁸, vs 1.2µs for
> a standard list.
>
> Further, it's worth considering a few advantages:
>
> * copy is O(1), allowing code to avoid mutation by just copying its
> input, which is good practice.
>
> * FIFO is effectively O(1), as the time just about doubles from n=1 to
> n=10⁸ so will never actually branch that much. There is still a speed
> benefit of collections.deque, but it's much, much less significant.
> This is very useful when considering usage as a multi-purpose data
> structure, and removes demand for explicit linked lists (which have
> foolishly been reimplemented loads of times).
>
> * It reduces demand for trees:
>      * There are efficient implementations of sortedlist, sortedset and
> sorteddict.
>      * Slicing, slice assignment and slice deletion are really fast.
>      * Addition of lists is sublinear. Instead of
> "list(itertools.chain(...))", one can add in a loop and end up
> *faster*.
>
> I think blist isn't very popular not because it isn't really good, but
> because it isn't a specialised structure. It is, however, almost there
> for almost every circumstance. This can help keep the standard library
> clean, especially of tree data structures.
>
> Here's what we kill:
>
> * Linked lists and doubly-linked lists, which are scarily popular for
> whatever reason. Sometimes people claim that collections.deque isn't
> powerful enough for whatever they want, and blist will almost
> definitely sate those cases.
>
> * Balanced trees, with blist.sortedlist. This is actually needed right now.
>
> * Poor performance in the cases where a lot of list merging and pruning happens.
>
> * Most uses of bisect.
>
> * Some instances where two data structures are used in parallel in
> order to keep performance fast on disparate operations (like `x in y`
> and `y[i]`).
>
> Now, I understand there are downsides to blist. Particularly, I've
> looked through the "benchmarks" and they seem untruthful. Further,
> we'd need a maintainer. Finally, nobody jumps at blists because
> they're rarely the obvious solution. Rather, they attempt to be a
> different general solution. Hopefully, though, a stdlib inclusion
> could make them a lot more obvious, and support in some current
> libraries could make them feel more at home.
>
> I don't know whether this is a good idea, but I do feel that it is
> more promising and general than having a graph in the standard
> library.
>

I'd raise this on python-dev or python ideas as a result of reading PEP 
3128.  Specifically the second paragraph states Raymond Hettinger's sage 
advice:-

"Depending on its success as a third-party module, it still has a chance 
for inclusion in the collections module. The essential criteria for that 
is whether it is a superior choice for some real-world use cases. I've 
scanned my own code and found no instances where BList would have been 
preferable to a regular list. However, that scan has a selection bias 
because it doesn't reflect what I would have written had BList been 
available. So, after a few months, I intend to poll comp.lang.python for 
BList success stories. If they exist, then I have no problem with 
inclusion in the collections module. After all, its learning curve is 
near zero -- the only cost is the clutter factor stemming from 
indecision about the most appropriate data structure for a given task."

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com