[Python-3000] BList PEP

Josiah Carlson jcarlson at uci.edu
Tue May 1 09:53:51 CEST 2007


"Daniel Stutzbach" <daniel at stutzbachenterprises.com> wrote:
> Title: BList: A faster list-like type
> 1. Add it to the collections module, or

+1

> 2. Replace the existing list type

-1

For types that are used every day, I can't help but prefer a simple
implementation.  Among the features of the current Python list
implementation is that small lists (0, 4, 8, 16 elements) use very
little space.  Your current BList implementation uses a fixed size for
the smallest sequence, 128, which would offer worse memory performance
for applications where many small lists are common.


> ========= ================                     ====================
> Operation Array-based list                     BList
> ========= ================                     ====================
> Copy      O(n)                                 **O(1)**
> Append    **O(1)**                             O(log n)
> Insert    O(n)                                 **O(log n)**
> Get Item  **O(1)**                             O(log n)
> Set Item  **O(1)**                             **O(log n)**
what's going on with this pair?                  ^^        ^^

> Del Item  O(n)                                 **O(log n)**
> Iteration O(n)                                 O(n)
> Get Slice O(k)                                 **O(log n)**
> Del Slice O(n)                                 **O(log n)**
> Set Slice O(n+k)                               **O(log k + log n)**
> Extend    O(k)                                 **O(log k + log n)**
> Sort      O(n log n)                           O(n log n)
> Multiply  O(nk)                                **O(log k)**
> ========= ================                     ====================



> The performance for the LIFO use case could be improved to O(n) time,

You probably want to mention "over n appends/pop(-1)s".  You also may
want to update the above chart to take into consideration that you plan
on doing that modification.

Generally, the BList is as fast or faster asymptotically than a list for
everything except random getitem/setitem; at which point it is O(logn)
rather than O(1).  You may want to explicitly state this in some later
version.


> Implementation
> ==============
> 
> The BList is based on the B+Tree data structure.  The BList is a wide,
> bushy tree where each node contains an array of up to 128 pointers to
> its children.  If the node is a leaf, its children are the
> user-visible objects that the user has placed in the list.  If node is
> not a leaf, its children are other BList nodes that are not
> user-visible.  If the list contains only a few elements, they will all
> be a children of single node that is both the root and a leaf.  Since
> a node is little more than array of pointers, small lists operate in
> effectively the same way as an array-based data type and share the
> same good performance characteristics.

In the collections module, there exists a deque type.  This deque type
more or less uses a sequence of 64 pointers, the first two of which are
linked list pointers to the previous and next block of pointers.  I
don't know how much tuning was done to choose this value of 64, but you
may consider reducing the number of pointers to 64 for the the same
cache/allocation behavior.


 - Josiah



More information about the Python-3000 mailing list