Internals and complexity of types, containers and algorithms

"Martin v. Löwis" martin at v.loewis.de
Mon Jun 25 17:33:44 EDT 2007


> - Is there a brief description (not source) how the types tuple, 
> string, list and dict are represented internally. Is a list behind 
> the scenes just a double linked list or an array or a mixture of 
> these things?

Sure, see below:

- tuples are represented as arrays, with a single block for the
  entire objects (object header, tuple size, and data)
- list are represented as arrays, with two memory blocks:
  one for object header and sizes, and the other one for the
  "guts", i.e. the actual data. The list uses over-allocation,
  to avoid resizing on each addition.
- strings are implemented as arrays, with a single block for
  the entire string. In addition to header, size, and data,
  it also contains a cached hash and a pointer to the interned
  version of the string (if any).
- dicts are implemented as hash tables, with open addressing.

> Is a dict a tree or a hash array and what is the 
> collision mechanism? 

It's a hash array. Not sure what you mean by "collision
mechanism" - perhaps "open addressing" is the answer?
In case you also want to know the probing:

  j(n+1) = 5*j(n) + 1 + perturb(n)
  perturb(n+1) = perturb(n) >> 5

> Is the string an array with some header info?

Exactly so.

> - What is the big-O complexity of the most common algorithms for 
> these types and containers? Is it O(n), O(n*log(n)) or O(n**2)? 
> I mean inserting, appending (front or back), finding something 
> and so on.

O(1) for indexed access in lists, tuples, and strings, and for
computing the length of any container.
O(n) for finding in lists, tuples, and strings. O(1) for inserting
and finding in dictionaries, assuming few collisions, except
when rehashing occurs. Amortized O(1) for inserting into lists.
O(n) for removing from lists.
O(1) for inserting inot

> - Is this information somewhere in the web? 

Most likely.

> Why is it not written in the documentation?

Because nobody has contributed such documentation.

> - When I want to use a representation of a game board like chess 
> in C/C++ I use an array[64] or bigger of int or char for the pieces. 
> What data structure or type would be useful in Python when the 
> performance ist most important? Is it list or string or an array 
> from a library or what else?

Depends on whether and how you want to modify the chessboard.
If you do want to modify, use a list.

Regards,
Martin



More information about the Python-list mailing list