implementing large trees

Bengt Richter bokr at oz.net
Mon Jul 15 15:23:17 EDT 2002


On Mon, 15 Jul 2002 15:22:32 +0200, Michael Schmitt <ms at NOMAIL.de> wrote:

>Hello.
>
>I'm trying to implement a large tree data structure, which does not fit 
>into memory. Based on an in-memory implementation, which has two 
>dictionarys, one for all nodes and one for all edges, the first idea was to 
>use shelve to have disc-based dictionaries.
>
>But for usual tree operations (depth-first traversal, breadth-first 
>traversal) the shelve based tree was much slower, than the tree using two 
>dictionaries and relying on swapping.
>
>What is the "best" way to implement such a tree? Is it possible to improve 
>performance without knowing how the tree is traversed? What about having a 
>breadth-first and a depth-first cache, that prefetch nodes and edges after 
>each access?
>
>Thanks for any hints.
- What about buying more memory ;-)
- Are your nodes and edges class instances? Are you using __slots__ ?
- What are you using as keys in your dictionaries?
- Do you need to touch every node as you follow edges?
- How many total nodes and edges?
- How many edges per node? Min? Max? Usual?
- Is it very deep or very wide?
- Is the tree essentially static for many traversals, so that it might pay to do
  some special overall pre-procesing to speed traversal?
- Is part of the tree "hot" and other parts hardly ever visited? Would caching pay off?
- What does your tree actually represent IRL?

Regards,
Bengt Richter



More information about the Python-list mailing list