Large Dictionaries

Scott David Daniels scott.daniels at acm.org
Mon May 29 09:39:50 EDT 2006


Thomas Ganss wrote:
> Klaas schrieb:
>> 4. Insert your keys in sorted order.
> This advice is questionable -....
> 
> My gut feeling on this matter is:
> IF the insert times of pre-sorted values is far better
> than the times of unsorted values, there is a chance
> that the resulting tree is unbalanced: only 1 compare
> operation after insert and no re-balancing of the tree.
> 
> re-indexing will probably give you far better access times
> in this case. Another option is to drop non RI indexes used only
> for query optimization and recreate them after batch insert.

Don't use your gut for such issues.  Pre-sorted data is such
a common special case (in fact, the only easily describable
special case) that many systems handle this specially.  For
example, time timsort (Python's internal sort) on pre-sorted
data; you'll find it is handled faster than random data.

If you are using your gut without testing, go ahead and
presort.  In any case, reading documents and testing beats
gut feels every time.

--Scott David Daniels
scott.daniels at acm.org



More information about the Python-list mailing list