huge dictionary -> bsddb/pickle question

Fri Jun 15 06:15:25 EDT 2007

In <1181895778.031710.58520 at o11g2000prd.googlegroups.com>, lazy wrote:

> I have a dictionary something like this,
> 
> key1=>{key11=>[1,2] , key12=>[6,7] , ....  }
> For lack of better wording, I will call the outer dictionary dict1 and its
> value (the inner dictionary) dict2, which is a dictionary of small
> fixed-size lists (2 items).
> 
> The key of dict1 is a string and its value is another dictionary
> (let's say dict2).
> dict2 has a string key and a list of 2 integers as its value.
> 
> I'm processing HUGE data (~100M inserts into the dictionary).
> I tried 2 options; both seem to be slow, and I'm seeking suggestions to
> improve the speed. The code is sort of in bits and pieces, so I'm just
> giving the idea.
> 
> […]
> 
> This is not getting up to speed even with option 2. Before inserting, I
> do some processing on the line, so the bottleneck is not clear to me
> (i.e. in processing or in inserting into the db). But I guess it's mainly
> because of pickling and unpickling.
> 
> Any suggestions will be appreciated :)

Your guess that pickling is the bottleneck is probably correct, but
measuring/profiling will give more confidence.
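
A minimal sketch of such a measurement, using the standard library's
cProfile module.  The original code was not posted, so parse_line() and
insert() are hypothetical stand-ins for the per-line processing and the
pickle-backed insert, and a plain dict stands in for the bsddb file:

```python
import cProfile
import io
import pickle
import pstats


def parse_line(line):
    """Hypothetical stand-in for the per-line processing step."""
    key1, key2, a, b = line.split()
    return key1, key2, [int(a), int(b)]


def insert(db, key1, key2, value):
    """Mimic a pickle-backed store: unpickle the inner dict, update, re-pickle."""
    inner = pickle.loads(db[key1]) if key1 in db else {}
    inner[key2] = value
    db[key1] = pickle.dumps(inner, pickle.HIGHEST_PROTOCOL)


def run(lines, db):
    for line in lines:
        insert(db, *parse_line(line))


def profile_run(lines):
    """Run the workload under cProfile and return the store plus a text report."""
    db = {}  # assumption: a plain dict stands in for the bsddb file
    profiler = cProfile.Profile()
    profiler.enable()
    run(lines, db)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return db, out.getvalue()


# Made-up sample input: 5000 lines spread over 50 outer keys.
sample = ["k%d sub%d %d %d" % (i % 50, i, i, i + 1) for i in range(5000)]
db, report = profile_run(sample)
print(report)
```

Comparing the cumulative time of parse_line() with that of insert() and
the pickle calls below it shows which side dominates.  With the real
code, the database writes would show up in the report as well.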

Maybe a database other than bsddb would be useful here: an SQL one like
SQLite, or maybe an object DB like ZODB or Durus.
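
A rough sketch of the SQLite variant, using the standard library's
sqlite3 module.  The table and column names are made up for
illustration.  The idea is to store one row per (key1, key2) pair so
that nothing has to be pickled or unpickled on insert.  Whether this is
actually faster for ~100M inserts would have to be measured:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " key1 TEXT NOT NULL,"
        " key2 TEXT NOT NULL,"
        " a INTEGER NOT NULL,"
        " b INTEGER NOT NULL,"
        " PRIMARY KEY (key1, key2))"
    )
    return conn


def insert_many(conn, rows):
    """Insert a batch of (key1, key2, a, b) rows in a single transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "INSERT OR REPLACE INTO entries (key1, key2, a, b)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )


def lookup(conn, key1):
    """Rebuild the inner dictionary (dict2) for one outer key."""
    cur = conn.execute(
        "SELECT key2, a, b FROM entries WHERE key1 = ?", (key1,)
    )
    return {key2: [a, b] for key2, a, b in cur}


conn = open_store()
insert_many(conn, [("key1", "key11", 1, 2), ("key1", "key12", 6, 7)])
print(lookup(conn, "key1"))
```

Batching many rows into one insert_many() call keeps the number of
commits low, which usually matters a lot for insert speed in SQLite.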

Ciao,
	Marc 'BlackJack' Rintsch