Large Dictionaries

Chris Foote chris at foote.com.au
Thu May 18 02:06:02 EDT 2006


Claudio Grondi wrote:
> Chris Foote wrote:
>> Klaas wrote:
>>
>>>> 22.2s  20m25s[3]
>>>
>>> 20m to insert 1m keys?  You are doing something wrong.
>>
>> I've put together some simplified test code, but the bsddb
>> module gives 11m for 1M keys:
>>
> I have run your code for the bsddb on my P4 2.8 GHz and have got:
> Number generator test for 1000000 number ranges
>         with a maximum of 3 wildcard digits.
> Wed May 17 16:34:06 2006 dictionary population started
> Wed May 17 16:34:14 2006 dictionary population stopped, duration 8.4s
> Wed May 17 16:34:14 2006 StorageBerkeleyDB population started
> Wed May 17 16:35:59 2006 StorageBerkeleyDB population stopped, duration 104.3s
>
> Surprising here, that the dictionary population gives the same time, but 
> the BerkeleyDB inserts the records 6 times faster on my computer than on 
> yours. I am running Python 2.4.2 on Windows XP SP2, and you?

Fedora Core 5 with an ext3 filesystem.  The difference is most likely
due to the way that Windows buffers writes for the filesystem you're
using (it sounds like you're using a FAT-based filesystem).

>> Number generator test for 1000000 number ranges
>>         with a maximum of 3 wildcard digits.
>> Wed May 17 22:18:17 2006 dictionary population started
>> Wed May 17 22:18:26 2006 dictionary population stopped, duration 8.6s
>> Wed May 17 22:18:27 2006 StorageBerkeleyDB population started
>> Wed May 17 22:29:32 2006 StorageBerkeleyDB population stopped, duration 665.6s
>> Wed May 17 22:29:33 2006 StorageSQLite population started
>> Wed May 17 22:30:38 2006 StorageSQLite population stopped, duration 65.5s
> As I don't have SQLite installed, it is interesting to see if the factor 
> 10 in the speed difference between BerkeleyDB and SQLite can be 
> confirmed by someone else.
> Why is SQLite faster here? I suppose, that SQLite first adds all the 
> records and builds the index afterwards with all the records there (with 
> db.commit()).

SQLite is way faster here because BerkeleyDB always uses a disk file,
and the SQLite database in this test is in RAM only.
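
To illustrate the RAM-versus-disk point, here is a minimal sketch that times
the same bulk insert against an in-memory and a file-backed SQLite database,
with a single commit at the end. It is not the test code discussed in this
thread: it assumes the sqlite3 module from the modern standard library, and
the table name `ranges` and the functions `populate` and `compare` are
made up for the example. Timings will vary by machine and filesystem.

```python
import os
import sqlite3
import tempfile
import time


def populate(conn, n):
    """Insert n key/value rows in one transaction; return elapsed seconds."""
    # Hypothetical schema, chosen only for this illustration.
    conn.execute("CREATE TABLE ranges (key TEXT PRIMARY KEY, value INTEGER)")
    rows = ((str(i), i) for i in range(n))
    start = time.perf_counter()
    conn.executemany("INSERT INTO ranges VALUES (?, ?)", rows)
    conn.commit()  # one commit after all inserts, not one per row
    return time.perf_counter() - start


def compare(n=100_000):
    """Time the same population in RAM and on disk."""
    timings = {}

    # ":memory:" keeps the whole database in RAM.
    mem = sqlite3.connect(":memory:")
    timings["memory"] = populate(mem, n)
    mem.close()

    # A path on disk makes SQLite write through the filesystem.
    with tempfile.TemporaryDirectory() as tmp:
        disk = sqlite3.connect(os.path.join(tmp, "test.db"))
        timings["disk"] = populate(disk, n)
        disk.close()

    return timings


if __name__ == "__main__":
    for name, secs in compare().items():
        print(f"{name}: {secs:.2f}s")
```

Running the script prints one timing per storage mode, which makes it easy to
see how much of a gap on a given machine comes from disk writes alone.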

Cheers,
Chris


