bsddb3 database file, are there any unexpected file size limits occurring in practice?

Claudio Grondi claudio.grondi at freenet.de
Thu Feb 23 05:14:22 EST 2006


Klaas wrote:
> Claudio Grondi wrote:
> 
> 
>>Beside the intended database file
>>   databaseFile.bdb
>>I see in same directory also the
>>   __db.001
>>   __db.002
>>   __db.003
>>files where
>>   __db.003 is ten times as large as the databaseFile.bdb
>>and
>>   __db.001 has the same size as the databaseFile.bdb .
> 
> 
> I can't tell you exactly what each is, but they are the files that the
> shared environment (DBEnv) uses to coordinate multi-process access to
> the database.  In particular, the big one is likely the mmap'd cache
> (which defaults to 5Mb, I believe).
> 
> You can safely delete them, but probably shouldn't while your program
> is executing.
> 
> 
>>Is there any _good_ documentation of the bsddb3 module around besides
>>that provided with the module itself, where it is not necessary e.g. to
>>guess that the C integer value zero (0) is represented in Python by the
>>value None returned by db.open() in case of success?
> 
> 
> This is the only documentation available, AFAIK:
> http://pybsddb.sourceforge.net/bsddb3.html
> 
> For most of the important stuff it is necessary to dig into the bdb
> docs themselves.
Thank you for the reply.
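
If I understand it right, the size of those __db.* region files can be
influenced by creating the shared environment explicitly and setting the
cache size before opening it. An untested sketch of what I mean (the
directory name 'envHome' is made up and must already exist):

   from bsddb3 import db

   # The __db.* region files are created in the environment's
   # home directory.
   env = db.DBEnv()

   # set_cachesize(gbytes, bytes[, ncache]) -- here a 32 MB cache;
   # it must be called before env.open().
   env.set_cachesize(0, 32 * 1024 * 1024)

   # DB_INIT_MPOOL initializes the shared memory buffer pool,
   # i.e. the cache mentioned above.
   env.open('envHome', db.DB_CREATE | db.DB_INIT_MPOOL)

   # Databases opened through this environment share its cache.
   d = db.DB(env)
   d.open('databaseFile.bdb', None, db.DB_BTREE, db.DB_CREATE)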

Probably to avoid admitting that the documentation is weak, a positive 
way of stating it was found in the phrase:

   "Berkeley DB was designed by programmers, for programmers."

So I have to try to get an excavator ;-) to speed up digging through the 
docs and maybe even the source, right?

Are there any useful, simple examples of applications using Berkeley DB 
available online that I could learn from?

I am especially interested in using the multimap feature activated by 
db.set_flags(bsddb3.db.DB_DUPSORT), and I fear that as the database file 
grows while mapping tokens to the files they occur in (I have approx. 10 
million files for which I want to build a search index), I will hit some 
unexpected limit and the project will fail, as happened to me once in 
the past when I tried to use MySQL for a similar purpose (after the 
database file had grown beyond 2 GByte, MySQL simply began to hang when 
trying to add more records).
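
The feeding part of what I have in mind looks roughly like this (a
simplified, untested sketch; the token and file names are only
illustrative):

   from bsddb3 import db

   d = db.DB()
   # DB_DUPSORT must be set before the database is opened; it
   # allows several sorted data items (here: file names) per key.
   d.set_flags(db.DB_DUPSORT)
   d.open('databaseFile.bdb', None, db.DB_BTREE, db.DB_CREATE)

   # One put() per (token, file) pair; duplicate keys are kept
   # and their data items are stored in sorted order.
   d.put('python', 'fileA.txt')
   d.put('python', 'fileB.txt')
   d.put('database', 'fileA.txt')

   d.close()
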
I am on Windows using the NTFS file system, so I don't expect problems 
with too large a file size as such. In the meantime I also have working 
Python code performing the basic database operations I will need to feed 
and query the database.
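
The query side looks roughly like this (again simplified; a cursor walks
through all duplicates stored for one token):

   from bsddb3 import db

   d = db.DB()
   d.set_flags(db.DB_DUPSORT)
   d.open('databaseFile.bdb', None, db.DB_BTREE)

   cur = d.cursor()
   try:
       rec = cur.set('python')   # first entry for the key, if any
   except db.DBNotFoundError:
       rec = None                # key not present at all
   while rec is not None:
       token, filename = rec     # each duplicate carries one file name
       # ... collect filename here ...
       rec = cur.next_dup()      # next duplicate of the same key, else None
   cur.close()
   d.close()
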
Has someone used Berkeley DB for a similar purpose and can tell me that 
in actual practice (not just in the theory stated in the Berkeley DB 
feature list) I need not fear any problems?
It took me some days of continuously updating the MySQL database to see 
that there was an unexpected, strange limit on the database file size. I 
still have no idea what the actual cause of the MySQL problem was (I 
suspect it was the fact that I had only 256 MB of RAM available at the 
time), as it is known that MySQL databases larger than 2 GByte exist and 
are in daily use :-( .

These are the reasons why I would be glad to hear how to avoid running 
into a similar problem again _before_ I start to torture my machine by 
filling the Berkeley DB database with entries.

Claudio

> 
> -Mike
> 


