bsddb3 database file, are there any unexpected file size limits occurring in practice?

Claudio Grondi claudio.grondi at freenet.de
Tue Feb 28 06:05:37 EST 2006


Klaas wrote:
> Claudio writes:
> 
>>I am on a Windows using the NTFS file system, so I don't expect problems
>>with too large file size.
> 
> 
> how large can files grow on NTFS?  I know little about it.
No practical limit on current hard drives, i.e.:
Maximum file size
   Theory:          16 exabytes  minus  1 KB (2**64 bytes minus  1 KB)
   Implementation:  16 terabytes minus 64 KB (2**44 bytes minus 64 KB)
Maximum volume size
   Theory:                                    2**64 clusters minus 1
   Implementation: 256 terabytes minus 64 KB (2**32 clusters minus 1)
Files per volume
   4,294,967,295 (2**32 minus 1 file)
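
A quick check of those powers of two in Python (assuming the maximum 
64 KB cluster size for the volume figure):

KB = 2 ** 10
TB = 2 ** 40

max_file_size   = 2 ** 44 - 64 * KB          # 16 TB minus 64 KB
max_volume_size = (2 ** 32 - 1) * 64 * KB    # 2**32 - 1 clusters of 64 KB each

print(max_file_size / float(TB))     # -> just under 16.0 terabytes
print(max_volume_size / float(TB))   # -> just under 256.0 terabytes
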
> 
> 
>>(I suppose it in having only 256 MB RAM available that time) as it is
>>known that MySQL databases larger than 2 GByte exist and are in daily
>>use :-( .
> 
> 
> Do you have more ram now?  
I now have 3 GByte of RAM on my best machine, but Windows does not 
allow a single process to exceed 2 GByte of address space, so in 
practice the actual upper limit is a little less than 2 GByte.

> I've used berkeley dbs up to around 5 gigs
> in size and they performed fine.  However, it is quite important that
> the working set of the database (its internal index pages) can fit
> into available ram.  If they are swapping in and out, there will be
> problems.
Thank you very much for your reply.
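
If I understand it correctly, bsddb3 exposes Berkeley DB's cache size, 
so the amount of RAM reserved for the index pages could be set 
explicitly; a minimal sketch of what I have in mind (the environment 
directory, file name and the 512 MB cache are only placeholder values 
for illustration):

import os
from bsddb3 import db

ENV_DIR = './dbenv'                          # example location
if not os.path.isdir(ENV_DIR):
    os.makedirs(ENV_DIR)

# Environment with an explicitly sized memory pool (cache), so the
# btree index pages have a known amount of RAM to live in.
env = db.DBEnv()
env.set_cachesize(0, 512 * 1024 * 1024, 1)   # 0 GB + 512 MB, one region
env.open(ENV_DIR, db.DB_CREATE | db.DB_INIT_MPOOL)

d = db.DB(env)
d.open('records.db', None, db.DB_BTREE, db.DB_CREATE)

d.put('some key', 'some value')
print(d.get('some key'))

d.close()
env.close()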

In my current project I expect the data to have much less volume than 
the indexes; in my failed MySQL project the size of the indexes was 
approximately the same as the size of the indexed data (1 GByte), 
whereas this time I expect the total size of the indexes to exceed the 
size of the indexed data by far. But because Berkeley DB does not 
support multiple indexed columns (i.e. only one key column per 
database), if I access the database files one after another (not 
simultaneously), it should work without RAM problems, right?
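
To be explicit about what I mean, something along these lines: one 
separate btree file per 'index', each mapping its own key column to a 
record id, filled and queried one at a time (file names and values are 
just made-up examples):

from bsddb3 import db

def open_btree(filename):
    # Plain stand-alone btree file, no shared environment.
    d = db.DB()
    d.open(filename, None, db.DB_BTREE, db.DB_CREATE)
    return d

# First pass: only the index on column A is open and in RAM.
idx_a = open_btree('index_column_a.db')
idx_a.put('some value of column A', 'record id 42')
idx_a.close()

# Second pass: only now is the index on column B opened, so only one
# index has to keep its working set in RAM at any time.
idx_b = open_btree('index_column_b.db')
idx_b.put('some value of column B', 'record id 42')
idx_b.close()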

Does the data volume required to store the key values have an impact 
on the size of the index pages, or does the size of the index pages 
depend only on the number of records and the kind of index (btree, 
hash)?

In the latter case I would be free to use larger data columns as key 
values without running into RAM-size problems for the index itself; 
otherwise I would be forced to use key columns storing a kind of hash 
to keep their size down (and two dictionaries instead of one).
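
In that second case I would do something like the following sketch, 
i.e. key the index on a fixed-size digest of the (possibly large) 
value and keep a second file mapping the digest back to the full value 
(the MD5 digest, key and file names are just examples):

import hashlib
from bsddb3 import db

def open_btree(filename):
    d = db.DB()
    d.open(filename, None, db.DB_BTREE, db.DB_CREATE)
    return d

# Dictionary 1: 16-byte digest of the key value -> record id.
idx = open_btree('index_by_digest.db')
# Dictionary 2: digest -> full (large) key value, to get it back later.
keys = open_btree('digest_to_key.db')

large_key_value = 'some possibly very long key value ...'
digest = hashlib.md5(large_key_value).digest()

idx.put(digest, 'record id 42')
keys.put(digest, large_key_value)

idx.close()
keys.close()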

What is the upper limit on the number of records in practice?

Theoretically, as given in the tutorial, Berkeley DB is capable of 
holding up to billions of records of up to 4 GB each, in tables with a 
total storage size of up to 256 TB of data.
By the way: are the billions in the given context multiples of 
1.000.000.000 or of 1.000.000.000.000, i.e. in the US or the British 
sense?

I expect the number of records in my project to be on the order of 
tens of millions (multiples of 10.000.000).

I would be glad to hear whether someone has already run Berkeley DB 
successfully with this many records or more, and how much RAM and 
which OS the machine used for it had (I am on Windows XP with 3 GByte 
of RAM).
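
If nobody has such numbers at hand I will probably just measure it 
myself with a small loader script along these lines (the record count, 
key format and the 1 GB cache are only the starting values I would 
try):

import os
import time
from bsddb3 import db

ENV_DIR = './benchenv'                  # example location
if not os.path.isdir(ENV_DIR):
    os.makedirs(ENV_DIR)

NUM_RECORDS = 10 * 1000 * 1000          # tens of millions, as in my project

env = db.DBEnv()
env.set_cachesize(1, 0, 1)              # 1 GB cache, just a starting point
env.open(ENV_DIR, db.DB_CREATE | db.DB_INIT_MPOOL)

d = db.DB(env)
d.open('bench.db', None, db.DB_BTREE, db.DB_CREATE)

start = time.time()
for i in xrange(NUM_RECORDS):
    d.put('key%012d' % i, 'value%d' % i)
    if i % 1000000 == 0:
        print('%d records inserted, %.1f s elapsed' % (i, time.time() - start))

d.close()
env.close()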

Claudio

> 
> -Mike
> 


