Bug in BSDDB?

Robin Dunn robin at alldunn.com
Sat Dec 16 02:37:20 EST 2000


"Bryan Mongeau" <bryan at eevolved.com> wrote in message
news:I9z_5.9426$0U2.369913 at news20.bellglobal.com...
> Greetings Python gurus,
>
> I was doing some benchmarks of the dbhash module when I came across some
> unsettling behavior in Windows 98:
>
> 1 - When shelve creates a new database with the "c" flag, it can only add
> 85652 keys to the dictionary before raising an error in bsddb.
>
> 2- If that error is caught and the write operation tried a second time,
> the key is written just fine.
>
> 3- A dbhash file with 100000 simple records is 140 MB in Windows, compared
> to 21 MB in Unix. If you attempt to add more than 100000 keys in Windows,
> the size of the file quickly balloons to > 1 GIG. Keep in mind I was using
> unicode strings and cPickle.
>
> 4- An attempt to write more than 100000 keys will result in errors
> described in point 1, with ever-increasing frequency until eventually
> python itself will explode (illegal operation).
>

I don't know what's going on with bsddb, but I tweaked your samples to run
with my new bsddb3, and to use the dbshelve module included in that package.
(You can get it at http://pybsddb.sourceforge.net/)  The first sample uses
the default DB_HASH access method, and the second uses a DB_BTREE.  This was
run on a Win2K system so there may be some behavior differences for win98,
but I'm not sure to what extent.
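
For readers who cannot decode the attachments, the shape of the benchmark can
be sketched roughly as follows. This is a minimal sketch, not the attached
scripts: it assumes the standard library's shelve module as a stand-in for
bsddb3's dbshelve, and the name run_benchmark and the max_retries cap are
hypothetical additions for illustration. It writes N small records, retrying a
failed write as described in point 2 of the quoted message, then reads them
all back and reports both times.

```python
import os
import shelve
import tempfile
import time


def run_benchmark(path, reccount, max_retries=100):
    """Write then read `reccount` records.

    Returns (write_seconds, read_seconds, error_count).
    `max_retries` is a hypothetical safety cap so a persistent
    failure cannot loop forever.
    """
    errors = 0
    db = shelve.open(path, "c")  # stdlib stand-in for dbshelve.open
    try:
        start = time.time()
        for x in range(reccount):
            data = ["some test data", "some more test data", str(x)]
            while True:
                try:
                    db[str(x)] = data
                    break
                except Exception:
                    # Count the failure and retry the same key.
                    errors += 1
                    if errors > max_retries:
                        raise
        db.sync()
        write_secs = time.time() - start

        start = time.time()
        for x in range(reccount):
            data = db[str(x)]
        read_secs = time.time() - start
    finally:
        db.close()
    return write_secs, read_secs, errors


# Small demonstration run in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    w, r, e = run_benchmark(os.path.join(tmp, "testdb"), 1000)
    print("write took ", w, " seconds")
    print("read took ", r, " seconds")
    print("errors:", e)
```

Timings from this sketch are not comparable with the numbers in this message,
since the storage backend behind shelve differs from one platform to another.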

As you can see from the outputs below, the tests ran without crashing or
raising exceptions all the way up to 250000 records (hash) and 500000
records (BTree).  The only anomalous behavior is that the time for the hash
test of 250000 records was an order of magnitude larger than expected given
the time progression for the previous tests.  It may be that there were a
lot of hash collisions...  Even though the keys were being added in
sequential order, the BTree tests performed very well.
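
One way to look at the time progression is to divide each write time by its
record count. The sketch below does that with the write times reported in the
test runs in this message (rounded to three decimals); the helper name
usec_per_record is made up for illustration. With these figures the hash
cost per record climbs from roughly 230 to roughly 1250 microseconds, while
the BTree cost stays between about 80 and 110.

```python
# (record count, write seconds) copied from the test runs in this message,
# rounded to three decimal places.
hash_runs = [
    (10000, 2.313),
    (50000, 14.290),
    (100000, 38.726),
    (250000, 311.909),
]
btree_runs = [
    (10000, 1.082),
    (50000, 4.436),
    (100000, 8.702),
    (250000, 21.170),
    (500000, 40.699),
]


def usec_per_record(runs):
    """Return (record count, microseconds per written record) pairs."""
    return [(n, secs / n * 1e6) for n, secs in runs]


for label, runs in (("hash", hash_runs), ("btree", btree_runs)):
    for n, usec in usec_per_record(runs):
        print("%-5s %7d records: %7.1f usec/record" % (label, n, usec))
```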

Other advantages of using bsddb3 include database cursors for iterating
over all the records efficiently, the ability for multiple processes and/or
threads to access the DB at once with automatic single-writer, multi-reader
locking semantics (you just have to set a flag), and full transactional
capabilities with logging, commit, and rollback.

Here's my test run:

[c:\Temp\Others] python dbshelvetest1.py 10000
writing 10000 records
write took  2.31299996376  seconds
reading 10000 records
read took  1.34200000763  seconds

[c:\Temp\Others] python dbshelvetest1.py 50000
writing 50000 records
write took  14.2899999619  seconds
reading 50000 records
read took  7.41100001335  seconds

[c:\Temp\Others] python dbshelvetest1.py 100000
writing 100000 records
write took  38.7259999514  seconds
reading 100000 records
read took  15.0110000372  seconds

[c:\Temp\Others]  python dbshelvetest1.py 250000
writing 250000 records
write took  311.909000039  seconds
reading 250000 records
read took  33.1779999733  seconds

[c:\Temp\Others] dir testdb.*

 Volume in drive C is unlabeled      Serial number is A06A:886B
 Directory of  C:\Temp\Others\testdb.*

12/15/00  22:54       1,368,064  testdb.10000
12/15/00  22:55       5,193,728  testdb.50000
12/15/00  22:56      10,469,376  testdb.100000
12/15/00  23:04      41,361,408  testdb.250000
     58,392,576 bytes in 4 files and 0 dirs    58,392,576 bytes allocated
  4,029,243,392 bytes free

[c:\Temp\Others]



[c:\Temp\Others] python dbshelvetest2.py 10000
writing 10000 records
write took  1.08200001717  seconds
reading 10000 records
read took  0.570999979973  seconds

[c:\Temp\Others] python dbshelvetest2.py 50000
writing 50000 records
write took  4.43599998951  seconds
reading 50000 records
read took  2.8939999342  seconds

[c:\Temp\Others] python dbshelvetest2.py 100000
writing 100000 records
write took  8.70200002193  seconds
reading 100000 records
read took  5.84899997711  seconds

[c:\Temp\Others] python dbshelvetest2.py 250000
writing 250000 records
write took  21.1699999571  seconds
reading 250000 records
read took  14.8210000992  seconds

[c:\Temp\Others] python dbshelvetest2.py 500000
writing 500000 records
write took  40.699000001  seconds
reading 500000 records
read took  29.7120000124  seconds

[c:\Temp\Others] dir testdbbt.*

 Volume in drive C is unlabeled      Serial number is A06A:886B
 Directory of  C:\Temp\Others\testdbbt.*

12/15/00  23:09       1,466,368  testdbbt.10000
12/15/00  23:10       7,299,072  testdbbt.50000
12/15/00  23:10      14,598,144  testdbbt.100000
12/15/00  23:11      37,601,280  testdbbt.250000
12/15/00  23:12      75,956,224  testdbbt.500000
    136,921,088 bytes in 5 files and 0 dirs    136,921,088 bytes allocated
  3,892,310,016 bytes free

[c:\Temp\Others]


--
Robin Dunn
Software Craftsman
robin at AllDunn.com
http://wxPython.org     Java give you jitters?
http://wxPROs.com        Relax with wxPython!





begin 666 dbshelvetest1.py
M(R$@+W5S<B]B:6XO<'ET:&]N#0H-"FEM<&]R="!S>7,L('1I;64-"F9R;VT@
M8G-D9&(S(&EM<&]R="!D8G-H96QV90T*#0IN=6U%<G)O<G,],0T*<F5C8V]U
M;G0@/2!I;G0H<WES+F%R9W9;,5TI#0H-"F1B(#T at 9&)S:&5L=F4N;W!E;B at B
M=&5S=&1B+B5D(B E(')E8V-O=6YT+")C(BD-"G1S(#T@=&EM92YT:6UE*"D-
M"@T*<')I;G0@(G=R:71I;F<@)60@<F5C;W)D<R(@)2!R96-C;W5N= T*#0IF
M;W(@>"!I;B!R86YG92AR96-C;W5N="DZ#0H@(&1A=&$]6W4B<V]M92!T97-T
M(&1A=&$B+"!U(G-O;64@;6]R92!T97-T(&1A=&$B+"!S='(H>"E=#0H@(&ME
M>3UX#0H@('=H:6QE(#$Z#0H@(" @=')Y. at T*(" @(" @9&);<W1R*&ME>2E=
M(#T at 9&%T80T*(" @(" @8G)E86L-"B @("!E>&-E<'0Z#0H@(" @("!P<FEN
M=" B97)R;W(B+"!S='(H;G5M17)R;W)S*2P@(F%T(')E8V]R9"(L('-T<BAX
M*0T*(" @(" @;G5M17)R;W)S*STQ#0H-"F1B+G-Y;F,H*0T*#0IT9B ]('1I
M;64N=&EM92 at I#0IP<FEN=" B=W)I=&4@=&]O:R B+"!S='(H=&8M=',I+" B
M('-E8V]N9',B#0H-"G!R:6YT(")R96%D:6YG("5D(')E8V]R9',B("4@<F5C
M8V]U;G0-"G1S(#T@=&EM92YT:6UE*"D-"F9O<B!X(&EN(')A;F=E*')E8V-O
M=6YT*3H-"B @9&%T82 ](&1B6W-T<BAX*5T-"@T*=&8@/2!T:6UE+G1I;64H
M*0T*<')I;G0@(G)E860@=&]O:R B+"!S='(H=&8M=',I+" B('-E8V]N9',B
4#0H-"F1B+F-L;W-E*"D-"@T*#0H`
`
end

begin 666 dbshelvetest2.py
M(R$@+W5S<B]B:6XO<'ET:&]N#0H-"FEM<&]R="!S>7,L('1I;64-"F9R;VT@
M8G-D9&(S(&EM<&]R="!D8G-H96QV92P at 9&(-"@T*;G5M17)R;W)S/3$-"G)E
M8V-O=6YT(#T@:6YT*'-Y<RYA<F=V6S%=*0T*#0ID8B ](&1B<VAE;'9E+F]P
M96XH(G1E<W1D8F)T+B5D(B E(')E8V-O=6YT+" B8R(L(&9I;&5T>7!E/61B
M+D1"7T)44D5%*0T*=',@/2!T:6UE+G1I;64H*0T*#0IP<FEN=" B=W)I=&EN
M9R E9"!R96-O<F1S(B E(')E8V-O=6YT#0H-"F9O<B!X(&EN(')A;F=E*')E
M8V-O=6YT*3H-"B @9&%T83U;=2)S;VUE('1E<W0 at 9&%T82(L('4B<V]M92!M
M;W)E('1E<W0 at 9&%T82(L('-T<BAX*5T-"B @:V5Y/7 at -"B @=VAI;&4@,3H-
M"B @("!T<GDZ#0H@(" @("!D8EMS='(H:V5Y*5T@/2!D871A#0H@(" @("!B
M<F5A:PT*(" @(&5X8V5P=#H-"B @(" @('!R:6YT(")E<G)O<B(L('-T<BAN
M=6U%<G)O<G,I+" B870@<F5C;W)D(BP@<W1R*'@I#0H@(" @("!N=6U%<G)O
M<G,K/3$-"@T*9&(N<WEN8R at I#0H-"G1F(#T@=&EM92YT:6UE*"D-"G!R:6YT
M(")W<FET92!T;V]K("(L('-T<BAT9BUT<RDL("(@<V5C;VYD<R(-"@T*<')I
M;G0@(G)E861I;F<@)60@<F5C;W)D<R(@)2!R96-C;W5N= T*=',@/2!T:6UE
M+G1I;64H*0T*9F]R('@@:6X@<F%N9V4H<F5C8V]U;G0I. at T*("!D871A(#T@
M9&);<W1R*'@I70T*#0IT9B ]('1I;64N=&EM92 at I#0IP<FEN=" B<F5A9"!T
M;V]K("(L('-T<BAT9BUT<RDL("(@<V5C;VYD<R(-"@T*9&(N8VQO<V4H*0T*
$#0H-"@``
`
end



