[ python-Bugs-881522 ] Shelve slow after 7/8000 key

SourceForge.net noreply at sourceforge.net
Thu Jan 22 13:56:02 EST 2004


Bugs item #881522, was opened at 2004-01-21 12:09
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=881522&group_id=5470

Category: Extension Modules
Group: Python 2.3
Status: Open
>Resolution: None
Priority: 5
Submitted By: Marco Beri (marcoberi)
Assigned to: Gregory P. Smith (greg)
Summary: Shelve slow after 7/8000 key

Initial Comment:
After about 8,000 insertions, shelve becomes really, really 
slow.
This happens only with 2.3.3 #51 on Windows, not with 
2.2 or with 2.3 on Linux.
I tried with writeback both True and False: same problem.
Help! :-))
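
One way to see where a slowdown like this sets in is to time the
insertions in batches. The following is only a sketch in modern
Python 3, not the script attached to this report: the function name,
the key format, the payload and the batch size are all assumptions
made for illustration.

```python
import os
import shelve
import tempfile
import time


def time_shelve_batches(path, total=20000, batch=1000):
    """Insert `total` keys and return the seconds spent on each batch."""
    timings = []
    shelf = shelve.open(path)  # default flag "c": create if missing
    try:
        for first in range(0, total, batch):
            start = time.perf_counter()
            for i in range(first, min(first + batch, total)):
                shelf["key%08d" % i] = {"n": i}  # hypothetical payload
            timings.append(time.perf_counter() - start)
    finally:
        shelf.close()
    return timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results = time_shelve_batches(os.path.join(tmp, "probe"))
        for number, seconds in enumerate(results):
            print("batch %2d: %.3f s" % (number, seconds))
```

If the per-batch times stay flat and then jump, that points at the
underlying database rather than at a fixed per-call overhead.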


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2004-01-22 13:56

Message:
Logged In: YES 
user_id=31435

The original question is why a BDB hash is some 30x slower 
under 2.3 than under 2.2 or 2.1, and that does appear 
specific to Windows.

Skip threw btrees into this too, but that complication doesn't 
appear relevant to the original report (despite marcoberi's 
hearsay 2004-01-21 18:57 comment -- others posted actual 
output, making clear that dbhash is used under all Python 
versions in test1skip).

I'll note in passing that the test case inserts keys in already-
mostly-sorted order, which is a friendly order for a btree-
based mapping.  To get back to the original report, ignore 
everything here concerning test3skip and btrees.
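
To keep key order from flattering one access method, a benchmark can
be run once with sorted keys and once with the same keys shuffled.
This is a minimal sketch; the helper name, the key format and the
seed are assumptions and are not taken from the attached test cases.

```python
import random


def make_keys(count, shuffled=False, seed=0):
    """Return `count` fixed-width keys, in sorted or shuffled order."""
    keys = ["%08d" % i for i in range(count)]
    if shuffled:
        # A fixed seed keeps the shuffled order identical between runs.
        random.Random(seed).shuffle(keys)
    return keys
```

Feeding both orders to the same insertion loop would show how much of
a btree's advantage comes from the mostly-sorted input alone.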

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2004-01-22 13:32

Message:
Logged In: YES 
user_id=413

This problem is not specific to Windows.  hashopen in the
test3skip.py test case is 10x slower than btopen on my
linux-alpha system.

I don't know why BerkeleyDB hash databases are so much
slower than B-Tree ones.  My best suggestion is:  if it
hurts, don't do that.  Use a btree rather than a hash database.

Running the python process under strace on linux reveals
nothing obvious (no system calls are being made during the
time hashopen is consuming lots of cpu).

You'll have to ask sleepycat themselves if you want a real
answer as to why hash databases don't perform well.

----------------------------------------------------------------------

Comment By: Marco Beri (marcoberi)
Date: 2004-01-22 13:16

Message:
Logged In: YES 
user_id=588604

I get the same results as you under a normal cmd prompt: 7.07 
seconds vs 0.46.

[c:\tmp]timer & \python23\python test3skip.py hashopen & 
timer
Timer 1 on: 19.13.22
Timer 1 off: 19.13.29  Elapsed: 0.00.07,07

[c:\tmp]timer & \python23\python test3skip.py btopen & timer
Timer 1 on: 19.13.45
Timer 1 off: 19.13.45  Elapsed: 0.00.00,46


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2004-01-22 13:02

Message:
Logged In: YES 
user_id=44345

Try test3skip.py.  You run it like this:

    python test3skip.py hashopen
    python test3skip.py btopen

I ran it on win2k under cygwin so I could use the time command 
(but ran the Windows version of Python).  Using btopen was much 
faster.  I got rid of shelve to eliminate it and pickle as possible 
sources of problems.

$ time /cygdrive/c/Python23/python test3skip.py hashopen

real    0m6.801s
user    0m0.015s
sys     0m0.000s

Administrator at CYCLOPS ~/tmp
$ time /cygdrive/c/Python23/python test3skip.py btopen

real    0m0.345s
user    0m0.015s
sys     0m0.015s

I don't know if the relationship between real, user and sys time 
means anything on cygwin, but the reported real times are very 
repeatable and match my subjective feel of the elapsed time.  This 
suggests there's something fishy with either the underlying library 
or with __setitem__ when using hash files.

I'm assigning to Greg so he can take a peek.  As the bsddb/
pybsddb guy he might have some better insight (certainly better 
than me).
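
test3skip.py itself is not reproduced in this message. The sketch
below only shows the general shape of such a test: the same insertion
loop, with the open function passed in. The function name, key format
and value are assumptions. Any callable taking (path, flag) should
work, so the bsddb hashopen and btopen functions named above could be
passed where that module is available.

```python
import time


def time_inserts(opener, path, count=10000):
    """Return seconds taken to insert `count` keys and close the database.

    `opener` is any callable following the (path, flag) convention.
    """
    start = time.time()
    db = opener(path, "c")  # "c": create the file if it does not exist
    try:
        for i in range(count):
            # Fixed-width keys arrive in sorted order, as in the thread.
            db[("%08d" % i).encode("ascii")] = b"value"
    finally:
        db.close()
    return time.time() - start
```

Calling it once per opener against fresh files gives the two numbers
being compared in this comment.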

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-01-22 12:29

Message:
Logged In: YES 
user_id=31435

FYI, on a Win98SE box, test1skip.py took about 30 seconds 
under 2.3.3, and about 1 second under both 2.2.3 and 2.1.3.  
Under 2.3.3, no significant time is taken by a.close(), so it's 
all in the loop.  It prints "dbhash" under all versions.
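
Separating loop time from close time can be done with two timers.
This is a sketch only: it uses dbm.dumb from current Python 3 as a
stand-in backend, not the dbhash module measured above, and the
function name and key format are assumptions.

```python
import dbm.dumb
import time


def split_timings(path, count=10000):
    """Return (loop_seconds, close_seconds) for `count` insertions."""
    db = dbm.dumb.open(path, "c")  # stand-in backend, not dbhash
    start = time.perf_counter()
    for i in range(count):
        db["%08d" % i] = "value"
    loop_seconds = time.perf_counter() - start

    start = time.perf_counter()
    db.close()  # any deferred flushing to disk is charged here
    close_seconds = time.perf_counter() - start
    return loop_seconds, close_seconds
```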

----------------------------------------------------------------------

Comment By: Marco Beri (marcoberi)
Date: 2004-01-22 02:30

Message:
Logged In: YES 
user_id=588604

I tried your version: 31.36 seconds vs 0.65.
Just to be sure I tried on three different computers with 
Windows 2000: same gap.

[c:\tmp]timer & \Python23\python test1skip.py & timer
Timer 1 on:  8.21.58
dbhash
Timer 1 off:  8.22.29  Elapsed: 0.00.31,36

[c:\tmp]timer & \Python22\python test1skip.py & timer
Timer 1 on:  8.22.40
dbhash
Timer 1 off:  8.22.41  Elapsed: 0.00.00,65


----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2004-01-21 19:28

Message:
Logged In: YES 
user_id=44345

Can't reproduce on Mac OS X.  I tried with 2.2, 2.3 and CVS using
attached test1skip.py (no writeback - 2.2 doesn't support it, no
import pickle - not used, no key prints - just muddies the water,
print whichdb's result).
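
For reference, checking which backend a database file belongs to
looks roughly like this in current Python 3, where whichdb lives in
the dbm package. This is a sketch with assumed names; it creates a
dbm.dumb file only so that there is something to inspect.

```python
import dbm
import dbm.dumb
import os


def create_and_identify(directory):
    """Create a small dbm.dumb database and return whichdb's guess for it."""
    path = os.path.join(directory, "sample")
    db = dbm.dumb.open(path, "c")
    db["key"] = "value"
    db.close()
    # whichdb returns a module name, "" if unrecognized, or None if unreadable.
    return dbm.whichdb(path)
```

Printing this value under each interpreter is what confirms that the
runs being compared really use the same database format.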

The times are close enough to not worry me:

montanaro:tmp% time python2.3 test1.py
dbhash

real    0m1.927s
user    0m1.720s
sys     0m0.080s
montanaro:tmp% time python2.2 test1.py
dbhash

real    0m1.250s
user    0m0.850s
sys     0m0.360s
montanaro:tmp% time python test1.py
dbhash

real    0m2.179s
user    0m1.950s
sys     0m0.120s

Please try this modified version just to make sure we are both
looking at the same thing.



----------------------------------------------------------------------

Comment By: Marco Beri (marcoberi)
Date: 2004-01-21 18:57

Message:
Logged In: YES 
user_id=588604

Skip Montanaro discovered that whichdb reports bsddb185 
with python 2.2 and dbhash with 2.3.3.
So why is it so slow after a few thousand keys?

----------------------------------------------------------------------

Comment By: Thomas Heller (theller)
Date: 2004-01-21 13:24

Message:
Logged In: YES 
user_id=11105

Hm, are Windows bugs automatically assigned to me ;-)??

----------------------------------------------------------------------
