defect
Michael Ströder
michael at stroeder.com
Wed Jun 26 01:26:41 CEST 2002
Derrick 'dman' Hudson wrote:
> I tried using python-ldap today (1.9.999.pre04-1, python 2.1.3-3), but
> it is way too inefficient. A simple search that results in 2 entries
> returned takes 30 seconds.
If that were normal I would not use python-ldap. Let's see.
> Watching with top shows nearly 100% CPU
> usage for the 30 seconds, on an otherwise idle Athlon XP 1800+.
> OpenLDAP (2.0) is running on that same machine, however using
> ldapsearch or exim yields immediate results.
Frankly, this is a very imprecise performance measurement. As I
wrote in one of my earlier postings, there might be performance
issues due to the nature of the async reimplementation of
LDAPObject.result(). And it might look bad that your CPU meter
shows 100% load. But I also suspect a few other effects if you
just say "30 seconds vs. immediate results".
> '(mailGroupLocalPart=%s)' % listname ,
Is attribute mailGroupLocalPart indexed? An equality index should
be sufficient here. Also note that OpenLDAP 2's ldbm backend
implements server-side DB caching.
> $ time ./maillist.py AITP
> dann , derrick
>
> real 0m36.555s
> user 0m32.990s
> sys 0m3.490s
>
> $ time ldapsearch -x -h localhost -b "ou=People,o=International Teams"
> -s one "(mailGroupLocalPart=aitp)" "uid" | grep uid:
> uid: dann
> uid: derrick
>
> real 0m0.033s
> user 0m0.000s
> sys 0m0.010s
>
> I think this rather clearly points to python-ldap as the culprit.
Hmm, I have some doubts. Let's see.
> I
> need to do some integration of LDAP and some web-based programs, and
> would like to work with python, but this sort of performance hit just
> won't be usable.
As you might have noticed I'm doing web programming with
python-ldap. ;-) I'm using web2ldap for maintaining and searching
my personal address book and it's pretty responsive when using a
fast browser. I'm also browsing very large data sets (>150000).
Just to give you a clue what I'm actually doing with python-ldap
in a commercial pilot project: I'm scanning 170000 entries in far
less than an hour (mainly just reading the uid attribute). I'm
diffing whole entries at a rate of 50 entries/second (some
other work with a SQL DB is involved here). The process runs on a
P-III 450 MHz box against a 4-CPU, 1GB RAM server running iPlanet
Directory Server 5.1.
> I'm willing to help with the code, if you point me
> to the interesting parts (and help me learn the C API of python and
> openldap as I go).
To find out the interesting parts one has to do proper performance
measuring.
And I would be really glad to see some *real* numbers. Please take
this advice to produce numbers I can take seriously:
1. Eliminate all disk access => turn off all logging.
2. Eliminate caching issues => do many searches, throw away first
result.
Example: During my former experiments with large group entries
(>60000 member attribute values on OpenLDAP and 200000 on NS DS
4.16) I experienced a 30+ seconds interval when accessing the
group entry for the first time. Later the CompareRequest was done
much faster.
3. Eliminate DB backend issues => only search RootDSE.
(This hint by Kurt Zeilenga.)
4. Maximize performance impact of python-ldap => use faster LDAP
server.
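A minimal sketch of how points 2 and 3 could be measured (this is
not the attached performance.py; the helper names, the URI and the
repeat counts are assumptions for illustration). The timing helper
runs and discards the first call(s) before the clock starts, and
the search callable only reads the RootDSE:

```python
import time


def searches_per_second(search, count=1000, warmup=1):
    """Call search() `count` times and return the measured rate.

    The first `warmup` calls are executed but not timed, so that
    one-time costs (connection setup, server-side caching) do not
    distort the result.
    """
    for _ in range(warmup):
        search()
    start = time.perf_counter()
    for _ in range(count):
        search()
    elapsed = time.perf_counter() - start
    return count / elapsed


def make_rootdse_search(uri="ldap://localhost:389"):
    """Return a callable that reads the RootDSE on one shared connection.

    Assumption: python-ldap is installed and an LDAP server listens
    on `uri`. The import is local so the timing helper above can be
    used without python-ldap.
    """
    import ldap
    conn = ldap.initialize(uri)

    def search():
        # Empty base DN plus base scope addresses the RootDSE.
        return conn.search_s("", ldap.SCOPE_BASE, "(objectClass=*)")

    return search
```

With a server running, something like
print(searches_per_second(make_rootdse_search())) would give a
searches/second figure for the same-connection case.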
I took some numbers on my P-III laptop against a locally installed
Netscape Directory Server 4.16SP1 which is much faster than recent
OpenLDAP. Test script is attached. There are three test cases
especially for the guys who are blaming python-ldap for bad
performance but are reconnecting to the LDAP server for each
query. ;-)
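For readers without the attachment, the three cases could look
roughly like the sketch below. It is not the attached script: the
function names, the connect() factory and the localhost URI are
assumptions, and the bind is shown as an anonymous simple bind.

```python
SCOPE_BASE = 0  # numeric value of ldap.SCOPE_BASE


def read_rootdse(conn):
    # Empty base DN plus base scope addresses the RootDSE.
    return conn.search_s("", SCOPE_BASE, "(objectClass=*)")


def same_connection(connect, count):
    """Case 1: open one connection and reuse it for every search."""
    conn = connect()
    for _ in range(count):
        read_rootdse(conn)
    conn.unbind_s()


def new_connection_each_time(connect, count, bind=False):
    """Cases 2 and 3: reconnect for every search, optionally binding first."""
    for _ in range(count):
        conn = connect()
        if bind:
            conn.simple_bind_s("", "")  # anonymous simple bind
        read_rootdse(conn)
        conn.unbind_s()


def connect_localhost():
    """Hypothetical factory; assumes python-ldap and a local server."""
    import ldap
    return ldap.initialize("ldap://localhost:389")
```

Passing the connection factory in as an argument keeps the three
loops identical apart from where the connection is created, which
is the only thing being compared here.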
*** Read the RootDSE on same connection
279.630273 searches/second
*** Read the RootDSE on newly created connection without extra simple bind
171.760053 searches/second
*** Read the RootDSE on newly created connection with an extra simple bind
144.559940 searches/second
Now without the async implementation of LDAPObject.result(),
directly wrapping the built-in C implementation:
*** Read the RootDSE on same connection
464.693726 searches/second
*** Read the RootDSE on newly created connection without extra simple bind
248.286182 searches/second
*** Read the RootDSE on newly created connection with an extra simple bind
202.098080 searches/second
The rates are obviously much higher (roughly +40% to +66%), but
note that we did everything here to make python-ldap look really
bad. Under real-world conditions, with real DB backend activity,
server logging to disk and so on, the relative gain will surely
look different.
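The relative gains can be recomputed from the six rates quoted
above; this is plain arithmetic on those numbers (the variable and
function names are only illustrative):

```python
# Searches/second as quoted above, in the order: same connection,
# new connection without bind, new connection with bind.
with_async = [279.630273, 171.760053, 144.559940]
without_async = [464.693726, 248.286182, 202.098080]


def percent_gain(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


gains = [percent_gain(b, a) for b, a in zip(with_async, without_async)]
for gain in gains:
    print("%+.1f%%" % gain)
```

The largest gain is in the same-connection case, where the
per-search overhead of the Python-level result() loop is not
diluted by connection setup.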
=> I have yet to see some serious numbers proving the "30 seconds
vs. immediate results".
Ciao, Michael.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: performance.py
URL: <http://mail.python.org/pipermail/python-ldap/attachments/20020626/c300f33f/attachment.ksh>