defect

Michael Ströder michael at stroeder.com
Wed Jun 26 01:26:41 CEST 2002


Derrick 'dman' Hudson wrote:
> I tried using python-ldap today (1.9.999.pre04-1, python 2.1.3-3), but
> it is way too inefficient.  A simple search that results in 2 entries
> returned takes 30 seconds.

If that were normal, I would not use python-ldap. Let's see.

> Watching with top shows nearly 100% CPU
> usage for the 30 seconds, on an otherwise idle Athlon XP 1800+.
> OpenLDAP (2.0) is running on that same machine, however using
> ldapsearch or exim yields immediate results.  

Frankly, this is a very imprecise performance measurement. As I 
wrote in one of my earlier postings, there might be performance 
issues due to the nature of the async reimplementation of 
LDAPObject.result(). And it might look bad that your CPU meter 
shows 100% load. But I also suspect a few other effects are at 
work if all you report is "30 seconds vs. immediate results".

>                 '(mailGroupLocalPart=%s)' % listname ,

Is the attribute mailGroupLocalPart indexed? An equality index 
should be sufficient here. Also note that OpenLDAP 2's ldbm 
backend implements server-side DB caching.

> $ time ./maillist.py AITP
> dann , derrick
> 
> real    0m36.555s
> user    0m32.990s
> sys     0m3.490s
> 
> $ time ldapsearch -x -h localhost -b "ou=People,o=International Teams"
> -s one "(mailGroupLocalPart=aitp)" "uid" | grep uid: 
> uid: dann
> uid: derrick
> 
> real    0m0.033s
> user    0m0.000s
> sys     0m0.010s
> 
> I think this rather clearly points to python-ldap as the culprit.

Hmm, I have some doubts. Let's see.

>  I
> need to do some integration of LDAP and some web-based programs, and
> would like to work with python, but this sort of performance hit just
> won't be usable.

As you might have noticed I'm doing web programming with 
python-ldap. ;-) I'm using web2ldap for maintaining and searching 
my personal address book and it's pretty responsive when using a 
fast browser. I'm also browsing very large data sets (>150000).

Just to give you an idea of what I'm actually doing with 
python-ldap in a commercial pilot project: I'm scanning 170000 
entries in far less than an hour (mainly just reading the uid 
attribute). I'm diffing whole entries at a rate of 50 
entries/second (some other work with a SQL DB is involved here). 
The process runs on a P-III 450 MHz box against a 4-CPU, 1GB RAM 
server running iPlanet Directory Server 5.1.

>  I'm willing to help with the code, if you point me
> to the interesting parts (and help me learn the C API of python and
> openldap as I go).

To find out the interesting parts one has to do proper performance 
measuring.

And I would be really glad to see some *real* numbers. Please 
follow this advice to produce numbers I can take seriously:

1. Eliminate all disk access => turn off all logging.

2. Eliminate caching issues => do many searches, throw away the 
first result.
Example: During my earlier experiments with large group entries 
(>60000 member attribute values on OpenLDAP and 200000 on NS DS 
4.16) I experienced a delay of 30+ seconds when accessing the 
group entry for the first time. Later CompareRequests were 
processed much faster.

3. Eliminate DB backend issues => only search RootDSE.
(This hint is from Kurt Zeilenga.)

4. Maximize the performance impact of python-ldap => use a faster 
LDAP server.
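
A minimal sketch of how points 2 and 3 could be combined in a small timing harness. This is not the attached test script; the function names searches_per_second and make_rootdse_search, the default URI, and the iteration counts are assumptions made for illustration. The LDAP part assumes a current python-ldap (ldap.initialize, search_s) and a server listening on localhost.

```python
import time


def searches_per_second(search, iterations=1000, warmup=1):
    """Return how many times per second `search()` can be called.

    The first `warmup` calls are executed but not timed, so one-off
    costs such as cold server caches do not skew the rate (point 2).
    """
    for _ in range(warmup):
        search()  # result thrown away on purpose
    start = time.perf_counter()
    for _ in range(iterations):
        search()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return iterations / elapsed if elapsed > 0 else float("inf")


def make_rootdse_search(uri="ldap://localhost"):
    """Build a callable that reads the RootDSE over one connection (point 3).

    Assumption: python-ldap is installed and a server answers at `uri`.
    """
    import ldap  # imported lazily so the harness itself has no dependency

    conn = ldap.initialize(uri)
    # Empty base DN with base scope addresses the RootDSE.
    return lambda: conn.search_s("", ldap.SCOPE_BASE, "(objectClass=*)")
```

With a server available, something like `print("%f searches/second" % searches_per_second(make_rootdse_search()))` would report a rate in the same unit as the figures in this posting. Points 1 and 4 are server-side choices and are not covered by the sketch.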

I measured some numbers on my P-III laptop against a locally 
installed Netscape Directory Server 4.16SP1, which is much faster 
than recent OpenLDAP. The test script is attached. There are 
three test cases, especially for the guys who are blaming 
python-ldap for bad performance but are reconnecting to the LDAP 
server for each query. ;-)

*** Read the RootDSE on same connection
    279.630273 searches/second
*** Read the RootDSE on newly created connection without extra simple bind
    171.760053 searches/second
*** Read the RootDSE on newly created connection with an extra simple bind
    144.559940 searches/second
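
For readers without the attachment, the three cases above might look roughly like this. It is a sketch under assumptions, not the attached performance.py: the function names, the `connect` factory argument, and the default URI are hypothetical, and the method names (search_s, simple_bind_s, unbind_s) are those of a current python-ldap.

```python
def read_rootdse(conn):
    # Scope 0 is LDAP_SCOPE_BASE (ldap.SCOPE_BASE in python-ldap);
    # an empty base DN with base scope addresses the RootDSE.
    return conn.search_s("", 0, "(objectClass=*)")


def same_connection(connect, n):
    """Case 1: open one connection and reuse it for all n searches."""
    conn = connect()
    try:
        for _ in range(n):
            read_rootdse(conn)
    finally:
        conn.unbind_s()


def new_connection_each_time(connect, n, bind=False):
    """Cases 2 and 3: reconnect for every search, optionally binding first."""
    for _ in range(n):
        conn = connect()
        try:
            if bind:
                conn.simple_bind_s("", "")  # anonymous simple bind
            read_rootdse(conn)
        finally:
            conn.unbind_s()


def ldap_connect(uri="ldap://localhost"):
    """Hypothetical connection factory; assumes python-ldap and a local server."""
    import ldap

    return ldap.initialize(uri)
```

Passing the connection factory in as an argument keeps the loops independent of the library, so the per-search cost of reconnecting and binding can be compared by timing `same_connection(ldap_connect, n)` against `new_connection_each_time(ldap_connect, n, bind=True)`.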

Now without the async implementation of LDAPObject.result(), 
i.e. directly wrapping the built-in C implementation:

*** Read the RootDSE on same connection
    464.693726 searches/second
*** Read the RootDSE on newly created connection without extra simple bind
    248.286182 searches/second
*** Read the RootDSE on newly created connection with an extra simple bind
    202.098080 searches/second

The rates are obviously much higher (+40%..+66%) but note that we 
did everything here to make python-ldap look really bad. Under 
real-world conditions, with real DB backend activity, server 
logging to disk and so on, the percentage gain will surely look 
different.
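
The relative gains can be recomputed from the two sets of rates quoted above. The variable names are only for illustration; the numbers are copied from this posting.

```python
# Rates in searches/second, in the order of the three test cases.
with_async = [279.630273, 171.760053, 144.559940]
without_async = [464.693726, 248.286182, 202.098080]

# Relative gain of the direct C wrapper over the async implementation,
# in percent, rounded to whole numbers.
gains = [round((direct / asynced - 1) * 100)
         for asynced, direct in zip(with_async, without_async)]

print(gains)  # prints [66, 45, 40]
```

So reusing one connection benefits most from dropping the async layer, while the cases dominated by connection setup gain less.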

=> I have yet to see some serious numbers proving the "30 seconds 
vs. immediate results".

Ciao, Michael.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: performance.py
URL: <http://mail.python.org/pipermail/python-ldap/attachments/20020626/c300f33f/attachment.ksh>

