Performance penalty for using python-ldap

Mon Jun 16 21:41:11 CEST 2003

Hi,

I was tuning an LDAP directory for a client last week and had cause to run 
some before and after benchmarks.

Basically, for a 3000-entry directory I wrote a Python script which did the 
following:

listed every entry with the filter (cn=*), once through python-ldap and once 
by invoking the shell to run the ldapsearch command. Each was done twice: 
returning all attributes, and returning just the cn attribute.

did 3000 random lookups using (cn=exact-match), and then (cn=exact-match*), 
again using both python-ldap and the ldapsearch command.

The searches were run twice on unloaded machines: the first time to populate 
caches, the second time as a rough best-performance figure.
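
For anyone wanting to reproduce this, a minimal sketch of such a harness is 
below. It is not the original script. It assumes a current Python 3 and 
python-ldap, an anonymous simple bind, and the OpenLDAP ldapsearch client on 
the PATH. The helper names (best_of, ldapsearch_argv, compare_listing) are 
made up for illustration, and it runs ldapsearch directly rather than through 
a shell, so it slightly understates the command-line overhead.

```python
import subprocess
import time


def best_of(fn, runs=2):
    """Call fn() `runs` times and return the fastest wall-clock time (seconds).

    With runs=2 the first call warms the caches and the minimum serves as a
    rough best-performance figure.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def ldapsearch_argv(uri, base, filterstr, attrs=()):
    """Build an ldapsearch command line (assumes anonymous simple bind, -x)."""
    return ["ldapsearch", "-x", "-LLL", "-H", uri, "-b", base, filterstr, *attrs]


def search_with_cli(uri, base, filterstr, attrs=()):
    # Starts a new process and binds on every call; output is discarded.
    subprocess.run(ldapsearch_argv(uri, base, filterstr, attrs),
                   check=True, stdout=subprocess.DEVNULL)


def search_with_python_ldap(conn, base, filterstr, attrs=None):
    # Imported here so the helpers above can be used without python-ldap.
    import ldap
    return conn.search_s(base, ldap.SCOPE_SUBTREE, filterstr, attrs)


def compare_listing(uri, base):
    """Time the (cn=*) listing both ways, for all attributes and for cn only."""
    import ldap
    conn = ldap.initialize(uri)
    conn.simple_bind_s("", "")  # anonymous bind, done once (pre-bound)
    for attrs in (None, ["cn"]):
        t_py = best_of(lambda: search_with_python_ldap(conn, base, "(cn=*)", attrs))
        t_cli = best_of(lambda: search_with_cli(uri, base, "(cn=*)", attrs or ()))
        label = "cn only" if attrs else "all attributes"
        print("%s: python-ldap %.3fs, ldapsearch %.3fs" % (label, t_py, t_cli))
```

Calling compare_listing("ldap://localhost", "dc=example,dc=com") (both values 
are placeholders) would print one line per attribute selection. The random 
lookups can be timed the same way by passing a different filter.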

The findings were somewhat surprising.

In the whole-directory listing, ldapsearch was consistently at least 30% 
faster than python-ldap, both before and after tuning the directory. 
Remember that the python searches are pre-bound, while ldapsearch binds each 
time it is called.

In the random lookup test the performance figures were comparable, but this 
compares a python-ldap search on an existing connection against spawning a 
shell, running ldapsearch, binding and then doing the search, i.e. the 
command-line search has a LOT more overhead.

I'm happy to run some tests to identify the cause and see if we can fix it. 
Any suggestions where to start?
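
One possible starting point, as a sketch only: run the python-ldap search 
under the standard cProfile module to see where the time goes. The 
profile_call helper below is a made-up name, not part of python-ldap, and 
cProfile only attributes time to Python-level and built-in calls, so time 
spent inside the C library shows up lumped under the extension call that 
entered it.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=15):
    """Run fn() under cProfile and return a report of the `top` entries,
    sorted by cumulative time, as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()
```

With an already-bound connection conn, something like 
print(profile_call(lambda: conn.search_s(base, ldap.SCOPE_SUBTREE, "(cn=*)"))) 
would give a first breakdown of the whole-directory listing.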

General conclusions from my tests:

python-ldap has a surprising performance penalty

searching is helped by having ample cache (doh!)

returning 1 attribute is much faster than returning all of them (doh!)

searching on indexed attributes helps a lot (doh!)

Ed