[python-ldap] ldap.OPT_DESC, async ops and paged search controls
Mark R Bannister
mark at proseconsulting.co.uk
Thu Jan 22 14:33:16 CET 2015
On 21/01/2015 09:16, Michael Ströder wrote:
> Mark R Bannister wrote:
>> I've been using the new ldap.OPT_DESC feature introduced in python-ldap 2.4.17
>> and have a question concerning the use of it with asynchronous search
>> operations and paged search controls.
> Never used that myself.
>
> Why are you using paging? This only makes sense if you want to retrieve more
> than 1000 entries from MS AD.
I've tested this now without paged search controls and get the same problem.
The directory server is OpenLDAP 2.4.30 on Solaris 11.2 (it's listening
on localhost).
I've attached two test scripts that examine this problem. The output of
testldap1.py run on a test DN that has 7 child entries:
ldap.bind complete, fd 4
ldap.search started
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (97, [])
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (None, None)
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (None, None)
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (None, None)
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (None, None)
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (None, None)
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (None, None)
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (None, None)
calling select with ([4], [], []) , returned: ([4], [], [])
ldap.result returned: (101, [ ... results here ... ])
So why is select being woken up once per entry? Why isn't select being
woken up once with the entire result set? This is a simple case, and
for 7 entries one could argue it doesn't matter, but when I'm dealing
with 80,000 entries, that's a lot of unnecessary wake-ups.
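One thing worth ruling out when reading the output above: select() is level-triggered, so it reports the fd as readable whenever unread bytes are still queued, whether or not new data has arrived. The sketch below illustrates that with a plain socket.socketpair() and no LDAP at all. The 8-byte record size and the name count_wakeups are made up for the demo. All records are written up front, and the reader consumes one record per wake-up, which is similar in shape to the loop in testldap1.py.

```python
import select
import socket

RECORD = 8  # hypothetical fixed record size, only for this demo


def count_wakeups(n_records):
    """Queue n_records at once, then read one record per select() wake-up."""
    reader, writer = socket.socketpair()
    try:
        # Everything is written before the reader looks at the socket.
        payload = b"".join(b"entry%03d" % i for i in range(n_records))
        writer.sendall(payload)

        wakeups = 0
        remaining = n_records
        while remaining:
            readable, _, _ = select.select([reader], [], [], 1.0)
            if not readable:
                break  # nothing left to read within the timeout
            wakeups += 1
            # Consume exactly one record and leave the rest queued,
            # so the next select() call returns immediately again.
            reader.recv(RECORD)
            remaining -= 1
        return wakeups
    finally:
        reader.close()
        writer.close()


wakeups = count_wakeups(7)
print("select() wake-ups for 7 queued records: %d" % wakeups)
```

If the same thing is happening with the LDAP socket, the repeated wake-ups would not mean the server is sending entries one at a time, only that each call leaves data behind on the fd. That is a guess, not something this sketch proves.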
>> Does this seem right to you and is there any way to optimise this? All 80,000
>> entries are taking about 15 seconds to read into Python using the python-ldap
>> module compared with 5 seconds for native C.
> For better comparison of the numbers could you please also test Python code
> without using the ldap.OPT_DESC feature:
>
> 1. using LDAPObject.search_ext_s()
>
> 2. using ldap.resiter
>
> 3. 1. and 2. with and without paging
>
> I'd also try to see what wakes up the select() by using wireshark.
>
> Ciao, Michael.
>
>
I can't use wireshark for this one because it's localhost. However,
truss works fine for me and demonstrates that each time ldap.result() is
called it reads one entry from the fd, then returns (None, None). I
have to call ldap.result() as many times as there are entries before I
actually get any meaningful results. This doesn't seem right to me.
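A possible variation, sketched under assumptions: LDAPObject.result() takes an all argument that defaults to 1, meaning it only returns data once the whole result set is in. Passing all=0 asks for one message at a time, so each call that reads an entry also hands it back. The helper below (drain_available is a made-up name) keeps calling result() with all=0 and timeout=0 until nothing more is immediately available, so that select() only needs to be called again when the library has run dry. FakeConnection is a hypothetical stand-in so the sketch runs without a directory server; with a real connection, msgid would be the value returned by l.search(). Whether this actually reduces the number of wake-ups against OpenLDAP has not been tested here.

```python
# LDAP protocol message types, matching the 101 seen in the output above.
RES_SEARCH_ENTRY = 100   # same value as ldap.RES_SEARCH_ENTRY
RES_SEARCH_RESULT = 101  # same value as ldap.RES_SEARCH_RESULT


def drain_available(conn, msgid, entries):
    """Collect every message result() can return right now without blocking.

    Appends entries to the given list and returns True once the final
    search result message has been seen, False if more data is pending.
    """
    while True:
        rtype, rdata = conn.result(msgid, all=0, timeout=0)
        if rtype is None:
            return False  # nothing more buffered; go back to select()
        if rtype == RES_SEARCH_ENTRY:
            entries.extend(rdata)
        elif rtype == RES_SEARCH_RESULT:
            return True


class FakeConnection(object):
    """Hypothetical stand-in for an LDAPObject so the sketch runs offline."""

    def __init__(self, replies):
        self._replies = list(replies)

    def result(self, msgid, all=1, timeout=None):
        # Replay canned replies, then behave like an idle connection.
        if self._replies:
            return self._replies.pop(0)
        return (None, None)


fake = FakeConnection([
    (RES_SEARCH_ENTRY, [("cn=a,dc=example", {})]),
    (RES_SEARCH_ENTRY, [("cn=b,dc=example", {})]),
    (None, None),  # simulated pause: no more data buffered yet
    (RES_SEARCH_ENTRY, [("cn=c,dc=example", {})]),
    (RES_SEARCH_RESULT, []),
])
entries = []
done_first = drain_available(fake, 1, entries)   # stops at the pause
count_first = len(entries)
done_second = drain_available(fake, 1, entries)  # reaches the final message
print("first pass: done=%s, entries=%d" % (done_first, count_first))
print("second pass: done=%s, entries=%d" % (done_second, len(entries)))
```

In a select() loop like the one in testldap1.py, drain_available() would replace the single l.result(timeout=0) call, and the loop would end once it returns True.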
If I run the second attached test script (testldap2.py) and run truss on
that, I see the same behaviour under the covers - a poll() and a
separate read() for each individual entry. I haven't tried
ldap.resiter, but I'm sure it's the same as it uses LDAPObject.result3().
Perhaps this is some tuning I've missed from OpenLDAP, which seems to
want to drip-feed one result at a time ... or is this normal? It seems
quite inefficient to me.
Thanks,
Mark.
-------------- next part --------------
#!/usr/bin/python
# LDAP async search for all entries directly underneath given base
import sys
import ldap
from select import select
base = sys.argv[1]
filter = "objectclass=*"
scope = ldap.SCOPE_ONELEVEL
l = ldap.initialize("ldap://localhost")
l.bind("", "")
fd = l.get_option(ldap.OPT_DESC)
print "ldap.bind complete, fd %d" % fd
l.search(base, scope, filter)
print "ldap.search started"
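d = None
# Loop until ldap.result() hands back a non-empty data list.
while d is None or len(d) == 0:
    print "calling select with ([%d], [], [])" % fd,
    sys.stdout.flush()
    # Block until the LDAP socket becomes readable.
    r, w, e = select([fd], [], [])
    print ", returned: %s" % str((r, w, e))
    # timeout=0 polls without blocking; the all argument defaults to 1.
    (t, d) = l.result(timeout=0)
    print "ldap.result returned: %s" % str((t, d))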
l.unbind()
-------------- next part --------------
#!/usr/bin/python
# LDAP async search for all entries directly underneath given base
import sys
import ldap
base = sys.argv[1]
filter = "objectclass=*"
scope = ldap.SCOPE_ONELEVEL
l = ldap.initialize("ldap://localhost")
l.bind_s("", "")
print "ldap.bind complete"
result = l.search_s(base, scope, filter)
print "ldap.search returned: %s" % str(result)
l.unbind()