[python-ldap] ANN: python-ldap 2.5.2

Tue Nov 21 05:18:43 EST 2017

On 11/21/2017 12:55 AM, Michael Ströder wrote:
> John Dennis wrote:
>> On 11/20/2017 02:23 PM, Michael Ströder wrote:
>>> Hmm. The main obstacle for back-porting pyldap is that I'd like to keep
>>> python-ldap binary-only and still let the calling app do the Unicode
>>> decode/encode stuff if needed. It seems you're endorsing the opposite
>>> way.
>>
>> Why? What is your rationale for that?
>>
>> It seems entirely appropriate and user friendly for the binding to
>> perform the UTF-8 encode/decode.
> 
> As Raphaël also pointed out it's a bit more complicated.
> 
> Even in my higher-level code I often treat DNs or similar values, e.g.
> those sent or received internally through LDAP controls, as opaque. In
> this case decoding and re-encoding is unnecessary and error-prone. Many
> code parts would have to be changed. Code passing data in or out of the
> higher-level module should be Unicode where appropriate (if needed).
> 
>> Asking users to understand *and* never forget to apply all
>> encode/decode operations when handling LDAP data is a large opportunity
>> for errors, as has been shown to be prevalent in the existing binding
>> usage. Plus, doing the encode/decode in the binding is super easy. So
>> why not, if it's easy and has been shown to largely eliminate errors?
> 
> Hmm, please don't get me wrong. But your comment sounds a bit as if
> you're mostly handling simple searches with strings going in and out
> which are known to be UTF-8. But that's not the case in general.

That's a wrong assumption on your side. We all agree here that attribute 
*values* should, at the python-ldap level, always be byte strings.

Now, the question is about DNs, attribute names, and other data 
specified as UTF-8.
I argue that if you wish to port to Python 3, rather than Python 3 
syntax with C semantics, then str is the right choice.
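
To make the proposed split concrete, here is a minimal sketch. The tuple shapes and the `to_text_entry` function are hypothetical, for illustration only, and not python-ldap's actual API. DNs (an LDAPString, i.e. UTF-8, per RFC 4511) and attribute names (ASCII per RFC 4512) become str, while attribute values stay bytes:

```python
from typing import Dict, List, Tuple

# Hypothetical entry shapes (NOT python-ldap's actual API).
# Bytes-only, as a low-level binding might return a search result entry:
RawEntry = Tuple[bytes, Dict[bytes, List[bytes]]]
# The proposed split: DN and attribute names as str, values kept as bytes:
TextEntry = Tuple[str, Dict[str, List[bytes]]]


def to_text_entry(raw: RawEntry) -> TextEntry:
    """Decode the parts whose encoding LDAP specifies; leave values alone."""
    dn, attrs = raw
    # RFC 4511: LDAPDN is an LDAPString, i.e. UTF-8 encoded.
    text_dn = dn.decode("utf-8")
    # RFC 4512: attribute descriptions are ASCII, so strict ASCII suffices.
    text_attrs = {name.decode("ascii"): values for name, values in attrs.items()}
    return text_dn, text_attrs


if __name__ == "__main__":
    raw = (
        b"cn=J\xc3\xbcrgen,dc=example,dc=de",
        {b"cn": [b"J\xc3\xbcrgen"], b"jpegPhoto": [b"\xff\xd8\xff\xe0"]},
    )
    dn, attrs = to_text_entry(raw)
    print(dn)                  # cn=Jürgen,dc=example,dc=de
    print(attrs["jpegPhoto"])  # b'\xff\xd8\xff\xe0' (binary value untouched)
```

Binary values such as a JPEG photo pass through unchanged, so the application still decides how (or whether) to interpret them.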

Note that UTF-8 encoding/decoding is quite fast, especially for
ASCII. Also, DNs and attribute names tend to be short; I'd be surprised
if you found the overhead to be even measurable in a real-world application.
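
One rough way to check the overhead claim is a micro-benchmark with the standard library's `timeit`. The sample DNs and the helper names `round_trip` and `ns_per_round_trip` are made up for illustration; absolute timings depend on the machine and Python version, so no figures are given here:

```python
import timeit

# Sample DNs, made up for illustration: one pure ASCII, one with non-ASCII.
ASCII_DN = "uid=jdoe,ou=People,dc=example,dc=com"
NON_ASCII_DN = "cn=Jürgen Müller,ou=Personen,dc=example,dc=de"


def round_trip(dn: str) -> str:
    """Encode a DN to UTF-8 and decode it back again."""
    return dn.encode("utf-8").decode("utf-8")


def ns_per_round_trip(dn: str, number: int = 100_000) -> float:
    """Average cost of one encode+decode round trip, in nanoseconds."""
    seconds = timeit.timeit(lambda: round_trip(dn), number=number)
    return seconds / number * 1e9


if __name__ == "__main__":
    for dn in (ASCII_DN, NON_ASCII_DN):
        # Absolute numbers vary by machine and Python version.
        print(f"{ns_per_round_trip(dn):8.1f} ns  {dn}")
```

The measured time also includes the lambda call overhead, so it overstates the pure codec cost if anything.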

> IMO there should be a low-level, bytes-only module like python-ldap and
> the "user-friendly" part (for whatever definition of role "user") should
> be done in higher-level wrapper modules.
>
> Another point of view: Would you expect a low-level DNS module to handle
> Unicode decoding/encoding? Note that, despite hostname conventions,
> labels in the DNS protocol are OctetString.

A low-level DNS module should stick to the specification.
Similarly, in Python 3, "text with a known encoding" is usually
represented as "str", so in the (simple) cases where the LDAP specs
specify the encoding, that's what I argue the wrapper should use.
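
A minimal sketch of what using str where the spec fixes the encoding could look like at a wrapper boundary, which also touches on the opaque-DN concern quoted above. `decode_dn` and `encode_dn` are hypothetical helper names, not part of python-ldap. For DNs that are valid UTF-8, as RFC 4511 requires, a strict decode followed by an encode returns the original octets unchanged, while malformed input raises `UnicodeDecodeError` at the boundary instead of being silently altered:

```python
def decode_dn(raw: bytes) -> str:
    """Wire -> str. Strict: malformed UTF-8 raises instead of being altered."""
    return raw.decode("utf-8")  # errors="strict" is the default


def encode_dn(dn: str) -> bytes:
    """str -> wire, using UTF-8 as the LDAP specs mandate for DNs."""
    return dn.encode("utf-8")


if __name__ == "__main__":
    # A DN received as opaque octets (e.g. from a control value); made up here.
    received = "cn=Jürgen,ou=Personen,dc=example,dc=de".encode("utf-8")
    # Valid UTF-8 survives the round trip byte for byte.
    assert encode_dn(decode_dn(received)) == received
    # Malformed input is reported at the boundary rather than passed on.
    try:
        decode_dn(b"cn=\xff,dc=example,dc=com")
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```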

-- 
Petr Viktorin