[python-ldap] ANN: python-ldap 2.5.2

Tue Nov 21 05:38:31 EST 2017

On 11/21/2017 01:13 AM, Michael Ströder wrote:
> Michael Ströder wrote:
>> Even in my higher-level code I often e.g. treat DNs or similar values
>> opaque which are internally sent or received through LDAP controls.
> 
> IIRC while there are likely no issues in our comfortable Western Latin
> character world Unicode has some interesting corner-cases. So
> decoding/re-encoding might not lead to the same results. Frankly I have
> next-to-zero knowledge about all the Unicode Normalization Forms.

Unicode normalization forms are orthogonal to encoding/decoding. From 
python-ldap's point of view, normalization works at the level of the 
data, not of its representation.
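To illustrate that point with Python's standard library: a UTF-8 decode/encode round trip on valid UTF-8 gives back the exact same bytes. It does not normalize anything, and it does not "denormalize" anything either. The two spellings of "é" below (precomposed vs. "e" plus a combining accent) are related only by normalization. The helper name utf8_round_trip is made up for this sketch. Invalid UTF-8 would raise UnicodeDecodeError under the default strict error handler rather than being silently altered.

```python
import unicodedata

def utf8_round_trip(raw: bytes) -> bytes:
    """Hypothetical helper: decode UTF-8 bytes to str and encode them back."""
    return raw.decode("utf-8").encode("utf-8")

composed = "\u00e9"       # "é" as one precomposed code point (NFC form)
decomposed = "e\u0301"    # "e" + U+0301 COMBINING ACUTE ACCENT (NFD form)

# Same visible text, different data: the code-point sequences differ.
assert composed != decomposed
# Normalization is what relates the two forms ...
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

# ... whereas a UTF-8 round trip preserves each form byte for byte.
for s in (composed, decomposed):
    raw = s.encode("utf-8")
    assert utf8_round_trip(raw) == raw
    print(raw)
# b'\xc3\xa9'
# b'e\xcc\x81'
```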

Choosing to encode/decode doesn't bring any *new* normalization issues: 
you can do normalization on a UTF-8 encoded bytestring just as well.
Here's an example diagram:

"ﬁ" (one "character")  --- KC normalization -->  "fi" (two "characters")
    ^                                               ^
    | UTF-8 encode/decode                           |  UTF-8
    v                                               v
bytes([239, 172, 129]) --- KC normalization --> bytes([102, 105])
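The diagram can be checked with Python's unicodedata module. unicodedata.normalize only accepts str, so in this sketch the bottom arrow is realized by decoding, normalizing and re-encoding. The helper nfkc_bytes is a hypothetical name, not part of python-ldap or the standard library.

```python
import unicodedata

def nfkc_bytes(raw: bytes) -> bytes:
    """Hypothetical helper: NFKC-normalize UTF-8 bytes (decode, normalize, re-encode)."""
    return unicodedata.normalize("NFKC", raw.decode("utf-8")).encode("utf-8")

ligature = "\ufb01"  # "ﬁ", LATIN SMALL LIGATURE FI: a single code point

# Top row: NFKC turns the ligature into two characters.
assert unicodedata.normalize("NFKC", ligature) == "fi"
# Vertical arrows: the UTF-8 encodings of both sides.
assert ligature.encode("utf-8") == bytes([239, 172, 129])
assert "fi".encode("utf-8") == bytes([102, 105])
# Bottom row: normalizing the bytes gives the same result as
# normalizing the text first and then encoding it.
assert nfkc_bytes(bytes([239, 172, 129])) == bytes([102, 105])
```

Either path through the diagram ends at the same bytes, which is why encoding/decoding does not add normalization problems of its own.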

You can leave normalizing attribute names or DNs entirely to the 
application developers (as is done now, with bytestrings).
Or you might choose to validate normal forms, or even auto-normalize, 
but that would be a separate, new feature. (And I don't think it would 
be a terribly useful feature for python-ldap.)
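For the "leave it to the application" option, here is a minimal sketch of what application code might do with a DN before handing it to python-ldap. Both helpers are hypothetical and not python-ldap API. Picking NFC is an assumption (the application has to choose a form). The DN is treated as plain text rather than parsed. unicodedata.is_normalized needs Python 3.8 or later.

```python
import unicodedata

# Hypothetical application-side helpers; python-ldap itself does not
# validate or normalize these values.
FORM = "NFC"  # assumption: the application has settled on NFC

def is_normalized_utf8(raw: bytes, form: str = FORM) -> bool:
    """Check whether UTF-8 bytes (e.g. a DN) are already in the given normal form."""
    return unicodedata.is_normalized(form, raw.decode("utf-8"))

def normalize_utf8(raw: bytes, form: str = FORM) -> bytes:
    """Return the same text, normalized to the given form, re-encoded as UTF-8."""
    return unicodedata.normalize(form, raw.decode("utf-8")).encode("utf-8")

# A DN whose "é" is spelled in decomposed (NFD) form.
dn = "cn=Andre\u0301,ou=people,dc=example,dc=com".encode("utf-8")
assert not is_normalized_utf8(dn)      # validate only ...
fixed = normalize_utf8(dn)             # ... or auto-normalize
assert is_normalized_utf8(fixed)
assert fixed == "cn=Andr\u00e9,ou=people,dc=example,dc=com".encode("utf-8")
```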

-- 
Petr Viktorin