[python-ldap] ANN: python-ldap 2.5.2
Petr Viktorin
pviktori at redhat.com
Tue Nov 21 05:38:31 EST 2017
On 11/21/2017 01:13 AM, Michael Ströder wrote:
> Michael Ströder wrote:
>> Even in my higher-level code I often e.g. treat DNs or similar values
>> opaque which are internally sent or received through LDAP controls.
>
> IIRC while there are likely no issues in our comfortable Western Latin
> character world Unicode has some interesting corner-cases. So
> decoding/re-encoding might not lead to the same results. Frankly I have
> next-to-zero knowledge about all the Unicode Normalization Forms.
Unicode Normalization forms are orthogonal to encoding/decoding. From
the python-ldap point of view, normalization works at the level of the
data, not its representation.
Choosing to encode/decode doesn't bring any *new* normalization issues
-- you can do normalization on a UTF-8 encoded bytestring.
Here's an example diagram:

"ﬁ" (one "character")   --- KC normalization -->  "fi" (two "characters")
          ^                                                 ^
          | UTF-8 encode/decode                             | UTF-8
          v                                                 v
bytes([239, 172, 129])  --- KC normalization -->  bytes([102, 105])

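
The diagram above can be checked with a short Python 3 sketch. It uses only the standard library's unicodedata module; the helper name nfkc_bytes is made up for illustration and is not part of python-ldap. It assumes the bytestring is valid UTF-8.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI: one code point, three UTF-8 bytes.
ligature = "\ufb01"


def nfkc_bytes(data: bytes) -> bytes:
    """Hypothetical helper: NFKC-normalize a UTF-8 encoded bytestring.

    Decodes, normalizes the text, and re-encodes, so the normalization
    itself still happens on code points, not on raw bytes.
    """
    return unicodedata.normalize("NFKC", data.decode("utf-8")).encode("utf-8")


# Path 1: normalize the text, then encode.
text_first = unicodedata.normalize("NFKC", ligature).encode("utf-8")

# Path 2: encode first, then normalize the bytestring.
bytes_first = nfkc_bytes(ligature.encode("utf-8"))

print(list(ligature.encode("utf-8")))  # [239, 172, 129]
print(list(text_first))                # [102, 105]
print(text_first == bytes_first)       # True
```

Both paths end at the same bytes, so where the encode/decode step sits does not change the outcome of normalization.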
You can leave normalizing attribute names or DNs entirely to the
application developers (as it's done now, with bytestrings).
Or you might choose to validate normal forms, or even auto-normalize,
but that would be a separate, new feature. (And I don't think it would
be a terribly useful feature for python-ldap.)
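
For the application side, a check along these lines might be enough. This is only a sketch: is_nfc_utf8 is a hypothetical helper, the DN is invented, and NFC is picked arbitrarily as the form to check; none of this is python-ldap API.

```python
import unicodedata


def is_nfc_utf8(value: bytes) -> bool:
    """Hypothetical helper: True if a UTF-8 bytestring is already in NFC."""
    text = value.decode("utf-8")
    return unicodedata.normalize("NFC", text) == text


# The same visible DN, once with a precomposed e-acute (U+00E9) and once
# with "e" followed by a combining acute accent (U+0301).
composed = "cn=Jos\u00e9,dc=example,dc=com".encode("utf-8")
decomposed = "cn=Jose\u0301,dc=example,dc=com".encode("utf-8")

print(is_nfc_utf8(composed))    # True
print(is_nfc_utf8(decomposed))  # False
```

The two bytestrings differ even though they render identically, which is the kind of thing an application may want to catch before comparing DNs.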
--
Petr Viktorin