Ldap module and base64 encoding

Sun May 26 15:48:38 EDT 2013

Joseph L. Casale wrote:
>> I'm not sure what exactly you're asking for.
>> Especially "is not being interpreted as a string requiring base64 encoding" is
>> written without giving the right context.
>>
>> So I'm just guessing that this might be the usual misunderstandings with use
>> of base64 in LDIF. Read more about when LDIF requires base64-encoding here:
>>
>> http://tools.ietf.org/html/rfc2849
>>
>> To me everything looks right:
>>
>> Python 2.7.3 (default, Apr 14 2012, 08:58:41) [GCC] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> 'ZGV0XDMzMTB3YmJccGc='.decode('base64').decode('utf-8')
>> u'det\\3310wbb\\pg'
>>>>>
>>
>> What do you think is a problem?
> 
> Thanks for the reply. The issues, I am sure, are in my code. I read the LDIF source file and end up
> with values such as 'det\3310wbb\pg' after the base64 encoded entries are decoded.
> 
> The problem I am having is when I add this to an add/mod entry list and write it back out.
> As it does not get re-encoded to base64, the LDIF file ends up containing a text entry with a ^]
> character. If I re-read the file with the parser, that character causes the handle method to break
> midway through the entry dict, so the last half reappears disjoint, without a dn.
> 
> Like I said, I am pretty sure it's my poor understanding of decoding and encoding.
> I am using the build from http://www.lfd.uci.edu/~gohlke/pythonlibs/ on a windows
> 2008 r2 server.
> 
> I have re-implemented handle to create a cidict holding all the dn/entry pairs that are parsed, as
> I then perform some processing such as manipulating attribute values in the entry dict. I
> am pretty sure I am breaking things here. The data I am reading comes from utf-16-le
> encoded files and contains Unicode characters, as the source directory is globally available and
> is written to in just about every country.

Processing LDIF is one thing, doing LDAP operations another.

LDIF itself is meant to be ASCII-clean. But each attribute value can carry any
byte sequence (e.g. attribute 'jpegPhoto'). There's no further processing by
module LDIF - it simply returns byte sequences.
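
As a rough illustration of the rule in RFC 2849, here is a minimal sketch
(Python 3, standard library only) of the check an LDIF writer has to make
before emitting a value as plain text. The helper names needs_base64 and
ldif_line are made up for this example and are not part of module ldif; line
folding is left out.

```python
import base64

# Bytes that may not start a plain-text value (RFC 2849 SAFE-INIT-CHAR):
# NUL, LF, CR, space, colon and '<'.
UNSAFE_INIT = {0x00, 0x0A, 0x0D, 0x20, 0x3A, 0x3C}
# Bytes that may not appear anywhere in a plain-text value (SAFE-CHAR).
UNSAFE_ANYWHERE = {0x00, 0x0A, 0x0D}


def needs_base64(value: bytes) -> bool:
    """Return True if value cannot be written as an RFC 2849 SAFE-STRING."""
    if not value:
        return False  # an empty value needs no encoding
    if value[0] in UNSAFE_INIT or value.endswith(b" "):
        return True
    return any(b > 0x7F or b in UNSAFE_ANYWHERE for b in value)


def ldif_line(attr: str, value: bytes) -> str:
    """Format one attribute value as an LDIF line (no line folding)."""
    if needs_base64(value):
        return "%s:: %s" % (attr, base64.b64encode(value).decode("ascii"))
    return "%s: %s" % (attr, value.decode("ascii"))
```

Note that under this rule most control characters below 0x20 (other than NUL,
LF and CR) still count as safe, so a value containing one of them is written
as plain text.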

The access protocol LDAPv3 mandates UTF-8 encoding for Unicode strings on the
wire if attribute syntax is DirectoryString, IA5String (mainly ASCII) or similar.

So if your LDIF input returns UTF-16 encoded attribute values for e.g.
attribute 'cn' or 'o', or for any other attribute that is not of OctetString or
Binary syntax, something's wrong with the producer of the LDIF data.
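
If the producer cannot be fixed, such values would have to be transcoded before
they are used in an LDAP operation. A minimal sketch in Python 3, assuming the
value really is UTF-16-LE text; the function name and the sample value are
invented for this example:

```python
def utf16le_to_utf8(value: bytes) -> bytes:
    """Transcode one attribute value from UTF-16-LE to UTF-8.

    Only meaningful for attributes with a string syntax; values of
    attributes such as jpegPhoto must be passed through untouched.
    """
    text = value.decode("utf-16-le")
    # Drop a single leading byte order mark if the producer left one in.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")


# Hypothetical 'cn' value as a misbehaving producer might deliver it.
raw = "M\u00fcller".encode("utf-16-le")
fixed = utf16le_to_utf8(raw)
```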

> Is there a process for manipulating/adding data to the entry dict before I write it out that I
> should adhere to? For example, if I am adding a new attribute to be composed of part of
> another parsed attr for use in a modlist:
> 
>   {'customAttr': ['foo.{}.bar'.format(entry['uid'])]}
> 
> By looking at the value from above, 'det\3310wbb\pg', I gather the entry dict was parsed
> into byte strings. Should I have decoded this, whereas some of the data is Unicode and
> as such I should have encoded it?

I wonder what the string really is. At least the base64-encoding you provided
before decodes as UTF-8 but I'm not sure whether it's the right sequence of
Unicode code points you're expecting.

>>> 'ZGV0XDMzMTB3YmJccGc='.decode('base64').decode('utf-8')
u'det\\3310wbb\\pg'
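
For reference, on Python 3 the str.decode('base64') shortcut no longer exists.
The same check, together with the decode/modify/encode round trip asked about
above, might look like the sketch below. The entry dict is made up, and the
code assumes the values are UTF-8 text.

```python
import base64

raw = base64.b64decode("ZGV0XDMzMTB3YmJccGc=")  # bytes, as an LDIF parser returns them
text = raw.decode("utf-8")                       # str, for manipulation in Python
print(repr(text))

# Hypothetical parsed entry: attribute values are lists of byte strings.
entry = {"uid": [raw]}

# Decode, build the new value as text, then encode again before it goes
# into the modlist or back out to LDIF.
uid = entry["uid"][0].decode("utf-8")
new_value = "foo.{}.bar".format(uid).encode("utf-8")
modlist_entry = {"customAttr": [new_value]}
```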

I still can't figure out what you're really doing though. I'd recommend
stripping down your operations to a very simple test snippet that illustrates
the issue and posting that here.

Ciao, Michael.