windows active directory ldap output encoding

Michael Ströder michael at stroeder.com
Tue Jul 8 04:52:13 EDT 2008


jo3c wrote:
> I'm trying to get some information out of a Windows Server 2003 Chinese
> Active Directory system,
> so let's say the encoding is probably Big5 or UTF-8.

In LDAPv3, values of attributes with the Directory String syntax (e.g. 
'cn', 'sn', 'givenName' or 'displayName') are always Unicode encoded as 
UTF-8.
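
In practice this means the byte strings in a search result can be decoded 
attribute by attribute. A minimal sketch, assuming Python 3 and a result 
entry shaped like the dict quoted further down (attribute name mapped to a 
list of byte strings). The helper name decode_entry, the attribute set and 
the sample 'objectGUID' value are made up for illustration and are not part 
of python-ldap:

```python
# Attribute names taken from the examples above; extend as needed.
# ASSUMPTION: only these attributes are treated as Directory String.
DIRECTORY_STRING_ATTRS = {'cn', 'sn', 'givenname', 'displayname'}


def decode_entry(attrs):
    """Return a copy of attrs with Directory String values decoded from UTF-8.

    Attribute names are compared case-insensitively, since LDAP attribute
    names are not case-sensitive. All other values are left as bytes.
    """
    decoded = {}
    for name, values in attrs.items():
        if name.lower() in DIRECTORY_STRING_ATTRS:
            decoded[name] = [value.decode('utf-8') for value in values]
        else:
            decoded[name] = list(values)
    return decoded


entry = {
    'cn': [b'\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
    'objectGUID': [b'\x00\x01\x02'],  # made-up binary value, left untouched
}
print(decode_entry(entry))
```

Binary attributes must not be decoded this way, which is why the sketch 
only touches an explicit list of names.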

> What I'm doing is similar to ldapsearch in the shell, but from my Python
> script using the python-ldap module.
> 
> The result is not in the correct encoding.

What exactly did you expect?

>  'cn': ['\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],

 >>> unicode('\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95','utf-8')
u'\u6c5f\u67cf\u58d5'

I cannot tell whether this Unicode string of length 3 is correct since I 
cannot read Chinese and I probably don't have the necessary fonts 
installed. At least it decodes as UTF-8, which is correct at the LDAP level.
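
For readers on Python 3, where unicode() no longer exists, the same check 
might look like the sketch below. It only restates the session above with 
bytes.decode() and prints the lengths and code points; nothing in it is 
specific to python-ldap.

```python
# Raw value as shown for the 'cn' attribute (a bytes object in Python 3).
raw = b'\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'

# Decode as UTF-8, the encoding used for Directory String values.
name = raw.decode('utf-8')

# Nine bytes on the wire become three characters after decoding.
print(len(raw))    # 9
print(len(name))   # 3

# Code points, matching the u'\u6c5f\u67cf\u58d5' result above.
code_points = ['U+%04X' % ord(ch) for ch in name]
print(code_points)  # ['U+6C5F', 'U+67CF', 'U+58D5']
```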

>  'displayName': ['\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],

 >>> unicode('\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95','utf-8')
u'\u6c5f\u67cf\u58d5'

Maybe you should provide the original Unicode string (e.g. in Python 
syntax) and tell us how you stored it in your AD server. Note that 
the tools used to maintain AD are also part of the game.
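
One way to make that comparison yourself is sketched below, assuming 
Python 3. The value of `expected` is only a placeholder (here simply the 
decoded result from above); replace it with the name you intended to store. 
The function name `matches` is made up for this example.

```python
# Value returned by the server for 'cn' (copied from the search result).
returned = b'\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'

# ASSUMPTION: placeholder for the name you meant to store; replace it.
expected = '\u6c5f\u67cf\u58d5'


def matches(expected_text, raw_bytes, encoding='utf-8'):
    """Return True if raw_bytes is expected_text in the given encoding."""
    try:
        return expected_text.encode(encoding) == raw_bytes
    except UnicodeEncodeError:
        # The text cannot be represented in this encoding at all.
        return False


print('utf-8:', matches(expected, returned, 'utf-8'))
print('big5: ', matches(expected, returned, 'big5'))
```

If the UTF-8 comparison fails for your real string, the value in the 
directory differs from what you intended, and the tool that wrote it is 
the next thing to look at.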

Ciao, Michael.


