windows active directory ldap output encoding
Michael Ströder
michael at stroeder.com
Tue Jul 8 04:52:13 EDT 2008
jo3c wrote:
> I'm trying to get some information out of a windows server 2003 chinese
> active directory system
> so let's say encoding is probably big5 or utf-8
The Unicode encoding of LDAP attributes with syntax Directory String is
always UTF-8 (e.g. attributes 'cn', 'sn', 'givenName' or 'displayName').
> what I'm doing is similar to ldapsearch in shell with my python script
> using python ldap module
>
> the result is not the correct encoding..
What exactly did you expect?
> 'cn': ['\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
>>> unicode('\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95','utf-8')
u'\u6c5f\u67cf\u58d5'
I cannot tell whether this Unicode string of length 3 is correct since I
cannot read Chinese and I probably don't have the necessary fonts
installed. At least it decodes cleanly as UTF-8, which is correct at the
LDAP level.
> 'displayName': ['\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
>>> unicode('\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95','utf-8')
u'\u6c5f\u67cf\u58d5'
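
For readers on Python 3, where the unicode() builtin no longer exists, the same check can be sketched with bytes.decode(). The `entry` dict below is a hand-copied stand-in for the search result quoted above, and `decode_attrs` is a hypothetical helper for illustration, not part of python-ldap.

```python
# Stand-in for one search result entry; values are the raw bytes quoted above.
entry = {
    'cn': [b'\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
    'displayName': [b'\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'],
}


def decode_attrs(attrs, encoding='utf-8'):
    """Decode every value of every attribute.

    Hypothetical helper: raises UnicodeDecodeError if a value is not
    valid in the given encoding.
    """
    return {
        name: [value.decode(encoding) for value in values]
        for name, values in attrs.items()
    }


decoded = decode_attrs(entry)
print(ascii(decoded['cn'][0]))  # '\u6c5f\u67cf\u58d5'
print(len(decoded['cn'][0]))    # 3
```

This only makes sense for attributes with Directory String syntax. Binary attributes in AD such as objectGUID or objectSid are not text and should not be passed through such a helper.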
Maybe you should provide the original Unicode string (e.g. in Python
syntax) and tell us how you stored it in your AD server. Note that
the tools used to maintain AD are also part of the game.
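
One way to make that comparison concrete, sketched here in Python 3: encode the string you believe you stored as UTF-8 and compare it with the bytes the search returned. The value of `expected` is only a placeholder copied from the decoded output earlier in this message; substitute the string you actually entered.

```python
# Bytes as they appeared in the search result.
returned = b'\xe6\xb1\x9f\xe6\x9f\x8f\xe5\xa3\x95'

# Placeholder (assumption): replace with the string you meant to store.
expected = '\u6c5f\u67cf\u58d5'

# If these match, the directory holds exactly the UTF-8 form of your string.
matches = expected.encode('utf-8') == returned
print(matches)

# For contrast, try reading the same bytes as Big5. This may raise, in which
# case the bytes are not valid Big5 at all.
try:
    big5_view = returned.decode('big5')
except UnicodeDecodeError:
    big5_view = None
print(ascii(big5_view))
```

If `matches` is False for your real string, the difference was most likely introduced by whatever tool wrote the value into AD, not by the LDAP search.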
Ciao, Michael.