Unicode characters

Paul Johnston paul.johnston at manchester.ac.uk
Mon Sep 4 09:39:36 EDT 2006


Hi,
I have a string which I convert into a list and then loop over,
printing each character's glyph and numeric value.

# -*- coding: utf-8 -*-

thestring = "abcd"
thelist = list(thestring)

for c in thelist:
    print c,
    print ord(c)

This works fine for Latin characters, but when I put in a non-ASCII
character, a two-byte character gives me two separate items. For
example, an Arabic alef returns

*  216
* 167

(the first asterisk stands for the empty set symbol, the second for a double "s")

Putting in sequential characters, i.e. alef, beh, teh marbuta, gives me
sequential listings, i.e.
216  167
216  168
216  169
So it is reading the correct underlying values.
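
For reference, these pairs match the UTF-8 encoding of the three letters: with a UTF-8 source file, a plain "..." literal in Python 2 holds the encoded bytes, so each Arabic letter shows up as two items. A minimal sketch to check the numbers, assuming an interpreter with bytearray (Python 2.6+ or 3.x, not the 2.4.3 used here); the helper name utf8_byte_values is made up for illustration:

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes Python 2.6+ or 3.x (bytearray is not available in 2.4).

def utf8_byte_values(text):
    """Return the UTF-8 byte values of `text` as a list of integers."""
    return list(bytearray(text.encode("utf-8")))

# alef, beh, teh marbuta written as escapes, so the file encoding is irrelevant
for ch in (u"\u0627", u"\u0628", u"\u0629"):
    print(utf8_byte_values(ch))
```

Each letter yields two byte values, the first of which is 216 in all three cases.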


Is there any way to get the c in the for loop to recognise that it is
reading a multi-byte character?
I have followed the info in PEP 0263 and am using Python 2.4.3 Build
12 on a Windows box within Eclipse 3.2.0 and Python plugins 1.2.2.
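
One possible approach, sketched under the assumption that the text is available as UTF-8 bytes: decode it to a unicode string before looping, so each item is a whole character rather than a single byte. The helper name code_points and the sample bytes are illustrative and not part of the script above, and the sketch targets Python 2.6+ or 3.x rather than 2.4.3.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes the input is UTF-8 encoded bytes and Python 2.6+ or 3.x.

def code_points(raw_bytes, encoding="utf-8"):
    """Decode bytes to text; return one (character, code point) pair per character."""
    text = raw_bytes.decode(encoding)
    return [(ch, ord(ch)) for ch in text]

# The six byte values reported above: 216 167, 216 168, 216 169
raw = b"\xd8\xa7\xd8\xa8\xd8\xa9"

for ch, number in code_points(raw):
    # Print the code point rather than the glyph to avoid console encoding issues
    print("U+%04X %d" % (number, number))
```

In Python 2, writing the literal as u"..." instead of "..." would be another route to a unicode string, in which case no explicit decode step is needed.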

Cheers Paul


