accessing individual characters in unicode strings
Christian Heimes
lists at cheimes.de
Sat Apr 12 08:48:02 EDT 2008
Peter Robinson wrote:
> Dear list
> I am at my wits' end on what seemed a very simple task:
> I have some greek text, nicely encoded in utf8, going in and out of an
> xml database, being passed over and beautifully displayed on the web.
> For example: the most common greek word of all 'kai' (or και if your
> mailer can see utf8)
> So all I want to do is:
> step through this string a character at a time, and do something for
> each character (actually set a width attribute somewhere else for each
> character)
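As John already said: UTF-8 ain't unicode. UTF-8 is an encoding, like
ASCII or Latin-1, but with different inner workings: it is
variable-width, so a single character may be encoded by one to four
bytes (the original design allowed up to six, but RFC 3629 limits it to
four). To step through characters, decode the bytes into a unicode
string first and iterate over that.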
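
To make the bytes-versus-characters difference concrete, here is a
minimal sketch in Python 3 syntax (in Python 2 the decoded value would
be a unicode object). The byte string and the helper name utf8_widths
are made up for illustration and are not from Peter's code; the byte
count it reports only stands in for whatever per-character value you
actually need to set.

```python
# Hypothetical input: the UTF-8 bytes for the Greek word from the question.
raw = b"\xce\xba\xce\xb1\xce\xb9"

# Decode once at the boundary; afterwards work with characters, not bytes.
text = raw.decode("utf-8")


def utf8_widths(s):
    """Return (character, UTF-8 byte count) pairs for each code point in s."""
    return [(ch, len(ch.encode("utf-8"))) for ch in s]


# len(raw) counts bytes, len(text) counts code points.
print(len(raw), len(text))

# Iterating over the decoded string yields one character per step.
for ch, nbytes in utf8_widths(text):
    print(ch, nbytes)
```

Iterating over raw instead of text would step through the individual
bytes, which is the usual cause of this kind of trouble. Note that this
walks code points: accented Greek written with combining marks would
yield the base letter and each mark as separate steps.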
I highly recommend Joel's article on unicode:
The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)
http://www.joelonsoftware.com/articles/Unicode.html
Christian