Multibyte Character Support for Python

Chris Liechti cliechti at gmx.net
Thu May 9 17:03:55 EDT 2002


huaiyu at gauss.almadan.ibm.com (Huaiyu Zhu) wrote in
news:slrnadlmm2.5kg.huaiyu at gauss.almadan.ibm.com: 
> Out of curiosity: If a character is two bytes, what would len()
> report?  If s is a unicode string with wide characters, would list(s)
> be made of characters or bytes?  Would that be different under the
> current situation, or the PEP 263, or under Stephen's proposal?  Would
> it change depending on how the unicode is encoded?

that's easy to check in the interactive console:
>>> len(unicode("hello"))
5

len() gives you the number of characters, no matter how many bytes a
particular encoding needs to represent them.
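"hello" is plain ASCII, so here is a quick sketch with characters that really need more than one byte. it assumes Python 3, where str is the unicode type that unicode() gives you above; the sample string and the three codecs are arbitrary choices:

```python
# Sketch assuming Python 3: str holds unicode characters (code points),
# playing the role of the Python 2 unicode type in the session above.
s = "h\u00e9llo \u65e5\u672c"  # "héllo 日本": 8 characters

# len() on the string counts characters, independent of any encoding.
print(len(s))  # 8

# Only the encoded byte string depends on the codec chosen.
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = s.encode(codec)
    print(codec, len(data))
```

this prints 8, then 13, 16 and 32 bytes for the three codecs. so len() of the unicode string does not change with the encoding; only the encoded bytes do.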

>>> list(unicode("hello"))
[u'h', u'e', u'l', u'l', u'o']

so you get a list of unicode characters.
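the same holds for characters outside ASCII. a small sketch, assuming Python 3 (where str is the unicode type): list() on the string yields characters, while list() on its UTF-8 encoding yields the individual bytes as integers. ascii() is only there so the output prints on any terminal:

```python
# Sketch assuming Python 3: compare iterating characters vs. encoded bytes.
s = "\u65e5\u672c"  # "日本": 2 characters

chars = list(s)
print(ascii(chars))  # ['\u65e5', '\u672c'] -> one element per character

raw = list(s.encode("utf-8"))
print(raw)  # [230, 151, 165, 230, 156, 172] -> 3 bytes per character in UTF-8
```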
 
> A list of such simple questions and answers for various proposals
> would help many more people to understand the relevant PEPs.

i think most of that becomes clear when you play around with the current
python and its unicode handling, so it does not need a special mention.

chris

-- 
Chris <cliechti at gmx.net>
