Grapheme clusters, a.k.a. real characters

Steve D'Aprano steve+python at pearwood.info
Sun Jul 16 20:59:02 EDT 2017


On Mon, 17 Jul 2017 01:40 am, Rustom Mody wrote:

> On Sunday, July 16, 2017 at 8:10:41 PM UTC+5:30, Rick Johnson wrote:
[...] 
> $ python
> Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>>>> len("á")
> 1
>>>> len("á")
> 2
> 
> Shall we stipulate it to be 1.5? [¿ Maybe 1½ ?]

Please don't feed the trolls. If you have to respond to Ranting Rick, at least
write something sensible that people following this thread might learn from,
instead of encouraging his nonsense.

I don't believe for a second that you would seriously like len(some_string) to
return '1½', but just in case anyone is taking that proposal seriously: it
would break backwards compatibility. len() must return an int, not a float, a
complex number, or a string.

If you want to know the length of a string *in bytes*, you have to encode it to
bytes first, using some specific encoding, then call len() on those bytes.
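
For illustration, a minimal sketch of how the byte length depends on the
encoding. The sample string (a precomposed U+00E1) and the variable names are
chosen here as an example; they are not taken from the thread.

```python
# One code point: LATIN SMALL LETTER A WITH ACUTE (precomposed form).
s = "\u00e1"

# The byte length depends entirely on which encoding is used.
utf8_len = len(s.encode("utf-8"))        # two bytes in UTF-8
utf16_len = len(s.encode("utf-16-le"))   # one 16-bit code unit, no BOM
utf32_len = len(s.encode("utf-32-le"))   # one 32-bit code unit, no BOM
latin1_len = len(s.encode("latin-1"))    # a single byte in Latin-1

print(utf8_len, utf16_len, utf32_len, latin1_len)
```

The explicit "-le" variants are used so that no byte order mark is counted;
plain "utf-16" and "utf-32" prepend one.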

If you want to know the length of a string *in code points*, then just call
len() on the string.
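
The two different results in the quoted session are presumably the same
visible character written in two ways: once precomposed, once as a base letter
plus a combining accent. A small sketch under that assumption, using
unicodedata.normalize() to convert between the two forms:

```python
import unicodedata

composed = "\u00e1"      # LATIN SMALL LETTER A WITH ACUTE, one code point
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT, two code points

# len() counts code points, so the two spellings differ in length.
composed_len = len(composed)
decomposed_len = len(decomposed)

# Normalization converts one form into the other.
nfc = unicodedata.normalize("NFC", decomposed)   # compose where possible
nfd = unicodedata.normalize("NFD", composed)     # decompose fully

print(composed_len, decomposed_len, nfc == composed, nfd == decomposed)
```

Normalizing to NFC before calling len() makes the count consistent for
characters like this one, although not every base-plus-accent combination has
a precomposed form.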

If you want to know the height or width of a string in pixels in some specific
font, see your GUI toolkit.

If you want to know the length of a string in "characters" (graphemes), well,
Python doesn't have a built-in function to do that, or a standard library
solution. Yet.
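
In the meantime, a rough approximation is possible with the standard library
alone. The sketch below is not an implementation of the Unicode grapheme
cluster rules (UAX #29): approx_graphemes() is a hypothetical helper that
merely attaches combining marks to the preceding character, so it will get
emoji sequences, Hangul jamo, flags and the like wrong.

```python
import unicodedata

def approx_graphemes(text):
    """Group each character with the combining marks that follow it.

    Approximation only: a code point is treated as a continuation when
    its canonical combining class is non-zero. This ignores most of the
    UAX #29 rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the previous cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters

composed_count = len(approx_graphemes("\u00e1"))     # precomposed form
decomposed_count = len(approx_graphemes("a\u0301"))  # base letter + accent
word = approx_graphemes("cafe\u0301")                # accent stays with the 'e'

print(composed_count, decomposed_count, len(word))
```

With this helper both spellings of the accented letter count as one
"character", which is the answer most people expect from len().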



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



