Grapheme clusters, a.k.a. real characters

Marko Rauhamaa marko at pacujo.net
Sat Jul 15 10:01:21 EDT 2017


Steve D'Aprano <steve+python at pearwood.info>:

> On Sat, 15 Jul 2017 05:50 pm, Marko Rauhamaa wrote:
>> I might want random access to the "Grapheme clusters, a.k.a. real
>> characters".
>
> That would be nice to have, but the truth is that for most coders,
> Unicode code points are the low-hanging fruit that get you 95% of the
> way, and for many applications that's "close enough".
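A short illustration of where code points stop matching user-perceived characters. This is plain Python 3 with only the stdlib; the sample strings are illustrative choices, not taken from the thread:

```python
import unicodedata

# The same visible "é" spelled two ways:
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Code-point length and indexing disagree with what the reader sees.
print(len(precomposed), len(decomposed))  # 1 2
print(repr(decomposed[0]))                # 'e' (the accent is cut off)

# NFC normalization happens to recombine this particular pair...
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# ...but many clusters have no precomposed form, e.g. a flag built
# from two regional-indicator code points (U+1F1EB U+1F1EE).
flag = "\U0001F1EB\U0001F1EE"
print(len(unicodedata.normalize("NFC", flag)))  # 2
```

Normalizing first narrows the gap for accented Latin text, but it cannot make code points line up with characters in general.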

I think "close enough" is actually dangerous. We shouldn't encourage
that practice.

> Support for the Unicode grapheme breaking algorithm would get you
> probably 90% of the rest of the way. And then some sort of
> configurable system where defaults were based on the locale would
> probably get you a fairly complete grapheme-based text library.
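The grapheme breaking algorithm mentioned here is the Unicode text segmentation specification (UAX #29). The sketch below is deliberately much cruder: it only attaches combining marks (general category Mn, Mc or Me) to the preceding code point, and the names `approx_graphemes` and `GraphemeText` are hypothetical. It is meant only to show what cluster-level random access, as requested above, could look like. For the real rules, the third-party `regex` module's `\X` pattern matches extended grapheme clusters.

```python
import unicodedata
from typing import List


def approx_graphemes(text: str) -> List[str]:
    """Split text into approximate grapheme clusters.

    Simplification: a cluster is one code point followed by any code
    points whose general category is a mark (Mn, Mc, Me). The other
    UAX #29 rules (Hangul syllables, emoji ZWJ sequences,
    regional-indicator pairs, CRLF, ...) are NOT implemented.
    """
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


class GraphemeText:
    """Hypothetical wrapper: indexes by approximate cluster, not code point."""

    def __init__(self, text: str) -> None:
        # Segment once up front so that indexing afterwards is O(1).
        self._clusters = approx_graphemes(text)

    def __len__(self) -> int:
        return len(self._clusters)

    def __getitem__(self, index: int) -> str:
        return self._clusters[index]


text = GraphemeText("cafe\u0301 g\u0303o")  # 9 code points
print(len(text))              # 7
print(text[3] == "e\u0301")   # True
```

The trade-off in this design is memory for speed: storing the clusters as a list costs more than a flat string, but it gives constant-time access to the n-th cluster.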

Yes, that kind of a text class would be useful.

> I'm interested in such a thing. That's why I pointed out the issue on
> the bug tracker, to try to garner interest in it. As far as I can
> tell, you seem to be more interested in cheap point scoring, digs
> against Unicode, and an insistence that UTF-8 is better than strings
> (which doesn't even make sense).

It does seem to me that UTF-8 is a better holding position than strings.
Strings give you more trouble while not truly solving any problems.
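To make the UTF-8 versus str comparison concrete, here is a small stdlib-only sketch (the sample string is an illustrative choice). Neither representation indexes grapheme clusters. A str counts code points, UTF-8 bytes count bytes, and an arbitrary byte offset can additionally land inside a multi-byte sequence:

```python
text = "e\u0301"                 # one user-perceived character
encoded = text.encode("utf-8")   # b'e\xcc\x81'

print(len(text), len(encoded))   # 2 3

# Cutting the bytes at offset 2 splits the two-byte encoding of U+0301.
prefix = encoded[:2]
try:
    prefix.decode("utf-8")
    prefix_is_valid = True
except UnicodeDecodeError:
    prefix_is_valid = False
print(prefix_is_valid)           # False
```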


Marko
