Grapheme clusters, a.k.a. real characters

Marko Rauhamaa marko at pacujo.net
Wed Jul 19 08:13:33 EDT 2017


Chris Angelico <rosuav at gmail.com>:

> On Wed, Jul 19, 2017 at 7:53 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
>> Here's a proposal:
>>
>>    * introduce a builtin (predefined) class Text
>>
>>    * conceptually, a Text object is a sequence of "real" characters
>>
>>    * you can access each "real" character by its position in O(1)
>>
>>    * the "real" character is defined to be an integer computed as follows
>>      (in pseudo-Python):
>>
>>       string = the NFC normal form of the real character as a string
>>       rc = 0
>>       shift = 0
>>       for codepoint in string:
>>           rc |= ord(codepoint) << shift
>>           shift += 6
>>       return rc
>>
>>     * t[n] evaluates to an integer
>
> A string could consist of 1 base character and N-1 combining
> characters. Can you still access those combined characters in constant
> time?

Yes.
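
A minimal sketch of how that could work, under a few assumptions that are
not part of the proposal itself: clusters are approximated as a base
character plus any following combining marks (using
unicodedata.combining(), which is not the full UAX #29 segmentation), and
the shift is taken as 21 bits rather than the 6 in the quoted pseudo-code,
so that code points up to 0x10FFFF cannot overlap. The names
encode_cluster and SHIFT are made up for illustration.

```python
import unicodedata

SHIFT = 21  # assumption: wide enough for any code point (max 0x10FFFF)


def encode_cluster(cluster):
    """Pack the NFC form of one "real" character into a single integer."""
    rc = 0
    shift = 0
    for codepoint in unicodedata.normalize("NFC", cluster):
        rc |= ord(codepoint) << shift
        shift += SHIFT
    return rc


class Text:
    """Hypothetical Text class: one integer per "real" character."""

    def __init__(self, s):
        clusters = []
        for ch in s:
            # Simplification: a cluster is a base character plus any
            # following combining marks (not the full UAX #29 rules).
            if clusters and unicodedata.combining(ch):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        self._chars = [encode_cluster(c) for c in clusters]

    def __len__(self):
        return len(self._chars)

    def __getitem__(self, n):
        # Plain list indexing: O(1) however long the cluster was.
        return self._chars[n]


# "e" + combining acute, "q" + combining acute, plain "x"
t = Text("e\u0301q\u0301x")
```

Here len(t) is 3 and t[1] is a single integer covering both "q" and its
combining acute. A string of 1 base character and N-1 combining characters
becomes a Text of length 1; the lookup t[0] stays constant time, although
the integer it returns grows with N.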


Marko


