Grapheme clusters, a.k.a. real characters

Chris Angelico rosuav at gmail.com
Wed Jul 19 06:59:39 EDT 2017


On Wed, Jul 19, 2017 at 7:53 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Here's a proposal:
>
>    * introduce a builtin (predefined) class Text
>
>    * conceptually, a Text object is a sequence of "real" characters
>
>    * you can access each "real" character by its position in O(1)
>
>    * the "real" character is defined to be an integer computed as follows
>      (in pseudo-Python):
>
>       string = the NFC normal form of the real character as a string
>       rc = 0
>       shift = 0
>       for codepoint in string:
>           rc |= ord(codepoint) << shift
>           shift += 6
>       return rc
>
>     * t[n] evaluates to an integer

A string could consist of 1 base character and N-1 combining
characters. Can you still access those combined characters in constant
time?
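
To make the question concrete, here is a minimal sketch of the quoted
pseudo-code. The names real_character() and clusters() are made up for
illustration, and clusters() is only a rough approximation that glues
any code point with a nonzero combining class onto the previous one; it
is not the full Unicode grapheme cluster algorithm.

```python
import unicodedata


def real_character(cluster: str) -> int:
    """Integer for one "real" character, following the quoted pseudo-code."""
    rc = 0
    shift = 0
    for codepoint in unicodedata.normalize("NFC", cluster):
        rc |= ord(codepoint) << shift
        shift += 6  # shift width taken verbatim from the proposal
    return rc


def clusters(s: str) -> list[str]:
    """Rough split into clusters (an approximation, not UAX #29)."""
    out: list[str] = []
    for ch in unicodedata.normalize("NFC", s):
        if out and unicodedata.combining(ch):
            out[-1] += ch  # combining mark: attach to the previous cluster
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    for n in (1, 10, 100):
        # One base character followed by n combining acute accents.
        # "q" has no precomposed accented form, so NFC leaves it as is.
        s = "q" + "\u0301" * n
        cs = clusters(s)
        bits = real_character(cs[0]).bit_length()
        print(len(s), "code points,", len(cs), "cluster,", bits, "bits")
```

Under these assumptions the whole string is a single cluster however
large n gets, and the integer standing for it keeps growing with n, so
the elements of a Text would not have a fixed size.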

ChrisA


