Grapheme clusters, a.k.a.real characters

Marko Rauhamaa marko at pacujo.net
Wed Jul 19 09:42:39 EDT 2017


Chris Angelico <rosuav at gmail.com>:

> Perhaps we don't have the same understanding of "constant time". Or
> are you saying that you actually store and represent this as those
> arbitrary-precision integers? Every character of every string has to
> be a multiprecision integer?

Yes, although feel free to optimize. The internal implementation isn't
important but those "multiprecision" integers are part of an outward
interface. So you could have:

    >>> for c in Text("aq̈u \U0001F64B\U0001F3FF\u200D\u2642\uFE0F"):
    ...     print(c)
    ...
    97
    1895826184
    117
    32
    5152920508016097895476141586773579

(Note, though, that Python3 only has integers, there's no
"multiprecision" about them.)


Marko



More information about the Python-list mailing list