Could you verify this, Oh Great Unicode Experts of the Python-List?

Chris Angelico rosuav at gmail.com
Sun Aug 11 07:45:41 EDT 2013


On Sun, Aug 11, 2013 at 12:14 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> Consider a single character. It can have 0 to 5 accents, in any
> combination. Order doesn't matter, and there are no duplicates, so there
> are:
>
> 0 accent: take 0 from 5 = 1 combination;
> 1 accent: take 1 from 5 = 5 combinations;
> 2 accents: take 2 from 5 = 5!/(2!*3!) = 10 combinations;
> 3 accents: take 3 from 5 = 5!/(3!*2!) = 10 combinations;
> 4 accents: take 4 from 5 = 5 combinations;
> 5 accents: take 5 from 5 = 1 combination
>
> giving a total of 32 combinations for a single character. Since there are
> four characters in this hypothetical language that take accents, that
> gives a total of 4*32 = 128 distinct code points needed.

There's an easy way to calculate it. Instead of the "take N from 5"
notation, simply look at it as a set of independent bits - each of
your accents may be either present or absent. So it's 1<<5
combinations for a single character, which is the same 32 figure you
came up with, but easier to work with in the ridiculous case.
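
As a quick sanity check, here is a minimal Python sketch showing that the "take N from 5" sum and the bit-shift shortcut agree. The names ACCENTS and BASE_CHARS are just illustrative labels for the numbers in the hypothetical language above, and math.comb assumes Python 3.8 or later.

```python
from math import comb

ACCENTS = 5      # accents available in the hypothetical language
BASE_CHARS = 4   # characters that can take accents

# The long way: add up "take k from 5" for k = 0..5.
by_counting = sum(comb(ACCENTS, k) for k in range(ACCENTS + 1))

# The short way: each accent is an independent present/absent bit.
by_bits = 1 << ACCENTS

print(by_counting, by_bits)      # both are 32
print(BASE_CHARS * by_bits)      # 128 code points in total
```

The two agree because summing the binomial coefficients over every k is just counting all subsets of a 5-element set.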

> In reality, Unicode has currently code points U+0300 to U+036F (112 code
> points) to combining characters. It's not really meaningful to combine
> all 112 of them, or even most of 112 of them...

If you *were* to use literally ANY combination, that would be 1<<112
which is... uhh... roughly 5.2e33, or five billion yottacombinations.
Don't bother working that one out by the "take N" method, it'll take
you too long :)

Oh, and that's 1<<112 possible combining character combinations, so
you then need to multiply that by the number of base characters you
could use....
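
For anyone who wants to check the size of that number, a rough Python sketch follows. It only counts subsets of the U+0300 to U+036F block mentioned above, and the names COMBINING and YOTTA are made up for the illustration.

```python
# Size of the combining block quoted above, endpoints inclusive.
COMBINING = 0x036F - 0x0300 + 1

# Every combining character is either present or absent.
total = 1 << COMBINING

YOTTA = 10**24   # SI prefix yotta

print(COMBINING)        # 112
print(total)            # exact integer, 34 digits long
print(total // YOTTA)   # a bit over five billion
```

Python integers are arbitrary precision, so the shift gives the exact value with no overflow. Multiplying by the number of base characters is then a single extra multiplication.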

ChrisA


