Could you verify this, Oh Great Unicode Experts of the Python-List?

Joshua Landau joshua at landau.ws
Sun Aug 11 07:59:09 EDT 2013


On 11 August 2013 12:14, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Sun, 11 Aug 2013 10:44:40 +0100, Joshua Landau wrote:
>
>> On 11 August 2013 10:09, Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> wrote:
>>> The reason some accented letters have single code point forms is to
>>> support legacy charsets; the reason some only exist as combining
>>> characters is due to the combinatorial explosion. Some languages allow
>>> you to add up to five or six different accents on any of dozens of
>>> different letters. If each combination needed its own unique code
>>> point, there wouldn't be enough code points. For bonus points, if there
>>> are five accents that can be placed in any combination of zero or more
>>> on any of four characters, how many code points would be needed?
>>
>> 52?
>
> More than double that.
>
> Consider a single character. It can have 0 to 5 accents, in any
> combination. Order doesn't matter, and there are no duplicates, so there
> are:
>
> 0 accents: take 0 from 5 = 1 combination;
> 1 accent: take 1 from 5 = 5 combinations;
> 2 accents: take 2 from 5 = 5!/(2!*3!) = 10 combinations;
> 3 accents: take 3 from 5 = 5!/(3!*2!) = 10 combinations;
> 4 accents: take 4 from 5 = 5 combinations;
> 5 accents: take 5 from 5 = 1 combination;
>
> giving a total of 32 combinations for a single character. Since there are
> four characters in this hypothetical language that take accents, that
> gives a total of 4*32 = 128 distinct code points needed.

I didn't see "four characters", and I did (1 + 5 + 10) * 2 and came up
with 52...
Maybe I should get more sleep.
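
For anyone who wants to check the arithmetic quoted above, here is a rough
sketch. It counts the accent subsets two ways (with binomial coefficients and
by brute-force enumeration) and then multiplies by the number of letters. The
names ACCENTS, LETTERS, per_letter and enumerated are made up for
illustration, and math.comb assumes Python 3.8 or later.

```python
from itertools import combinations
from math import comb

ACCENTS = 5  # accents available in the hypothetical language
LETTERS = 4  # letters that can carry those accents

# Sum of "take k from 5" for k = 0..5, as in the quoted table.
per_letter = sum(comb(ACCENTS, k) for k in range(ACCENTS + 1))

# Cross-check by actually enumerating every unordered, duplicate-free subset.
enumerated = sum(
    1
    for k in range(ACCENTS + 1)
    for _ in combinations(range(ACCENTS), k)
)

# One code point per (letter, accent subset) pair.
total = LETTERS * per_letter

print(per_letter, enumerated, total)
```

Both counts agree with 2**5, since each accent is either present or absent.
Incidentally, (1 + 5 + 10) * 2 is itself 32, which is the per-letter figure
before multiplying by the four letters.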


