[Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

Jim Jewett jimjjewett at gmail.com
Thu Jun 7 01:06:05 CEST 2007


On 6/5/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> A scan of the full table for Unicode Version 2.0 ...  'n (Afrikaans), and

I asked a friend who speaks Afrikaans; apparently it is more a word
than a letter.

"""
ʼn is derived from the Dutch word en which means "a" in English. The `
is in place of the e e.g. a woman would translate into "ʼn vrou" It is
used very often as it is an indefinite article. SMS language usually
just uses the n without the apostrophe.
""" -- Tania Adendorff

So it is common, but dropping the apostrophe is already somewhat
acceptable.  And that is the strongest endorsement of any compatibility
character that we have seen.

(There were mixed opinions on Technical symbols, and no one has spoken
up yet about the half-dozen Croatian digraphs corresponding to Serbian
Cyrillic.)
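To make the characters under discussion concrete, here is a small sketch using Python's standard `unicodedata` module. It prints the decomposition mapping and the NFC and NFKC forms of U+0149 (the precomposed Afrikaans "ʼn") and U+01C6 ("ǆ", picked here as one example of the Croatian digraphs). The `<compat>` tag marks each as a compatibility character: NFC leaves it alone, while NFKC replaces it with ordinary letters. The helper name `describe` is only for illustration.

```python
import unicodedata

# U+0149 is the precomposed Afrikaans "ʼn"; U+01C6 is one of the Croatian
# digraphs (dž) that correspond to a single Serbian Cyrillic letter (џ).
SAMPLES = ["\u0149", "\u01c6"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (decomposition mapping, NFC form, NFKC form) for one character."""
    return (
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFC", ch),
        unicodedata.normalize("NFKC", ch),
    )


for ch in SAMPLES:
    decomp, nfc, nfkc = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(f"  decomposition: {decomp}")  # '<compat> ...' marks a K character
    print(f"  NFC:  {nfc!r}")            # unchanged: no canonical mapping
    print(f"  NFKC: {nfkc!r}")           # replaced by plain letters
```

For U+0149 the mapping is `<compat> 02BC 006E`, so NFKC turns it into MODIFIER LETTER APOSTROPHE followed by a plain `n`; for U+01C6, NFKC yields `d` followed by `ž` (U+017E).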

There is legitimate disagreement over whether to

(1)  forbid the Kompatibility characters in IDs
(2)  translate them to the canonical equivalents,
(3)  or just leave them alone, because identifier equality should be the
     same as string equality,
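The three options can be sketched as one hypothetical helper. This is an illustration under stated assumptions, not what any parser actually does: the policy names `"forbid"`, `"translate"` and `"leave"` are invented here, option (1) is assumed to still apply NFC to whatever it accepts, and option (3) is taken to mean no normalization at all, so identifiers compare exactly like strings.

```python
import unicodedata


def has_compat_chars(ident: str) -> bool:
    """True if NFKC would change ident beyond what NFC already does."""
    return unicodedata.normalize("NFKC", ident) != unicodedata.normalize("NFC", ident)


def canonicalize_identifier(ident: str, policy: str) -> str:
    """Hypothetical identifier canonicalization under one of three policies.

    "forbid":    (1) reject compatibility characters, otherwise apply NFC
    "translate": (2) fold compatibility characters away with NFKC
    "leave":     (3) no normalization, so IDs compare exactly like strings
    """
    if policy == "forbid":
        if has_compat_chars(ident):
            raise SyntaxError(f"compatibility character in identifier {ident!r}")
        return unicodedata.normalize("NFC", ident)
    if policy == "translate":
        return unicodedata.normalize("NFKC", ident)
    if policy == "leave":
        return ident
    raise ValueError(f"unknown policy: {policy!r}")


# U+FB01 LATIN SMALL LIGATURE FI, spelling "file" with a ligature.
for policy in ("translate", "leave"):
    print(policy, repr(canonicalize_identifier("\ufb01le", policy)))
```

With the "fi" ligature, `"translate"` yields `"file"`, `"leave"` keeps the ligature (so it names a different variable than `file`), and `"forbid"` raises `SyntaxError`. A decomposed `"e\u0301"` passes `"forbid"` and comes back as the single NFC character `"é"`.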

but I think dealing with K characters is now a "least of evils"
decision, instead of "we need them for something."

On another note, I have no idea how Martin's name (in the Cc line) ended up as:

"""
 L$(D+S(Bwis"
"""

If I knew, it *might* have a bearing on what sorts of
canonicalizations should be performed, and what sorts of warnings the
parser ought to emit for likely corrupted text.
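As a rough illustration of the kind of warning meant here, the following hypothetical check flags characters that rarely appear in intact source text: U+FFFD REPLACEMENT CHARACTER, which decoders substitute for undecodable bytes, and control characters (Unicode category Cc) other than ordinary whitespace, such as the ESC (U+001B) that ISO-2022 encodings use to start their escape sequences. The function name and the choice of what counts as suspicious are assumptions; a real parser would still have to decide which cases deserve an error rather than a warning.

```python
import unicodedata

# Whitespace control characters that legitimately appear in source files.
ALLOWED_CONTROLS = "\t\n\r\f"


def suspicious_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for characters that often signal mangled text."""
    problems = []
    for i, ch in enumerate(text):
        if ch == "\ufffd":
            problems.append((i, "U+FFFD REPLACEMENT CHARACTER (lost decode)"))
        elif unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            problems.append((i, f"control character U+{ord(ch):04X}"))
    return problems


for sample in ("plain_name", "na\ufffdve", "x\x1by"):
    print(repr(sample), suspicious_chars(sample))
```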

-jJ

