[Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

Tue Jun 5 04:37:31 CEST 2007

Ligatures such as Ĳ and ĳ (U+0132, U+0133) are considered
acceptable identifier characters unless explicitly tailored out.
(They appear in both ID and XID.)
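For anyone who wants to check this, here is a minimal sketch. It assumes a present-day Python 3 interpreter, where str.isidentifier() follows the XID-based identifier definition; the helper name describe is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return the code point, Unicode name, and identifier status of one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.isidentifier())

# The two IJ ligatures mentioned above.
for ch in ("\u0132", "\u0133"):
    print(describe(ch))
```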

Do we really want this, or should ĳ and ij be treated as
equivalent?  If so, then we need to enforce this somehow.

To me, this suggests that we should use the NFKD form.  Examples at
http://www.unicode.org/reports/tr15/tr15-28.html show that only the
Kompatibility forms (NFKD and NFKC) split ﬁ (ligature U+FB01) into the
constituents f and i; the canonical forms NFC and NFD leave it alone.
A Kompatibility form is needed to merge characters that are "the
same" except for some presentational quirk, such as being
superscripted or half-width.
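
The behavior of the four normalization forms on these characters can be checked directly. This is only a sketch, assuming Python's standard unicodedata module; the helper name normalized_forms and the choice of sample characters are mine, not anything taken from the PEP.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalized_forms(text):
    """Map each normalization form name to the normalized version of text."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

# ij ligature, fi ligature, superscript two, half-width katakana KA
for sample in ("\u0133", "\ufb01", "\u00b2", "\uff76"):
    for form, result in normalized_forms(sample).items():
        print(f"U+{ord(sample):04X} {form}: {result!r}")
```

Under either K form, ĳ and ij come out as the same string, so comparing identifiers after such a normalization is one way the equivalence could be enforced.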

The PEP assumes NFC, but I haven't really understood why, unless that
is required for compatibility with other systems (in which case, it
should be made explicit).

-jJ