[Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

Jim Jewett jimjjewett at gmail.com
Tue Jun 5 19:14:59 CEST 2007


On 6/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > I'd love to get rid of full-width ASCII and halfwidth kana (via
> > compatibility decomposition).  Native Japanese speakers often use them
> > interchangeably with the "proper" versions when correcting typos and
> > updating numbers in a series.  Ugly, to say the least.  I don't think
> > that native Japanese would care, as long as the decomposition is done
> > internally to Python.

> Not sure what the proposal is here. If people say "we want the PEP to
> do NFKC", I understand that as "instead of saying NFC, it should say
> NFKC", which in turn means "all identifiers are converted into the
> normal form NFKC while parsing".

I would prefer that.

> With that change, the full-width ASCII characters would still be
> allowed in source - they just wouldn't be different from the regular
> ones anymore when comparing identifiers.

I *think* that would be OK; so long as they mean the same thing, it is
just a quirk like using a different font.  I am slightly concerned
that it might mean "string as string" and "string as identifier" have
different tests for equality.
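That distinction can be seen directly with the stdlib unicodedata
module; the full-width spelling of "data" below is just an illustrative
example, not anything from the PEP:

```python
import unicodedata

fullwidth = "ｄａｔａ"   # full-width ASCII letters, U+FF44 U+FF41 ...
regular = "data"

# As plain strings, the two spellings compare unequal.
print(fullwidth == regular)                       # False

# Under NFKC, compatibility characters fold to their regular forms,
# so the two spellings would name the same identifier.
print(unicodedata.normalize("NFKC", fullwidth) == regular)   # True

# Under NFC, the form the PEP currently specifies, they stay distinct.
print(unicodedata.normalize("NFC", fullwidth) == regular)    # False
```

So with NFKC applied only to identifiers, the two spellings would be
the same name but different string literals.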

> Another option would be to require that the source is in NFKC already,
> where I then ask again what precisely that means in presence of
> non-UTF source encodings.

My own opinion is that it would be reasonable to put those in NFKC
form as part of the parser's internal translation to Unicode.  (But I
agree that it makes sense to do that for all encodings, if it is done
for any.)
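Roughly, whatever the source encoding, the tokenizer would decode to
Unicode first and then apply NFKC to each identifier token.  A minimal
sketch of that step (the helper name is hypothetical, not part of any
proposal) -- note it also folds the halfwidth kana case from above:

```python
import unicodedata

def normalize_identifier(token: str) -> str:
    # Hypothetical helper: what the tokenizer might do to each
    # identifier token after decoding the source to Unicode.
    return unicodedata.normalize("NFKC", token)

# Halfwidth katakana (with a separate halfwidth voiced-sound mark)
# composes to the regular full-width form under NFKC.
print(normalize_identifier("ﾃﾞｰﾀ") == "データ")   # True
```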

-jJ
