[Python-3000] Unicode strings, identifiers, and import

"Martin v. Löwis" martin at v.loewis.de
Thu May 17 15:49:31 CEST 2007


> Does the tokenizer do this for all string literals, too? Otherwise you
> could still get surprises with things like x.foo vs. getattr(x,
> "foo"), if the name foo were normalized but the string "foo" were not.

No. If you use a string literal, chances are very high that you put
NFC into your source code file (if it's not UTF-8, most codecs will
produce NFC naturally; if it is UTF-8, it depends on your editor).
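
As a rough illustration of the codec point (a sketch, not something from
the original discussion; the sample bytes and the helper name is_nfc are
made up for this example): decoding a legacy 8-bit encoding such as
Latin-1 yields precomposed characters, which are already in NFC, while
UTF-8 decoding passes through whichever form the editor saved.

```python
import unicodedata

def is_nfc(s):
    # Hypothetical helper: true when NFC normalization leaves s unchanged.
    return unicodedata.normalize("NFC", s) == s

# Latin-1 has a single byte for o-with-diaeresis, so it decodes to U+00F6.
from_latin1 = b"L\xf6wis".decode("latin-1")

# UTF-8 can carry either form; the decoder does not normalize.
utf8_composed = b"L\xc3\xb6wis".decode("utf-8")      # U+00F6
utf8_decomposed = b"Lo\xcc\x88wis".decode("utf-8")   # o + U+0308

print(is_nfc(from_latin1), is_nfc(utf8_composed), is_nfc(utf8_decomposed))
```

Only the last string, where the combining mark was saved as a separate
code point, fails the check.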

If you get the attribute name from elsewhere, who should perform the
normalization is a design choice. One could specify that the builtin
getattr does it, or one could require that the application does it
in cases where the strings aren't guaranteed to be in NFC.
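
A minimal sketch of the second option, where the application normalizes
before the lookup. The class Box, the attribute name, and the wrapper
getattr_nfc are all hypothetical and exist only for illustration; the
sketch assumes that setattr stores the name exactly as given.

```python
import unicodedata

class Box:
    pass

x = Box()
nfc_name = "caf\u00e9"                              # precomposed e-acute
nfd_name = unicodedata.normalize("NFD", nfc_name)   # e + combining acute

setattr(x, nfc_name, 1)

# The two names render identically but are different code point sequences,
# so a plain lookup with the NFD spelling does not find the attribute.
print(nfc_name == nfd_name, hasattr(x, nfd_name))

def getattr_nfc(obj, name, *default):
    # Hypothetical wrapper: normalize the name to NFC, then defer to getattr.
    return getattr(obj, unicodedata.normalize("NFC", name), *default)

print(getattr_nfc(x, nfd_name))
```

Putting the normalization in a wrapper keeps the cost out of lookups
whose names are already known to be in NFC.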

The only software I know of that explicitly changes the
normalization, and not to NFC, is OS X, which uses NFD on
disk.
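
For names that come back from such a file system, an application could
renormalize them itself. The following is only a sketch under that
assumption; to_nfc and listdir_nfc are hypothetical helper names, and the
sample file name is invented.

```python
import os
import unicodedata

def to_nfc(name):
    # Recompose base letter + combining mark sequences into NFC.
    return unicodedata.normalize("NFC", name)

def listdir_nfc(path="."):
    # Hypothetical convenience wrapper around os.listdir.
    return [to_nfc(entry) for entry in os.listdir(path)]

decomposed = "Lo\u0308wis.txt"   # as an NFD file system might report it
print(to_nfc(decomposed) == "L\u00f6wis.txt")
```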

Regards,
Martin

