Unicode 7

Fri May 2 21:15:49 EDT 2014

On Sat, May 3, 2014 at 10:58 AM, Rustom Mody <rustompmody at gmail.com> wrote:
> You think this
>
>>>> (ﬁne, fine) = (1,2) # and no issue about it
>
> is fine?

Not sure which part you're objecting to. Are you saying that this
should be an error:

>>> a, a = 1, 2 # simple ASCII identifier used twice

or that Python should take the exact sequence of codepoints, rather
than normalizing?

Python 3.5.0a0 (default:6a0def54c63d, Mar 26 2014, 01:11:09)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ﬁne = 1
>>> vars()
{'__package__': None, '__spec__': None, '__doc__': None, 'fine': 1,
'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__builtins__': <module 'builtins' (built-in)>, '__name__':
'__main__'}
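
The transcript fits the parser folding identifiers to NFKC. Here is a quick check, assuming a CPython 3 interpreter and using only the standard unicodedata module. NFKC maps the ligature to plain "fine", while NFC leaves it alone. Running the quoted assignment through exec() then shows both targets landing on one name, just like `a, a = 1, 2`:

```python
import unicodedata

ligature = "\ufb01ne"  # 'fine' spelled with U+FB01 LATIN SMALL LIGATURE FI
ascii_name = "fine"

print(ligature == ascii_name)                                 # False: different codepoints
print(unicodedata.normalize("NFC", ligature) == ascii_name)   # False: NFC keeps the ligature
print(unicodedata.normalize("NFKC", ligature) == ascii_name)  # True: NFKC splits it into f + i

# Run the quoted tuple assignment in a fresh namespace. Both targets
# normalize to 'fine', so the second (left-to-right) assignment wins.
ns = {}
exec("(\ufb01ne, fine) = (1, 2)", ns)
print(ns["fine"])        # 2
print("\ufb01ne" in ns)  # False
```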

As regards normalization, I would be happy with either "keep it
exactly as you provided" or "normalize according to <insert Unicode
standard normalization here>", as long as it's consistent. It's like
what happens with SQL identifiers: according to the standard, an
unquoted name should be uppercased, but some databases instead
lowercase them. It doesn't break code (modulo quoted names, not
applicable here), as long as it's consistent.
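
To make the consistency point concrete, here is a toy namespace. It is hypothetical and not something Python itself uses. Every key passes through one normalization function on store, lookup and delete. With form="NFKC" the two spellings share a slot. With form="NFC" they stay distinct, because the ligature has only a compatibility decomposition. In both cases a given spelling always lands in the same slot, and that is the property that matters:

```python
import unicodedata
from collections.abc import MutableMapping


class NormalizedNamespace(MutableMapping):
    """Hypothetical mapping that normalizes every key with one Unicode form."""

    def __init__(self, form="NFKC"):
        self._form = form
        self._data = {}

    def _key(self, name):
        # The same rule on every access is what keeps lookups consistent.
        return unicodedata.normalize(self._form, name)

    def __getitem__(self, name):
        return self._data[self._key(name)]

    def __setitem__(self, name, value):
        self._data[self._key(name)] = value

    def __delitem__(self, name):
        del self._data[self._key(name)]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


ns = NormalizedNamespace()
ns["\ufb01ne"] = 1              # stored under 'fine'
print(ns["fine"])               # 1
ns["fine"] = 2                  # same slot, overwritten
print(len(ns), ns["\ufb01ne"])  # 1 2
```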

(My reading of PEP 3131 is that NFKC is used; is that what's
implemented, or was that a temporary measure and/or something for Py2
to consider?)
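
Here is a sketch of one way to probe what is implemented, assuming CPython 3. The folding happens when source text is parsed. An attribute name written in code is therefore normalized. A name passed as an ordinary string to setattr() or getattr() is used exactly as given. types.SimpleNamespace is just a convenient empty object here:

```python
import types

box = types.SimpleNamespace()

# An identifier in source code goes through the parser, which applies NFKC.
exec("box.\ufb01ne = 1", {"box": box})
print(vars(box))                 # {'fine': 1}

# A plain string passed to setattr() bypasses the parser: no normalization.
setattr(box, "\ufb01ne", 2)
print(len(vars(box)))            # 2: 'fine' and the ligature spelling coexist
print(getattr(box, "fine"))      # 1
print(getattr(box, "\ufb01ne"))  # 2
```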

ChrisA