Allowing non-ASCII identifiers

Tue Feb 10 11:59:35 EST 2004

John Roth wrote:
> ...
> I believe that unicode (actually UTF-8) source code files
> are legitimate if you declare them properly in the encoding
> line. In fact, UTF-8 is the example in the documentation.
> 
> I'm all in favor of going to unicode all the way. I'd like to
> have the proper mathematical symbols for logical and set
> operations, as well as integer divide. They're all there in the
> unicode character set, after all; why should we have to
> settle for archaic character restrictions?
Because some of us use archaic systems and/or fonts which are
incapable of displaying such symbols.  Never mind whether we
can read them.

Also, we would have to solve the problem of multiple representations
of the same identifier (that is, identifier normalization).  For
example, these four spellings of one word are equivalent:

     (u'\N{Latin small letter e with acute}l'
                        u'\N{Latin small letter e with grave}ve')

     (u'\N{Latin small letter e with acute}l'
                        u'e\N{Combining grave accent}ve')

     (u'e\N{Combining acute accent}l'
                        u'\N{Latin small letter e with grave}ve')

     (u'e\N{Combining acute accent}l'
                        u'e\N{Combining grave accent}ve')

Unicode says these four are canonically equivalent, so we should
treat them identically.  Yet as Python strings they compare unequal,
and in general they hash differently, so a dictionary lookup will not
necessarily even compare them, let alone find them equal.
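
To make the problem concrete, here is a rough sketch of what
normalizing before lookup might look like.  The helper name
normalize_identifier and the choice of NFC are my assumptions for
illustration only, not anything the language currently does;
unicodedata.normalize itself is in the standard library.

```python
import unicodedata

E_ACUTE = u'\N{LATIN SMALL LETTER E WITH ACUTE}'
E_GRAVE = u'\N{LATIN SMALL LETTER E WITH GRAVE}'
ACUTE = u'\N{COMBINING ACUTE ACCENT}'
GRAVE = u'\N{COMBINING GRAVE ACCENT}'

# The four spellings from above: each accented letter is either one
# precomposed code point or a plain 'e' plus a combining accent.
spellings = [
    E_ACUTE + u'l' + E_GRAVE + u've',
    E_ACUTE + u'l' + u'e' + GRAVE + u've',
    u'e' + ACUTE + u'l' + E_GRAVE + u've',
    u'e' + ACUTE + u'l' + u'e' + GRAVE + u've',
]

# Compared as raw strings, all four are different.
distinct_raw = len(set(spellings))

def normalize_identifier(name, form='NFC'):
    """Hypothetical helper: map a name to one canonical spelling.

    NFC is an assumed choice; any single normalization form would
    serve, as long as it is applied everywhere.
    """
    return unicodedata.normalize(form, name)

# After normalization the four collapse to a single spelling.
distinct_normalized = len(set(normalize_identifier(s) for s in spellings))

# A dictionary keyed on normalized names finds any of the spellings,
# provided the lookup key is normalized as well.
table = {normalize_identifier(spellings[0]): 'value'}
found = table.get(normalize_identifier(spellings[3]))
```

Something like this would have to happen at every point where a
name is stored or looked up, which is part of why I call it an
issue to solve rather than a detail.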

-- 
-Scott David Daniels
Scott.Daniels at Acm.Org