[Python-3000] Support for PEP 3131

Sat May 26 18:39:57 CEST 2007

On 5/26/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Jim Jewett writes:
>  > So long as we allow tailoring, I think the maximal set should be
>  > generous -- and I don't see any reason to pre-exclude anything
>  > outside ASCII.

> Cf characters?  Are we admitting "stupid bidi tricks", too?<wink>

If Tomer needs them.

Seriously, I wouldn't put Cf characters in the default accepted
table.  (But remember that *I* would limit that default to ASCII.)
Tomer suggested that bidi characters might be needed to get Hebrew and
Arabic working correctly.  Given that someone has already decided to
use Arabic (or even Arabic presentational forms), he or she is better
placed to decide whether Cf characters are needed too.
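
For anyone who wants to see which characters are at issue, here is a
minimal sketch that lists the Cf (format) characters in a candidate
identifier.  The helper name format_chars is hypothetical and not part
of any proposal; it relies only on the standard unicodedata module.

```python
import unicodedata

def format_chars(name):
    """Return the Cf (format) characters found in a candidate identifier."""
    return [c for c in name if unicodedata.category(c) == "Cf"]

# U+200F RIGHT-TO-LEFT MARK is one of the bidi controls in category Cf.
plain = format_chars("abc")
with_rlm = format_chars("ab\u200fc")
print(plain, [hex(ord(c)) for c in with_rlm])
```

A local table could use a check like this to accept or reject such
characters explicitly instead of letting them through silently.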

> But I'll tell you what my reason is: we want to be in a position to
> avoid prohibiting previously acceptable characters wherever possible.

Agreed; but in my opinion, the decision to allow those characters is
local; the decision to rescind them would therefore also be local.

We do want to avoid retracting characters from the default set.  (And
again, if we restrict that default set to ASCII, we'll be fine.)
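
A rough sketch of the split I have in mind, assuming a fixed ASCII
default plus a locally supplied extension.  The names DEFAULT_ALLOWED
and is_allowed are made up for illustration, and the sketch ignores the
rule that an identifier cannot start with a digit.

```python
import string

# Default table: only the ASCII identifier characters.
DEFAULT_ALLOWED = frozenset(string.ascii_letters + string.digits + "_")

def is_allowed(name, extra=frozenset()):
    """Check every character against the default table plus local additions."""
    allowed = DEFAULT_ALLOWED | frozenset(extra)
    return bool(name) and all(c in allowed for c in name)

# A site that wants U+00E9 adds it locally; the default never changes.
print(is_allowed("spam_1"))
print(is_allowed("caf\u00e9"))
print(is_allowed("caf\u00e9", extra="\u00e9"))
```

Rescinding a character then means editing the local extension, and the
default set never has to shrink.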

>  > There are people who like to use names like "Program Files" or
>  > "Summary of Results.Apr-3-2007 version 2.xls"; I expect the same will
>  > be true of identifiers.  So long as the punctuation is not ASCII, we
>  > might as well let them.

> Why not let them use ASCII punctuation, as long as it's not Python
> syntax?

Because there really isn't any unreserved ASCII punctuation.  One
issue with @decorators was that it caused some hassle for (reasonably
well-known) third-party tools which had been using the "@" character.

It would make perfect sense to me if the consensus French table
excluded guillemets.  But I figure that should be their decision.

>  > The other committees say to exclude certain scripts, like
>  >  Linear B and Ogham.

(I should probably have noted that Linear B and Ogham are not used by
any modern language; I *think* the excluded scripts were all for
things that would not represent anyone's primary script or mother
tongue.)

>  > If unicode comes out with a new revision, the new characters should
>  > probably be allowed; I don't want a situation where users of Cham or
>  > Lepcha[1] are told they have to wait another year because their
>  > scripts weren't formally adopted into unicode until after python 3.4.0
>  > was already released.

> Tough call.  I'd say, let's cross that bridge when we come to it.

> In any case there will have to be some mechanism to access a Unicode
> database at either build time or run time.  Let them munge that
> database if they're in a hurry.

I had been thinking of the Unicode version as a feature that didn't
change within a Python release.  Perhaps that is negotiable?
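
For reference, the interpreter already reports which Unicode database
it was built with, through the standard unicodedata module.  A minimal
sketch (the value printed depends on the Python build, so none is shown
here):

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
version = unicodedata.unidata_version
print("Unicode database version:", version)
```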

> Maybe the way to handle this is to allow private-space characters in
> identifiers as an option.  That would be doable with your well-known
> file scheme.  But it's very dangerous across modules.

It turns out that page was out of date; Lepcha and Cham now have code
points which haven't been formally approved, but aren't likely to
change.  Officially, they're still undefined, but using private-space
probably isn't the right answer.  So either we allow these particular
"undefined" characters, or we (for now) disallow Lepcha and Cham.
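
To make the distinction concrete, here is a small sketch that sorts a
character into private-use, unassigned, or assigned according to its
general category in the interpreter's Unicode database.  The function
name code_point_status is hypothetical, and the answer for any given
code point depends on the database version the interpreter ships with.

```python
import unicodedata

def code_point_status(ch):
    """Classify a character by its general category in the bundled database."""
    category = unicodedata.category(ch)
    if category == "Co":
        return "private-use"
    if category == "Cn":
        return "unassigned"
    return "assigned"

# U+E000 is the first code point of the Private Use Area.
print(code_point_status("A"))
print(code_point_status("\ue000"))
```

A code point that has been proposed but not yet published would show up
as "unassigned" here until the database is updated.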

-jJ