[Python-3000] Support for PEP 3131

Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Tue May 29 05:57:23 CEST 2007


Greg Ewing writes:

 > Stephen J. Turnbull wrote:

 > > If an English speaker would pronounce the spelling of an English
 > > word "A B C", and an Arabic speaker an Arabic word as "1 2 3",
 > > then *as an identifier* the combination English then Arabic is
 > > spelled "A B C _ 1 2 3".

 > But would an Arabic speaker pronounce the identifier as a whole
 > as "A B C 1 2 3" or "1 2 3 A B C"? That's where I find it all
 > gets very confusing.

Then "unask the question."<wink>  Bidi is *not* context-free; that
question is not properly formulated.  Pragmatically, in a well-formed
Python program in the overwhelming majority of cases you or she will
be in a LTR context, so it will be read "A B C _ 1 2 3".

The ambiguity you have in mind probably is best expressed "what
happens with a single line program?"  Eg, one that appears on the
display like this:

                               ABC_321

True, a native Arabic speaker would surely (absent any context except
her early upbringing) read that "1 2 3 _ A B C".  And I admit, I'd
read it "A B C _ 1 2 3".  That looks like an ambiguity requiring use
of a direction indicator, but it's not.  According to PEP 263 all
Python programs implicitly start in ASCII (otherwise the optional
coding cookie cannot be parsed, and presumably not the optional
shebang, either).

So since the Python programmer (whether natively English-speaking or
Arabic-speaking) starts in state "LTR", she reads the "A" first, not
the "1", and there are no problems.  Of course, you want to be able to
express the identifier that would be spelled out (and represented in
memory!) as "1 2 3 _ A B C", and you can:

                               321_ABC

Since I'm not an Arabic-speaker at all, I can only say I suspect that
Arabic speakers will learn to do this context initialization very
quickly, and to read comments marked at the *end* of the line, rather
than the beginning.  Ie, to an Arabic speaker an Arabic header comment
will feel like this:

                                     This is the Foomatic program. #
                                     It makes passes at compilers. #
                        It is licentiously speaking a GPL program. #

A smart editor should be able to format that:

This is the Foomatic program.                                      #
It makes passes at compilers.                                      #
It is licentiously speaking a GPL program.                         #

It feels weird, but it's not that bad, to me anyway.

Once again, speakers of bidi languages are in a world of pain anyway;
it's reasonable to suppose that this doesn't really make things worse.
There are ambiguities here, and while a naive native speaker might
resolve them differently in ad hoc cases from the above, I doubt
they'd be lucky enough to come up with a consistent interpretation.
On the other hand, humans are *designed* to learn the arbitrary rules
of languages, as children, at least.  Adults who are fortunate enough
to retain enough of that ability to learn to program probably will
have little trouble with this particular arbitrary rule.  At least,
that's my guess, indirectly supported by the Unicode rules for
identifiers which suggests that it is reasonable to prohibit direction
indicators in identifiers.


More information about the Python-3000 mailing list