[Python-3000] Unicode identifiers (Was: sets in P3K?)

Hye-Shik Chang hyeshik at gmail.com
Sun Apr 30 15:46:57 CEST 2006


On 4/29/06, Guido van Rossum <guido at python.org> wrote:
> On 4/28/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Guido van Rossum wrote:
> > >> I was hoping to propose a PEP on non-ASCII identifiers some
> > >> day; that would (of course) include a requirement that the
> > >> standard library would always be restricted to ASCII-only
> > >> identifiers as a style-guide.
> > >
> > > IMO communication about code becomes much more cumbersome if there are
> > > non-ASCII letters in identifiers, and the rules about what's a letter,
> > > what's a digit, and what separates two identifiers become murky.
> >
> > It depends on the language you use to communicate. In English,
> > it is certainly cumbersome to talk about Chinese identifiers.
> > OTOH, I believe it is cumbersome to communicate about English
> > identifiers in Chinese, either, because the speakers might
> > not even know what the natural-language concept behind the
> > identifiers is, and because they can't pronounce the identifier.
>
> True; but (and I realize we're just pitting our beliefs against each
> other here) I believe that Chinese computer users are more likely to
> be able (and know how) to type characters in the Latin alphabet than
> Western programmers are able to type Chinese. For example, I notice
> that baidu.cn (a Chinese search engine) spells its own name (a big
> brand in China) using the Latin alphabet. I expect that Chinese users
> are used to typing "baidu.cn" in their browser's search bar, rather
> than Chinese characters.

That example is quite a corner case.  CJK users can't use their
own languages in URLs because MSIE doesn't support IDNA yet.
Most ISPs in those countries even hijack DNS queries and
redirect users to install an ActiveX control that handles their
native scripts (hanzi or hangul).

And there is another practical problem with romanized identifiers:
the romanization method isn't consistent among users, even though
it's standardized.  For example, a Korean word meaning
"maintenance" is "unyeong" in the standard method.  But Korean
people write it as "woonyoung", "unyoung", "oonyeong" or even
"unyeong", etc.  This would make code NameError-prone and build a
big barrier for children, who would have to learn a complex
standard romanization system before they could learn Python.
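
To make the NameError point concrete, here is a minimal sketch.  The
namespace dict and the try_name helper are made up for illustration;
only the spellings come from the example above.

```python
# Hypothetical namespace: the name was defined using the standard
# romanization of the Korean word.
namespace = {"unyeong": "maintenance"}

# Spellings that different users might type for the same word.
spellings = ["unyeong", "woonyoung", "unyoung", "oonyeong"]


def try_name(spelling):
    """Return the bound value, or None if the name is not defined."""
    try:
        # Look the identifier up as Python would resolve a bare name.
        return eval(spelling, {"__builtins__": {}}, namespace)
    except NameError:
        return None


results = {s: try_name(s) for s in spellings}
for spelling, value in results.items():
    print(spelling, "->", value if value is not None else "NameError")
```

Only the spelling that matches the definition exactly resolves; every
other romanization of the same word fails at lookup time.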


Hye-Shik

