[Python-Dev] #pragmas in Python source code

Da Silva, Mike Mike.Da.Silva@uk.fid-intl.com
Wed, 12 Apr 2000 19:37:56 +0100


Java uses ResourceBundles, which are identified by basename + two-character
locale id (e.g. "en", "fr").  The content of a resource bundle is
essentially a dictionary of name/value pairs.

MS Visual C++ uses #pragma code_page(windows_code_page_id) in resource files
to indicate which code page was used to generate the subsequent text.

In both cases, an application would rely on a fixed (7-bit ASCII) subset to
give the well-known key used to find the localized text for the current locale.

Any "hardcoded" string literals would be mangled when displayed under an
alternate locale.

So essentially, one could take the view that correct support for
localization is a runtime issue affecting the user of an application, not
the developer.  Hence, myfile.py may contain 8-bit string literals encoded
in my current Windows encoding (1252), but my user may be running Japanese
Windows with code page 932.  All I can guarantee is that the first 128
characters (notwithstanding BACKSLASH, which Japanese fonts commonly draw
as a yen sign) will be rendered correctly; other characters will be
interpreted as half-width katakana or worse.
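
To make that failure mode concrete, here is a minimal sketch in present-day
Python.  The helper name misread is hypothetical, and it assumes the
interpreter ships the standard "cp1252" and "cp932" codecs; it only shows
what happens to the bytes, not how any particular display would draw them.

```python
def misread(text, written_as="cp1252", read_as="cp932"):
    """Encode text in one code page and decode the same bytes in another."""
    raw = text.encode(written_as)
    # errors="replace" keeps the demo from raising on byte sequences that
    # are not valid in the reader's code page.
    return raw.decode(read_as, errors="replace")


# Pure ASCII survives the trip unchanged.
print(misread("hello"))

# N-with-tilde is the single byte 0xD1 in code page 1252; code page 932
# reads that same byte as a half-width katakana character.
print(ascii(misread("\u00d1")))
```

The literal is written with a \u escape so that the example source itself
stays within the ASCII subset.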

Any literal strings one embeds in code should be purely for the benefit of
the code, not for the end user, who should be seeing properly localized
text, pulled back from a localized text resource file, _NOT_ Python code, and
automatically pumped through the appropriate native <--> Unicode
translations as required by the code.

So to sum up:
1	Hardcoded strings are evil in source code unless they use the
invariant ASCII (and by extension UTF-8) character set.
2	A proper localized resource loading mechanism is required to fetch
genuine localized text from a static resource file (i.e. not myfile.py).
3	All transformations of 8-bit strings to and from Unicode should
explicitly specify the 8-bit encoding for the source/target of the
conversion, as appropriate.
4	Assume that a Japanese / Chinese programmer will find it easier to
code using the invariant ASCII subset than a Western European / American
programmer will find it to read hanzi in source code.
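
Points 2 and 3 could be sketched roughly as follows.  This is only an
illustration under stated assumptions: CATALOGS, lookup and to_native are
hypothetical names, and the in-memory dictionary stands in for per-locale
resource files that a real application would load from disk.

```python
# Hypothetical stand-in for per-locale resource files.  Keys are plain
# ASCII; non-ASCII text is written with \u escapes so this source file
# itself stays within the invariant ASCII subset.
CATALOGS = {
    "en": {"closed": "Closed", "greeting": "Hello"},
    "fr": {"closed": "Ferm\u00e9", "greeting": "Bonjour"},
}


def lookup(key, locale, default_locale="en"):
    """Return the localized text for an ASCII key, falling back to the default locale."""
    catalog = CATALOGS.get(locale, {})
    if key in catalog:
        return catalog[key]
    return CATALOGS[default_locale][key]


def to_native(text, encoding):
    """Convert Unicode text to bytes, always naming the target encoding explicitly."""
    return text.encode(encoding)


message = lookup("closed", "fr")
print(to_native(message, "cp1252"))  # one byte for the accented letter
print(to_native(message, "utf-8"))   # two bytes for the same letter
```

The same text yields different bytes depending on the target encoding,
which is why the conversion should never rely on an implicit default.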

Regards,
Mike da Silva

-----Original Message-----
From: Ka-Ping Yee [mailto:ping@lfw.org]
Sent: Wednesday, April 12, 2000 6:45 PM
To: Fred L. Drake, Jr.
Cc: Python Developers @ python.org
Subject: Re: [Python-Dev] #pragmas in Python source code


On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote:
>  > Or do we need to separate out two categories of pragmas --
>  > pre-parse and post-parse pragmas?
> 
>   Eeeks!  We don't need too many special forms!  That's ugly!

Eek indeed.  I'm tempted to suggest we drop the multiple-encoding
issue (i can hear the screams now).  But you're right, i've never
heard of another language that can handle configurable encodings
right in the source code.  Is it really necessary to tackle that here?

Gak, what do Japanese programmers do?  Has anyone seen any of that
kind of source code?


-- ?!ng

