[I18n-sig] Japanese commentary on the Pre-PEP (1 of 4)
Toby Dickenson
tdickenson@geminidataloggers.com
Tue, 20 Feb 2001 14:22:43 +0000
On Tue, 20 Feb 2001 19:16:07 +0900, Brian Takashi Hooper
<brian@tomigaya.shibuya.tokyo.jp> wrote:
>Hi there, this is Brian Hooper in Tokyo.
>The proposed character model thread seems to have simmered down so I
>don't know how interested people will be in this, but I gathered a few
>comments about the Pre-PEP from the Japanese Python mailing list, and
>translated the responses - I think there were some very good points
>brought up, and I'd like to add the messages I received (with the
>permission of their authors) to the discussion.
Thank you for this effort.
>For example, given:
>
>PyObject *simple(PyObject *o, PyObject *args)
>{
>    char *filename;
>    if (!PyArg_ParseTuple(args, "s", &filename))
>        return NULL;
>    FILE *f = fopen(filename, "w");
>    if (!f)
>        return NULL;
>    fprintf(f, "spam");
>    fclose(f);
>    Py_INCREF(Py_None);
>    return Py_None;
>}
>
>from Python you can write:
>
>sample.simple("????????")
>
>and it will work as is in almost any platform and language environment.
If those ??? are anything other than ASCII characters, then it doesn't
work *predictably* today (assuming the requirement that the file name
is correct when viewed using the platform's native file browser).
>Well, we could take care when writing our Python scripts only to use
>strings in such a way that PyArg_ParseTuple() does not cause an error.
Sticking with the fopen example: I had assumed it is desirable to get
an error if a script tries to create a file whose name contains
Japanese characters, on a filesystem that does not support that.
>Use byte strings
>
>Instead of using a character string, we could call our function as
>
>sample.simple(b"????????")
>
>and everything then works fine. However, if we always have to use byte
>strings when interacting with extension libraries, then we haven't really
>achieved any real improvement in terms of internationalization, and
>there's not much point to implementing the PEP in that case...
If this is a legacy extension library then a byte string is all it
expects. You could call this function as
sample.simple(u"????????".encode('encoding_expected_by_sample_dot_simple'))
I agree we need to provide a simpler interface to new extensions.
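To make that explicit-encoding pattern concrete, here is a small sketch; the 'shift_jis' codec is only an assumed stand-in for whatever encoding the hypothetical sample.simple extension actually expects:

```python
# -*- coding: utf-8 -*-
# Sketch: encode explicitly before handing a string to a legacy extension.
# 'shift_jis' is an assumption standing in for the extension's expected
# encoding; sample.simple is the hypothetical extension from the thread.
filename = u"\u65e5\u672c\u8a9e.txt"       # "Japanese" + ".txt" in kanji
encoded = filename.encode('shift_jis')     # byte string for the extension
# sample.simple(encoded)                   # hypothetical extension call

# The same unicode string yields different bytes under different codecs,
# which is exactly why the caller must know what the extension expects:
assert filename.encode('shift_jis') != filename.encode('euc_jp')
```

The point of the sketch is that the encoding decision stays visible at the call site rather than being hidden inside the extension.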
>PyObject *simple(PyObject *o, PyObject *args)
>{
>    Py_UNICODE *filename;
>    char native_filename[MAX_FILE];
>
>    if (!PyArg_ParseTuple(args, "u", &filename))
>        return NULL;
>
>#ifdef SJIS
>    /* SJIS??? */
>#else
>    /* EUC??? */
>#endif
>
>    FILE *f = fopen(....)
>
>I don't think anyone really wants to write code like this.
I think those ifdefs could be replaced by one call to PyUnicode_Encode.
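In Python terms, the idea might look like the sketch below: pick the codec once at runtime instead of at compile time, then encode with a single call. The platform-to-codec mapping here is purely an assumption for illustration, not what any real extension does:

```python
import sys

def native_filename(filename):
    """Encode a unicode filename once, with a runtime-selected codec.

    The mapping below (Windows -> Shift-JIS, elsewhere -> EUC-JP) is an
    assumed example; a real extension would choose its codec from the
    platform's actual conventions.
    """
    codec = 'shift_jis' if sys.platform == 'win32' else 'euc_jp'
    return filename.encode(codec)
```

The C equivalent would be a single encoding call (e.g. via the C-level Unicode encoding API) with a codec name chosen at runtime, replacing the compile-time #ifdef branches entirely.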
>Furthermore, adding this kind of support isn't likely to be provided by
>European or American programmers, since the coincidence of ISO-8859-1
>with the <= 255 range of Unicode makes such explicit support unnecessary
>for applications which only use Latin-1 or ASCII. (So: Non-American/
>European programmers will have to add support for libraries they want to
>use)
As a European native-English speaker, I don't think this is true so
long as we preserve the ASCII default encoding. An application that
stores Latin-1 data in a mix of unicode and plain strings will quickly
trigger an exception (as soon as a unicode string mixes with a plain
string containing a non-ASCII byte).
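That fail-fast behaviour can be demonstrated. Under Python 2's ASCII default codec (the Python of this thread), the mix raised UnicodeDecodeError; today's Python is stricter still and refuses any str/bytes mix outright, as this sketch shows:

```python
# Latin-1 data kept in a plain (byte) string cannot silently mix with
# unicode text. In Python 2 this raised UnicodeDecodeError under the
# ASCII default codec; in modern Python the mix is a TypeError.
text = u"caf\xe9"   # unicode text with a non-ASCII character
raw = b"\xe9"       # Latin-1 byte for e-acute, kept as raw bytes
try:
    text + raw
except TypeError:
    print("mixing text and bytes triggers an exception immediately")
```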
A useful counterexample may be Mark Hammond's extensions for
supporting Win32 and COM. They have always included explicit support
for automatic encoding of unicode parameters on platforms where Win32
uses 8-bit strings, and automatic decoding of plain strings when used
with COM, which is always Unicode.
Toby Dickenson
tdickenson@geminidataloggers.com