[I18n-sig] Japanese commentary on the Pre-PEP (1 of 4)

Toby Dickenson tdickenson@geminidataloggers.com
Tue, 20 Feb 2001 14:22:43 +0000


On Tue, 20 Feb 2001 19:16:07 +0900, Brian Takashi Hooper
<brian@tomigaya.shibuya.tokyo.jp> wrote:

>Hi there, this is Brian Hooper in Tokyo.

>The proposed character model thread seems to have simmered down so I
>don't know how interested people will be in this, but I gathered a few
>comments about the Pre-PEP from the Japanese Python mailing list, and
>translated the responses - I think there were some very good points
>brought up, and I'd like to add the messages I received (with the
>permission of their authors) to the discussion.

Thank you for this effort.


>For example, given:
>
>PyObject *simple(PyObject *o, PyObject *args)
>{
>	char *filename;
>	if (!PyArg_ParseTuple(args, "s", &filename))
>		return NULL;
>	FILE *f = fopen(filename, "w");
>	if (!f)
>		return NULL;
>	fprintf(f, "spam");
>	fclose(f);
>	Py_INCREF(Py_None);
>	return Py_None;
>}
>
>from Python you can write:
>
>sample.simple("????????")
>
>and it will work as is in almost any platform and language environment.

If those ??? are anything other than ASCII characters, then it doesn't
work *predictably* today (assuming the requirement that the file name
is correct when viewed with the platform's native file browser).

>Well, we could take care when writing our Python scripts only to use strings
>in such a way that PyArg_ParseTuple() does not cause an error.

Sticking with the fopen example: I had assumed it is desirable to get
an error if a script tries to create a file whose name contains
Japanese characters on a filesystem that does not support them.

>Use byte strings
>
>Instead of using a character string, we could call our function as
>
>sample.simple(b"????????")
>
>and everything then works fine.  However, if we always have to use byte
>strings when interacting with extension libraries, then we haven't really
>achieved any real improvement in terms of internationalization, and there's
>not much point to implementing the PEP in that case...

If this is a legacy extension library then a byte string is all it
expects. You could call this function as

sample.simple(u"????????".encode('encoding_expected_by_sample_dot_simple'))

I agree we need to provide a simpler interface to new extensions.
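
The pattern above can be wrapped so application code never handles the
encoding directly. A minimal sketch, assuming the legacy extension expects
EUC-JP byte strings (the wrapper name and codec choice are illustrative,
not part of any real sample module):

```python
def call_legacy(simple, filename):
    # Encode a unicode filename only at the legacy-extension boundary;
    # 'simple' is any extension function that expects a byte string,
    # and 'euc_jp' stands in for whatever encoding it actually requires.
    return simple(filename.encode("euc_jp"))

# Stand-in for the extension function, recording what it receives.
received = []
call_legacy(received.append, u"\u65e5\u672c\u8a9e")

# The extension sees an ordinary EUC-JP byte string, never a unicode object.
assert isinstance(received[0], bytes)
assert received[0].decode("euc_jp") == u"\u65e5\u672c\u8a9e"
```

Application code keeps text in unicode throughout and pays the encoding
cost only at the one call site that touches the legacy interface.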


>PyObject *simple(PyObject *o, PyObject *args)
>{
>	Py_UNICODE *filename;
>	char native_filename[MAX_FILE];
>
>	if (!PyArg_ParseTuple(args, "u", &filename))
>		return NULL;
>
>#ifdef SJIS
>	/* SJIS??? */
>#else
>	/* EUC??? */
>#endif
>
>	FILE *f = fopen(....)
>
>I don't think anyone really wants to write code like this.

I think those ifdefs could be replaced by one call to PyUnicode_Encode.
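
In Python terms, the collapse looks like this: one encode call
parameterized by a codec name instead of a preprocessor branch. A hedged
sketch ('shift_jis' and 'euc_jp' are standard Python codec names, but the
selection logic is illustrative):

```python
# One encode call, parameterized by codec name, replaces the
# #ifdef SJIS / #else / #endif branching in the C example above.
def to_native(filename, native_encoding):
    return filename.encode(native_encoding)

text = u"\u65e5\u672c\u8a9e"  # "nihongo" in kanji

# The same source text yields different platform byte strings,
# but the calling code is identical in both cases.
assert to_native(text, "shift_jis") != to_native(text, "euc_jp")
assert to_native(text, "shift_jis").decode("shift_jis") == text
assert to_native(text, "euc_jp").decode("euc_jp") == text
```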

>Furthermore, adding this kind of support isn't likely to be provided by
>European or American programmers, since the coincidence of ISO-8859-1
>with the <= 255 range of Unicode makes such explicit support unnecessary
>for applications which only use Latin-1 or ASCII.  (So: non-American/
>European programmers will have to add support for libraries they want to
>use)

As a European native English speaker, I don't think this is true so
long as we preserve the ASCII default encoding. An application that
stores Latin-1 data in a mix of unicode and plain strings will quickly
trigger an exception (as soon as a unicode string mixes with a plain
string containing a non-ASCII byte).
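
That exception can be sketched with explicit decode calls (implicit
promotion of plain strings through the ASCII codec was the interpreter's
behaviour; the helper below only mirrors it):

```python
def coerce_to_unicode(byte_string):
    # Mirrors the ASCII-default promotion applied when a plain
    # string mixes with a unicode string.
    return byte_string.decode("ascii")

# Pure-ASCII plain strings mix silently...
assert coerce_to_unicode(b"plain ascii") == u"plain ascii"

# ...but a single non-ASCII byte (here Latin-1 0xe9) triggers the error.
try:
    coerce_to_unicode(b"caf\xe9")
except UnicodeDecodeError:
    pass  # the exception described above
else:
    raise AssertionError("expected UnicodeDecodeError")
```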

A useful counterexample may be Mark Hammond's extensions supporting
Win32 and COM. They have always included explicit support for automatic
encoding of unicode parameters on platforms where Win32 uses 8-bit
strings, and automatic decoding of plain strings when used with COM,
which is always unicode.

Toby Dickenson
tdickenson@geminidataloggers.com