[Python-Dev] Re: PEP 263 - Defining Python Source Code Encodings

15 Jul 2002 23:16:52 +0200

Guido van Rossum <guido@python.org> writes:

> Yes, but all the non-ASCII has to be represented as Unicode strings.
> I.e. no Latin-1 in 8-bit strings!

Exactly. This might still cause problems for inspect and other
introspective tools.

For ASCII identifiers, I agree that using byte strings is sensible,
for best backwards compatibility.
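For comparison, here is a small sketch of how introspection looks in current Python 3. Non-ASCII identifiers were eventually allowed there (PEP 3131, which postdates this thread), and every name, ASCII or not, is an ordinary Unicode str. The function name `grüße` and parameter `näme` are made up for illustration. This does not describe the 2002-era interpreter discussed above.

```python
import inspect

# Hypothetical source text: a function whose name and parameter are non-ASCII.
source = "def gr\u00fc\u00dfe(n\u00e4me):\n    return 'Hallo ' + n\u00e4me\n"

namespace = {}
exec(compile(source, "<pep263-demo>", "exec"), namespace)
func = namespace["gr\u00fc\u00dfe"]

# inspect and the code object see plain str objects for all identifiers;
# ascii() keeps the printed output safe on any terminal encoding.
print(ascii(func.__name__))                           # 'gr\xfc\xdfe'
print(ascii(list(inspect.signature(func).parameters)))  # ['n\xe4me']
print(type(func.__code__.co_varnames[0]).__name__)    # str
```

In Python 3 there is no byte-string form of a name left to fall back on, so the compatibility trade-off discussed here no longer arises.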

> Really?  I thought Unicode's isalpha() was built on the Unicode text
> database?

Not if the build has a "usable wchar_t"; see unicodeobject.h:

#if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS)

#include <wctype.h>

#define Py_UNICODE_ISSPACE(ch) iswspace(ch)

...

I had overlooked that it also requires explicitly selecting the wctype
functions (WANT_WCTYPE_FUNCTIONS), which is probably never done. So it
is better than I thought: isletter might vary across builds on the
same platform, but in practice it likely never does.
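One way to check the "never varies in practice" claim on a modern build is to compare the Python-level letter test, str.isalpha(), against the general category stored in the Unicode database. The following is a sketch against current CPython 3, whose documentation defines isalpha() in terms of the categories Lu, Ll, Lt, Lm and Lo. The helper name `isalpha_mismatches` is hypothetical.

```python
import sys
import unicodedata

# General categories that the Unicode database classifies as letters.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}

def isalpha_mismatches(limit: int = sys.maxunicode + 1) -> list:
    """Return the code points below `limit` where str.isalpha()
    disagrees with unicodedata's general category."""
    mismatches = []
    for cp in range(limit):
        ch = chr(cp)
        if ch.isalpha() != (unicodedata.category(ch) in LETTER_CATEGORIES):
            mismatches.append(cp)
    return mismatches

# Scans every code point, so this takes on the order of a second.
print(len(isalpha_mismatches()))
```

An empty result means the letter test is driven by the Unicode database rather than by the platform's wctype functions.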

Regards,
Martin