[Python-Dev] fun with unicode, part 1

Fredrik Lundh <effbot@telia.com>
Tue, 2 May 2000 09:55:49 +0200


Tim Peters wrote:
> [Guido asks good questions about how Windows deals w/ Unicode filenames,
>  last Thursday, but gets no answers]

you missed Finn Bock's post on how Java does it.

here's another data point:

Tcl uses a system encoding to convert from unicode to a suitable
system API encoding, and uses the following approach to figure out
what that one is:

    windows NT/2000:
        unicode (use wide api)

    windows 95/98:
        "cp%d" % GetACP()
        (note that this is "cp1252" in the US and western Europe,
        not "iso-8859-1")

    macintosh:
        determine encoding for fontId 0 based on (script,
        smScriptLanguage) tuple. if that fails, assume
        "macroman"

    unix:
        figure out the locale from LC_ALL, LC_CTYPE, or LANG.
        use heuristics to map from the locale to an encoding
        (see unix/tclUnixInit). if that fails, assume "iso-8859-1"
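For illustration, here is a rough Python sketch of that selection logic. It is not Tcl's code. The platform labels ("winnt", "win9x", "mac", "unix"), the function names, and the small locale table are all hypothetical. The ANSI code page is passed in rather than read via GetACP(), and the macintosh script lookup is skipped so only the "macroman" fallback is shown. Returning None stands for "use the wide (unicode) API directly".

```python
import os

# Hypothetical, deliberately tiny locale-codeset table; Tcl's real
# heuristics (unix/tclUnixInit) cover many more locales.
_LOCALE_CODESETS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso8859-1": "iso-8859-1",
    "iso88591": "iso-8859-1",
    "eucjp": "euc-jp",
}


def unix_locale_encoding(environ=None):
    """Pick an encoding from LC_ALL, LC_CTYPE or LANG (first non-empty wins)."""
    if environ is None:
        environ = os.environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        locale = environ.get(name)
        if locale:
            break
    else:
        return "iso-8859-1"  # no locale set at all
    # "en_US.UTF-8@euro" -> codeset "utf-8"
    codeset = locale.partition(".")[2].partition("@")[0].lower()
    return _LOCALE_CODESETS.get(codeset, "iso-8859-1")


def system_encoding(platform, acp=None, environ=None):
    """Return the encoding for byte-oriented system APIs, or None for wide APIs."""
    if platform == "winnt":
        return None  # NT/2000: call the wide (unicode) API instead
    if platform == "win9x":
        return "cp%d" % acp  # e.g. acp=1252 -> "cp1252"
    if platform == "mac":
        return "macroman"  # script-based lookup omitted in this sketch
    return unix_locale_encoding(environ)
```

Note that a plain "C" or "POSIX" locale has no codeset part, so this sketch falls back to "iso-8859-1" in that case.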

I propose adding a similar mechanism to Python, along these lines:

    sys.getdefaultencoding() returns the encoding chosen as above
    for windows and macintosh, and "iso-8859-1" for other platforms.

    sys.setencoding(codec) changes the system encoding.  it's
    used from site.py to set things up properly on unix and other
    non-unicode platforms.
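A minimal sketch of how the proposed pair might behave, assuming the setting is plain module-level state. It does not touch the real sys module, and `to_system_bytes` is a hypothetical helper showing where the encoding would be used (e.g. converting a unicode filename before calling a byte-oriented API). setencoding() checks the codec name with codecs.lookup() so a typo in site.py fails loudly instead of at the first filename conversion.

```python
import codecs

# Hypothetical module-level state standing in for the proposed
# sys-level setting; this is not the real sys module.
_system_encoding = "iso-8859-1"


def getdefaultencoding():
    """Return the current system encoding."""
    return _system_encoding


def setencoding(codec):
    """Change the system encoding, rejecting unknown codec names."""
    global _system_encoding
    codecs.lookup(codec)  # raises LookupError if no such codec exists
    _system_encoding = codec


def to_system_bytes(text):
    """Encode a unicode string (e.g. a filename) for a byte-oriented API."""
    return text.encode(_system_encoding)
```

On unix, site.py would then call something like setencoding("utf-8") once at startup, after working out the locale.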

</F>