Internal Format (Re: [Python-Dev] Internationalization Toolkit)

Fredrik Lundh fredrik@pythonware.com
Wed, 10 Nov 1999 09:08:06 +0100


Guido van Rossum <guido@CNRI.Reston.VA.US> wrote:
> http://starship.skyport.net/~lemburg/unicode-proposal.txt

Marc-Andre writes:

    The internal format for Unicode objects should either use a Python
    specific fixed cross-platform format <PythonUnicode> (e.g. 2-byte
    little endian byte order) or a compiler provided wchar_t format (if
    available). Using the wchar_t format will ease embedding of Python in
    other Unicode aware applications, but will also make internal format
    dumps platform dependent. 

having been there and done that, I strongly suggest
a third option: a 16-bit unsigned integer, in
platform-specific byte order (PY_UNICODE_T).  along all other
roads lie code bloat and speed penalties...

(besides, this is exactly how it's already done in
unicode.c and what 'sre' prefers...)
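
a minimal sketch of the difference, in C.  this is not taken from
unicode.c or sre: the typedef for PY_UNICODE_T and both helper names
(count_native, count_le_bytes) are assumptions made up for
illustration, since the text above only names the type.  with native
byte order, scanning a buffer is a plain array load per code unit;
with a fixed little-endian byte format, each code unit has to be
assembled from two bytes before it can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed definition: the message names PY_UNICODE_T but does not
   spell out its typedef. */
typedef uint16_t PY_UNICODE_T;

/* Native-order buffer: each code unit is read with a plain array load. */
static size_t count_native(const PY_UNICODE_T *s, size_t n, PY_UNICODE_T ch)
{
    size_t hits = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == ch)
            hits++;
    }
    return hits;
}

/* Fixed little-endian byte buffer (hypothetical alternative format):
   every code unit is assembled from two bytes before the compare. */
static size_t count_le_bytes(const unsigned char *b, size_t n, PY_UNICODE_T ch)
{
    size_t hits = 0;
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        PY_UNICODE_T u = (PY_UNICODE_T)(b[2 * i] | (b[2 * i + 1] << 8));
        if (u == ch)
            hits++;
    }
    return hits;
}
```

both helpers return the same count for the same text; the second one
just does extra work per code unit, and that extra work would be
repeated in every routine that touches the buffer.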

</F>