[Python-Dev] unicode/string asymmetries

Fredrik Lundh fredrik@pythonware.com
Wed, 9 Jan 2002 14:16:30 +0100


jack wrote:
> > struct.pack("32s", wu(u"VS_VERSION_INFO"))
>
> Why would you have to specify the encoding if what you want is the normal,
> standard encoding?

because there is no such thing as a "normal, standard
encoding" for a unicode character, just like there's no
"normal, standard encoding" for an integer (big endian,
little endian?), a floating point number (ieee, vax, etc),
a screen coordinate, etc.

as soon as something gets too large to store in a byte,
there's always more than one obvious way to store it ;-)
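
a quick C sketch to illustrate: the code point U+0056 ('V')
has at least three reasonable byte layouts, and a plain int
has two (the byte values below are just the standard UTF
encodings):

    #include <stdio.h>

    int main(void)
    {
        /* U+0056 ('V') in three common encodings */
        unsigned char utf16le[] = { 0x56, 0x00 };  /* UTF-16, little endian */
        unsigned char utf16be[] = { 0x00, 0x56 };  /* UTF-16, big endian */
        unsigned char utf8[]    = { 0x56 };        /* UTF-8 */

        /* the same ambiguity for an integer: peek at its first byte */
        unsigned int n = 1;
        unsigned char first = *(unsigned char *)&n;

        printf("utf-16-le: %02x %02x\n", utf16le[0], utf16le[1]);
        printf("utf-16-be: %02x %02x\n", utf16be[0], utf16be[1]);
        printf("utf-8:     %02x\n", utf8[0]);
        printf("ints here are %s endian\n", first ? "little" : "big");
        return 0;
    }

pick a different layout and you get different bytes, which is
why struct.pack has to be told which one you want.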

> Or, to rephrase the question, why do C programmers only
> have to s/char/wchar_t/

because they tend to prefer to quickly get the wrong
result? ;-)

C makes no guarantees about the size or encoding of wchar_t,
so Python's Unicode type doesn't rely on it (it can use it,
though: you can check the HAVE_USABLE_WCHAR_T macro to see if
Py_UNICODE is the same thing as wchar_t; see
PyUnicode_FromWideChar for an example).
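
for instance, a wrapper around an OS call that hands back a
wchar_t buffer stays portable by letting PyUnicode_FromWideChar
do the copying (get_volume_name below is invented for
illustration):

    #include <Python.h>
    #include <wchar.h>

    /* hypothetical OS call: fills buf with a name, returns the
       length in characters, or -1 on failure */
    extern int get_volume_name(wchar_t *buf, int size);

    static PyObject *
    mac_volume_name(PyObject *self, PyObject *args)
    {
        wchar_t buffer[256];
        int n;

        if (!PyArg_ParseTuple(args, ""))
            return NULL;
        n = get_volume_name(buffer, 256);
        if (n < 0) {
            PyErr_SetString(PyExc_OSError, "get_volume_name failed");
            return NULL;
        }
        /* copies (and converts) element by element if Py_UNICODE
           and wchar_t don't match, so this is always safe */
        return PyUnicode_FromWideChar(buffer, n);
    }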

in the Mac case, it might be easiest to configure things so
that HAVE_USABLE_WCHAR_T is always true, and assume
that Py_UNICODE is the same thing as wchar_t.  (checking
this in the module init function won't hurt, of course; see
the sketch below)

but you cannot rely on that if you're writing truly portable
code.
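
the init-time check mentioned above might look like this (just
a sketch, for a made-up module "macdemo"):

    #include <Python.h>
    #include <wchar.h>

    static PyMethodDef macdemo_methods[] = {
        {NULL, NULL, 0, NULL}           /* sentinel */
    };

    void
    initmacdemo(void)
    {
    #ifndef HAVE_USABLE_WCHAR_T
        /* built against a Python where Py_UNICODE != wchar_t */
        PyErr_SetString(PyExc_SystemError,
                        "macdemo needs Py_UNICODE == wchar_t");
    #else
        /* the macro says they match; verifying the sizes at
           runtime won't hurt */
        if (sizeof(Py_UNICODE) != sizeof(wchar_t)) {
            PyErr_SetString(PyExc_SystemError,
                            "Py_UNICODE/wchar_t size mismatch");
            return;
        }
        Py_InitModule("macdemo", macdemo_methods);
    #endif
    }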

</F>