[Python-Dev] unicode/string asymmetries

Jack Jansen jack@oratrix.nl
Wed, 09 Jan 2002 15:55:11 +0100


> jack wrote:
> > > struct.pack("32s", wu(u"VS_VERSION_INFO"))
> >
> > Why would you have to specify the encoding if what you want is the normal,
> > standard encoding?
> 
> because there is no such thing as a "normal, standard
> encoding" for a unicode character, just like there's no
> "normal, standard encoding" for an integer (big endian,
> little endian?), a floating point number (ieee, vax, etc),
> a screen coordinate, etc.

What I am calling the "normal, standard encoding" here is whatever the C
library supports. Your analogy with integers and floats is exactly the right
one: even though there are many ways to represent an integer, what you get
back from PyArg_Parse("l") is a standard C "long".
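
To make the integer half of that analogy concrete, here is a minimal sketch
using the struct module. It assumes a present-day Python 3 interpreter, and
the variable names are only illustrative. The "@" prefix asks for the
platform's own C "long", while "<" and ">" spell out a byte order explicitly:

```python
import struct

value = 1

# "@l" is the platform's native C long: native size and native byte order.
native = struct.pack("@l", value)

# "<l" and ">l" are explicit layouts: always 4 bytes, fixed byte order.
little = struct.pack("<l", value)
big = struct.pack(">l", value)

print(len(native), struct.calcsize("@l"))  # same number, platform dependent
print(little)  # b'\x01\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x01'
```

Nobody has to name a byte order to get the native form; it is simply
whatever the C compiler uses for "long" on that machine.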

Maybe the confusion is that wherever I have said "unicode" in the past I 
should have said "wchar_t". I know there are, in theory, many encodings of 
Unicode, but in practice there is only one that I'm interested in most of the 
time, and that's wchar_t, because that's what all my APIs want.

So, I would like PyArg_Parse/Py_BuildValue formats that are symmetric to "s", 
"s#" and "z" but that return wchar_t strings and that work with both 
UnicodeObjects and StringObjects.
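
A rough Python-level sketch of the behaviour being asked for, assuming a
present-day Python 3 with ctypes available. The helper name to_wchar_bytes
is hypothetical, not an existing API, and treating byte strings as ASCII is
an assumption made only for this illustration:

```python
import ctypes
import struct
import sys


def to_wchar_bytes(text, terminate=True):
    """Return text laid out as the platform's wchar_t array (hypothetical helper)."""
    # Accept byte strings as well as unicode strings.
    # Assumption: byte strings hold plain ASCII.
    if isinstance(text, bytes):
        text = text.decode("ascii")
    # wchar_t is typically 2 bytes on Windows and 4 bytes on most Unix systems.
    width = ctypes.sizeof(ctypes.c_wchar)
    order = "le" if sys.byteorder == "little" else "be"
    data = text.encode("utf-%d-%s" % (width * 8, order))
    if terminate:
        data += b"\x00" * width  # trailing NUL wide character
    return data


raw = to_wchar_bytes(u"VS_VERSION_INFO")
# Size the struct field from the encoded data rather than hard-coding it,
# because the byte count depends on sizeof(wchar_t).
packed = struct.pack("%ds" % len(raw), raw)
```

With a 2-byte wchar_t the encoded string is 32 bytes including the
terminator, which matches the "32s" field in the quoted example; with a
4-byte wchar_t it is 64 bytes.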
--
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -