PyArg_ParseTuple and Unicode

Mon Oct 22 16:13:23 EDT 2001

On Thu, 18 Oct 2001, Scottie wrote:

> I am confused here, but this is not uncommon.  I am trying to get an
> extension to handle both normal strings and unicode.  From my initial
> reading of the PyArg_ParseTuple document, I thought the following
> would work:
>
> if( PyArg_ParseTuple(arg, "...u#...", ...) ) {
>     ...Unicode ops...
> }
> else if( PyArg_ParseTuple(arg, "...s#...", ...) ) {
>     ...plain string ops...
> }
>
>
> My understanding of "u" and "u#" was that they would fail on non-
> unicode input (while "s" and "s#" pass both along).  The behavior I
> see is different: "u#" gives me a pointer to the base of a vanilla
> string, but divides the length by two.
>
> Is this problem behavior in 2.2a4, or does the document need to
> explain this better, or is there something I am just not understanding?
> Or, of course, some combination of the three.

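It's because the internal representation of Unicode strings in Python is,
strangely enough, Unicode (UCS-2 on a default build), whose unit type is
Py_UNICODE, not char, and a Py_UNICODE there is two bytes wide (it is only
wchar_t on platforms where wchar_t happens to be that size).  The length
"u#" hands back is a count of Py_UNICODE units, so when it is given the raw
bytes of a plain string, the byte count appears to be divided by
sizeof(Py_UNICODE), i.e. by two.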
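
To make the arithmetic concrete, here is a minimal standalone sketch.  It does
not call the Python C API at all; unit_t and both helper functions are
hypothetical names made up for illustration, with unit_t standing in for a
two-byte Py_UNICODE.  It only shows how a byte count turns into half as many
units when the same buffer is read as 16-bit units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE on a UCS-2 build: 2 bytes per unit. */
typedef uint16_t unit_t;

/* Convert a length in bytes to a length in 2-byte units.
   Integer division, so an odd trailing byte is silently dropped. */
static size_t units_from_bytes(size_t nbytes)
{
    assert(sizeof(unit_t) == 2);   /* assumption this sketch relies on */
    return nbytes / sizeof(unit_t);
}

/* Length that would be reported for an 8-bit string if its bytes were
   handed over unchanged and counted as 2-byte units. */
static size_t units_reported_for(const char *s)
{
    return units_from_bytes(strlen(s));
}
```

For example, units_reported_for("spam&eggs!") is 5 for a 10-byte string, which
matches the "length divided by two" you are seeing.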

-- 
Ignacio Vazquez-Abrams  <ignacio at openservices.net>

   "As far as I can tell / It doesn't matter who you are /
    If you can believe there's something worth fighting for."
       - "Parade", Garbage