PyArg_ParseTuple and Unicode
Ignacio Vazquez-Abrams
ignacio at openservices.net
Mon Oct 22 16:13:23 EDT 2001
On Thu, 18 Oct 2001, Scottie wrote:
> I am confused here, but this is not uncommon. I am trying to get an
> extension to handle both normal strings and unicode. From my initial
> reading of the PyArg_ParseTuple document, I thought the following
> would work:
>
> if( PyArg_ParseTuple(arg, "...u#...", ...) ) {
> ...Unicode ops...
> }
> else if( PyArg_ParseTuple(arg, "...s#...", ...) ) {
> ...plain string ops...
> }
>
>
> My understanding of "u" and "u#" was that they would fail on non-
> unicode input (while "s" and "s#" pass both along). The behavior I
> see is different: "u#" gives me a pointer to the base of a vanilla
> string, but divides the length by two.
>
> Is this problem behavior in 2.2a4, or does the document need to
> explain this better, or is there something I am just not understanding?
> Or, of course, some combination of the three.
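It's because "u#" does not reject a non-Unicode object: it treats the object's
read buffer as an array of Py_UNICODE. The internal representation of Unicode
strings in Python is, strangely enough, Unicode (UCS-2 in a default build),
whose unit type is Py_UNICODE rather than char, and a Py_UNICODE is twice as
large as a char there. The length you get back is counted in those units,
hence the byte length divided by two.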
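
To make the arithmetic concrete, here is a minimal sketch in plain C that needs
no Python headers. The names unicode_unit, units_in_buffer and widen are made
up for illustration, and unicode_unit is only an assumed stand-in for a 2-byte
Py_UNICODE; none of this is the interpreter's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 2-byte Py_UNICODE (assumes a UCS-2 build). */
typedef uint16_t unicode_unit;

/* Length reported when a byte buffer is reinterpreted as 2-byte units:
   the pointer is untouched, only the count changes (bytes / unit size). */
size_t units_in_buffer(size_t byte_len)
{
    return byte_len / sizeof(unicode_unit);
}

/* What a real conversion of an 8-bit string would have to do instead:
   copy each byte into its own 2-byte unit. Returns the number of units. */
size_t widen(const char *src, unicode_unit *dst, size_t dst_cap)
{
    size_t n = strlen(src);
    assert(n <= dst_cap);            /* caller must supply enough room */
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)src[i];
    return n;
}
```

With a 4-byte string such as "spam", units_in_buffer(4) gives 2, which matches
the halved length you are seeing, while widen() yields the 4 units you
presumably expected. The pointer in the first case still refers to the original
bytes, so reading it as Unicode data would give garbage.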
--
Ignacio Vazquez-Abrams <ignacio at openservices.net>
"As far as I can tell / It doesn't matter who you are /
If you can believe there's something worth fighting for."
- "Parade", Garbage