Unicode problem in ucs4

Mon Mar 23 03:41:10 EDT 2009

On Mar 23, 6:18 pm, abhi <abhigyan_agra... at in.ibm.com> wrote:

[snip]
> Hi Mark,
>      Thanks for the help. I tried PyUnicode_AsWideChar() but I am
> getting the same result i.e. only the first letter.
>
> sample code:
>
> #include<Python.h>
>
> static PyObject *unicode_helper(PyObject *self,PyObject *args){
>         PyObject *sampleObj = NULL;
>         wchar_t *sample = NULL;
>         int size = 0;
>
>       if (!PyArg_ParseTuple(args, "O", &sampleObj)){
>                 return NULL;
>       }
>
>          // use wide char function
>       size = PyUnicode_AsWideChar(databaseObj, sample,
> PyUnicode_GetSize(databaseObj));

What is databaseObj? It is not declared anywhere in this function; the
argument you parsed is sampleObj. Copy/paste the *actual* code that you
compiled and ran.

>       printf("%d chars are copied to sample\n", size);
>       wprintf(L"database value after unicode conversion is : %s\n",
> sample);
>       return Py_BuildValue("");
>
> }
>
> static PyMethodDef funcs[]={{"unicodeTest",(PyCFunction)
> unicode_helper,METH_VARARGS,"test ucs2, ucs4"},{NULL}};
>
> void initunicodeTest(void){
>         Py_InitModule3("unicodeTest",funcs,"");
>
> }
>
> This prints the following when input value is given as "test":
> 4 chars are copied to sample
> database value after unicode conversion is : t

[Presuming little-endian] The ucs4 string will look like
"t\0\0\0e\0\0\0s\0\0\0t\0\0\0" in memory. I suspect that your wprintf is
grokking only 16-bit doodads -- "t\0" is printed and then "\0\0" is
end-of-string. Try your wprintf on sample[0], ..., sample[3] in a loop
and see what you get. Use bog-standard printf to print the hex
representation of each of the 16 bytes starting at the address sample
is pointing to.
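
The two checks suggested above could be sketched roughly as follows. This
is only an illustration, not the code from the original module: the helper
names hex_bytes and inspect_wide are made up for this example, the 64-byte
limit is arbitrary, and it works on a plain wchar_t array rather than on
the buffer filled by PyUnicode_AsWideChar(). It uses printf only, since
mixing printf and wprintf on the same stream is not reliable. Separately,
in standard C the wprintf conversion for a wchar_t * argument is %ls; %s
expects a narrow char *, which may be another reason only one letter
appears.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Format the first nbytes bytes at buf as two-digit hex values separated
 * by single spaces. out needs room for 3 * nbytes + 1 characters. */
static void hex_bytes(const void *buf, size_t nbytes, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t i;

    assert(nbytes > 0 && outsize >= 3 * nbytes + 1);
    for (i = 0; i < nbytes; i++)
        sprintf(out + 3 * i, "%02x ", p[i]);
    out[3 * nbytes - 1] = '\0';   /* drop the trailing space */
}

/* Print every element of s individually, then the raw bytes behind them,
 * so the element width and the byte order are both visible.
 * Hypothetical helper; limited to 64 bytes of input for simplicity. */
static void inspect_wide(const wchar_t *s, size_t nchars)
{
    char hex[3 * 64 + 1];
    size_t i;

    assert(nchars > 0 && nchars * sizeof(wchar_t) <= 64);
    printf("sizeof(wchar_t) = %lu\n", (unsigned long)sizeof(wchar_t));
    for (i = 0; i < nchars; i++)
        printf("s[%lu] = 0x%lx\n", (unsigned long)i, (unsigned long)s[i]);
    hex_bytes(s, nchars * sizeof(wchar_t), hex, sizeof hex);
    printf("bytes: %s\n", hex);
}
```

Calling inspect_wide(sample, 4) after the conversion would show whether
all four characters really are in the buffer. If they are, the problem is
in how the buffer is printed rather than in the conversion itself: any
routine that reads the buffer as narrow chars stops at the first zero
byte, i.e. after at most one letter.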