can't get utf8 / unicode strings from embedded python

David M. Cotter me at davecotter.com
Fri Aug 23 16:49:23 EDT 2013


note: everything works great if i use ASCII, but:

in my utf8-encoded script i have this:

>	print "frøânçïé"

in my embedded C++ i have this:

PyObject*	CPython_Script::print(PyObject *args)
{
	PyObject		*resultObjP	= NULL;
	const char		*utf8_strZ	= NULL;
	
	if (PyArg_ParseTuple(args, "s", &utf8_strZ)) {
		Log(utf8_strZ, false);

		resultObjP = Py_None;
		Py_INCREF(resultObjP);
	}
	
	return resultObjP;
}

Now, i know that my Log() can print UTF-8 (it has for years, and is very well debugged)

but what it *actually* prints is this:

>	print "frøânçïé"
--> frøânçïé
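
For what it's worth, that output is what you would expect to see if correct UTF-8 bytes were re-read one byte at a time as Latin-1 (or Windows-1252) characters somewhere along the way: "ø" is the UTF-8 byte pair C3 B8, and those two bytes read as Latin-1 are "Ã" and "¸". The sketch below reproduces the effect so the pattern can be recognized. Latin1BytesToUtf8 is a hypothetical helper, not part of the code above, and it assumes plain Latin-1 rather than Windows-1252. It does not say where in the pipeline (script source, Python, Log(), or the log viewer) the misreading happens.

```cpp
#include <string>

// Hypothetical helper: re-encode a byte string as UTF-8 while (wrongly)
// treating every input byte as a Latin-1 code point. Feeding it valid
// UTF-8 yields the "double-encoded" text seen in the log.
std::string Latin1BytesToUtf8(const std::string &bytes)
{
	std::string		out;

	for (unsigned char c : bytes) {
		if (c < 0x80) {
			//	ASCII passes through unchanged
			out.push_back(static_cast<char>(c));
		} else {
			//	U+0080..U+00FF become a two-byte UTF-8 sequence
			out.push_back(static_cast<char>(0xC0 | (c >> 6)));
			out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
		}
	}

	return out;
}
```

Passing the UTF-8 bytes of "frø" (66 72 C3 B8) through this gives 66 72 C3 83 C2 B8, which displays as "frÃ¸" in a UTF-8 viewer.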

another method i use looks like this:
>	kj_commands.menu("控件", "同步滑帧", "全局无滑帧")
or
>	kj_commands.menu(u"控件", u"同步滑帧", u"全局无滑帧")

and in my C++ i have:

SuperString		ScPyObject::GetAs_String()
{
	SuperString		str;
	
	if (PyUnicode_Check(i_objP)) {
		#if 1
		//	method 1
		{
			ScPyObject		utf8Str(PyUnicode_AsUTF8String(i_objP));
			
			str = utf8Str.GetAs_String();
		}
		#elif 0
		//	method 2
		{
			UTF8Char		*uniZ = (UTF8Char *)PyUnicode_AS_UNICODE(i_objP);
		
			str.assign(&uniZ[0], &uniZ[PyUnicode_GET_DATA_SIZE(i_objP)], kCFStringEncodingUTF16);
		}
		#else
		//	method 3
		{
			UTF32Vec			charVec(32768); CF_ASSERT(sizeof(UTF32Vec::value_type) == sizeof(wchar_t));
			PyUnicodeObject		*uniObjP = (PyUnicodeObject *)(i_objP);
			Py_ssize_t			sizeL(PyUnicode_AsWideChar(uniObjP, (wchar_t *)&charVec[0], charVec.size()));
			
			charVec.resize(sizeL);
			charVec.push_back(0);
			str.Set(SuperString(&charVec[0]));
		}
		#endif
	} else {
		str.Set(uc(PyString_AsString(i_objP)));
	}
	
	Log(str.utf8Z());
	
	return str;
}


for the string, "控件", i get:
--> 控件

for the *unicode* string, u"控件", i get the same thing with each of methods 1, 2, and 3:
--> 控件
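
One way to narrow this down, sketched here as a suggestion and not a diagnosis, is to log the raw bytes in hex just before they reach Log(), so the check does not depend on how the log is displayed. HexDump is a hypothetical helper, not part of the code above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format a NUL-terminated byte string as
// space-separated uppercase hex pairs, e.g. "E6 8E A7".
std::string HexDump(const char *bytesZ)
{
	std::string				out;
	char					buf[4];
	const unsigned char		*p = reinterpret_cast<const unsigned char *>(bytesZ);

	for (; *p; ++p) {
		std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(*p));

		if (!out.empty()) {
			out.push_back(' ');
		}

		out += buf;
	}

	return out;
}
```

Correct UTF-8 for "控件" is E6 8E A7 E4 BB B6. If the dump shows exactly those six bytes, the string leaving Python is fine and the misreading is downstream (in Log() or whatever displays the log). If it shows twelve bytes such as C3 A6 C2 8E C2 A7 C3 A4 C2 BB C2 B6, the text was already double-encoded before it got to this point, assuming a Latin-1 misreading.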

okay so what am i doing wrong???


