can't get utf8 / unicode strings from embedded python

David M. Cotter me at davecotter.com
Sun Aug 25 18:32:52 EDT 2013


I got it!!  OMG!  So sorry for the confusion, but I learned a lot, and I can share the result:

The CORRECT code *was* what I had assumed.  The Python side has always been correct: there's no need to put "u" in front of strings, because the bytes are already understood to be UTF-8.

The culprit was my "run script" function, which reads in the script file.  THAT was what was "reinterpreting" the UTF-8 bytes as MacRoman (on both platforms).  Correct code below:
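For anyone hitting the same symptom, here is roughly what that reinterpretation does. A UTF-8 "é" is the two bytes C3 A9; decoded as MacRoman, 0xC3 is "√" (U+221A) and 0xA9 is "©" (U+00A9), so "café" comes out as "caf√©". The sketch below uses a hypothetical helper, `DecodeAsMacRomanPartial` (not part of the original code), that knows only those two high bytes; a real MacRoman decoder needs the full 0x80-0xFF table.

```cpp
#include <string>

// Hypothetical helper for illustration only: decode a byte string as if it
// were MacRoman. Only the two high bytes needed for this demo are mapped;
// any other byte >= 0x80 becomes U+FFFD (REPLACEMENT CHARACTER).
std::u32string DecodeAsMacRomanPartial(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(b);            // ASCII range is identical in MacRoman
        } else if (b == 0xC3) {
            out.push_back(U'\u221A');    // MacRoman 0xC3 = SQUARE ROOT
        } else if (b == 0xA9) {
            out.push_back(U'\u00A9');    // MacRoman 0xA9 = COPYRIGHT SIGN
        } else {
            out.push_back(U'\uFFFD');    // not in this partial table
        }
    }
    return out;
}
```

Feeding it the UTF-8 bytes of "café" (`"caf\xC3\xA9"`) yields five characters, "caf√©", instead of four: every non-ASCII character is split into its individual bytes, and each byte is given its MacRoman meaning.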

SuperString		ScPyObject::GetAs_String()
{
	SuperString		str;
	
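	if (PyUnicode_Check(i_objP)) {
		//	unicode object: encode it to a UTF-8 byte string (a new reference),
		//	then recurse; the recursive call takes the str branch below
		ScPyObject		utf8Str(PyUnicode_AsUTF8String(i_objP));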
		
		str = utf8Str.GetAs_String();
	} else {
		//	calling "uc" on this means "assume this is utf8"
		str.Set(uc(PyString_AsString(i_objP)));
	}
	
	return str;
}

//	C++ overload taking a SuperString: hand Python its UTF-8 bytes unchanged
PyObject*	PyString_FromString(const SuperString& str)
{
	return PyString_FromString(str.utf8Z());
}
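The fix that actually mattered, in the "run script" function, isn't shown above. Here is a minimal sketch of the idea, assuming the script is read with standard C++ streams; the helper name `ReadScriptBytes` is hypothetical and not the actual function from this post. It opens the file in binary mode and keeps the contents in a `std::string`, so no encoding conversion happens between disk and Python.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a script file as raw bytes.
// Binary mode plus std::string means no newline or charset translation,
// so UTF-8 bytes arrive exactly as they are stored on disk.
std::string ReadScriptBytes(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // Copy every byte from the stream buffer into the string.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

In the embedding code, the buffer could then be handed to something like `PyRun_SimpleString(script.c_str())`. The design point is simply that no layer in between (a MacRoman-assuming string class, a text-mode conversion) gets a chance to reinterpret the bytes.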




More information about the Python-list mailing list