can't get utf8 / unicode strings from embedded python

David M. Cotter me at davecotter.com
Sun Aug 25 18:25:07 EDT 2013


fair enough.  I can provide further proof of strangeness.
here is my latest script.  it is saved on disk as a UTF-8 encoded file, and when viewed as UTF-8, it shows the correct characters.

==================
# -*- coding: utf-8 -*- 
import time, kjams, kjams_lib

def log_success(msg, successB, text):
	if successB:
		print msg + " worked: " + text
	else:
		print msg + " failed: " + text

def do_test(orig_str):
	cmd_enum = kjams.enum_cmds()
	
	print "---------------"
	print "Original string: " + orig_str
	print "converting..."

	oldstr = orig_str
	newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
	log_success("first", oldstr == newstr, newstr)
	
	oldstr = unicode(orig_str, "UTF-8")
	newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
	newstr = unicode(newstr, "UTF-8")
	log_success("second", oldstr == newstr, newstr)
	
	oldstr = unicode(orig_str, "UTF-8")
	oldstr.encode("UTF-8")
	newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
	newstr = unicode(newstr, "UTF-8")
	log_success("third", oldstr == newstr, newstr)

	print "---------------"
	
def main():
	do_test("frøânçïé")
	do_test("控件")

#-----------------------------------------------------
if __name__ == "__main__":
	main()

==================
and the latest results:

   20: ---------------
   20: Original string: frøânçïé
   20: converting...
   20: first worked: frøânçïé
   20: second worked: frøânçïé
   20: third worked: frøânçïé
   20: ---------------
   20: ---------------
   20: Original string: 控件
   20: converting...
   20: first worked: 控件
   20: second worked: 控件
   20: third worked: 控件
   20: ---------------
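
a side note on the third test, as a minimal sketch: in Python, encode() returns a new object and leaves the original untouched, so a bare call like oldstr.encode("UTF-8") does not change what is passed along afterwards. that would make the second and third tests send the same value. the helper names encode_and_discard and encode_and_keep are made up for this illustration, and the snippet is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def encode_and_discard(text):
    """Mimic a bare `text.encode(...)` call whose result is thrown away."""
    text.encode("utf-8")      # returns a new byte string; `text` itself is unchanged
    return text

def encode_and_keep(text):
    """Keep the encoded bytes by returning (or assigning) the result."""
    return text.encode("utf-8")

original = u"控件"
discarded = encode_and_discard(original)
kept = encode_and_keep(original)

print(type(discarded) is type(original))   # still the same text type
print(type(kept) is type(original))        # the kept result is a byte string instead
print(len(kept))                           # number of UTF-8 bytes, not characters
```

to actually send encoded bytes, the result would have to be assigned, e.g. oldstr = oldstr.encode("UTF-8"), and the comparison afterwards adjusted to compare like with like.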

now, given the C++ source code, this should NOT work, since i'm doing some crazy re-coding of the bytes.

so, you see, it does not matter whether i pass "unicode" strings or regular "strings": they all translate to the same weird MacRoman.
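
one possible reading of these results, sketched under assumptions: an echo test like this only compares what comes back with what went in, so any re-coding that is undone on the way out stays invisible. the function fake_native_echo below is hypothetical and is not the real native code; it simply misreads UTF-8 bytes as MacRoman (Python's mac_roman codec) and writes them back the same way. it runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-

def fake_native_echo(utf8_bytes):
    """Hypothetical stand-in for the native command (an assumption, not the real code).

    It wrongly treats the incoming UTF-8 bytes as MacRoman, then encodes
    its internal text back to MacRoman when returning.
    """
    internal = utf8_bytes.decode("mac_roman")   # mis-decoded text held "inside"
    returned = internal.encode("mac_roman")     # the same mistake, reversed
    return internal, returned

def round_trip(text):
    """Return (bytes came back identical, internal text was actually correct)."""
    sent = text.encode("utf-8")
    internal, returned = fake_native_echo(sent)
    return returned == sent, internal == text

for sample in (u"frøânçïé", u"控件"):
    bytes_match, internal_ok = round_trip(sample)
    print(bytes_match, internal_ok)
```

in this sketch both samples come back byte-for-byte identical even though the text held in the middle is not the original, so a "worked" line from an echo test does not by itself show that the native side saw the right characters. a stricter check would have the native side report something that depends on the decoded text, such as its character count.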

for completeness, here is the C++ code that the script calls:

===================
			case kScriptCommand_Unicode_Test: {
				pyArg = iterP.NextArg_OrSyntaxError();
				
				if (pyArg.get()) {
					SuperString str = pyArg.GetAs_String();
					
					resultObjP = PyString_FromString(str);
				}
				break;
			}

===================


