Can't get UTF-8 / Unicode strings from embedded Python
David M. Cotter
me at davecotter.com
Sun Aug 25 18:25:07 EDT 2013
Fair enough. I can provide further proof of strangeness.
Here is my latest script. It is saved on disk as a UTF-8 encoded file, and when viewed as UTF-8 it shows the correct characters.
==================
# -*- coding: utf-8 -*-
import time, kjams, kjams_lib

def log_success(msg, successB, result_str):
    if successB:
        print msg + " worked: " + result_str
    else:
        print msg + " failed: " + result_str

def do_test(orig_str):
    cmd_enum = kjams.enum_cmds()
    print "---------------"
    print "Original string: " + orig_str
    print "converting..."

    # first: pass the original byte string (UTF-8 bytes, per the coding line)
    oldstr = orig_str
    newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
    log_success("first", oldstr == newstr, newstr)

    # second: pass a unicode object, decode the returned bytes as UTF-8
    oldstr = unicode(orig_str, "UTF-8")
    newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
    newstr = unicode(newstr, "UTF-8")
    log_success("second", oldstr == newstr, newstr)

    # third: meant to pass UTF-8 bytes, but encode() returns a new object
    # and its result is discarded here, so oldstr is still the unicode
    # object and this repeats the second test
    oldstr = unicode(orig_str, "UTF-8")
    oldstr.encode("UTF-8")
    newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
    newstr = unicode(newstr, "UTF-8")
    log_success("third", oldstr == newstr, newstr)
    print "---------------"

def main():
    do_test("frøânçïé")
    do_test("控件")

#-----------------------------------------------------
if __name__ == "__main__":
    main()
==================
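Because `encode()` returns a new object, the bare `oldstr.encode("UTF-8")` in the third test has no effect, so the third test passes exactly what the second one does. The sketch below shows the same text/bytes distinction in Python 3, where the split is explicit (Python 2 `unicode` roughly corresponds to Python 3 `str`, Python 2 `str` to `bytes`). `echo_command()` is a hypothetical stand-in for `kjams_lib.do_command(...)`: it assumes text arguments are converted to UTF-8 and, like `PyString_FromString`, hands back raw bytes with no encoding attached.

```python
# Python 3 sketch; echo_command() is HYPOTHETICAL, not the kJams API.

def echo_command(arg):
    """Stand-in for kjams_lib.do_command(..., Unicode_Test, arg).

    Assumes text is converted to UTF-8 and returns raw bytes, the way
    PyString_FromString builds a Python 2 byte string.
    """
    if isinstance(arg, str):
        arg = arg.encode("utf-8")
    return bytes(arg)


def check(orig):
    """Run the checks the script intends and report each one."""
    results = {}
    out = echo_command(orig)
    # bytes never compare equal to text; decode before comparing
    results["raw_equal"] = out == orig
    results["decoded_equal"] = out.decode("utf-8") == orig

    # encode() returns a NEW bytes object; the original is untouched
    s = orig
    s.encode("utf-8")                  # result thrown away
    results["discarded_is_bytes"] = isinstance(s, bytes)
    s = s.encode("utf-8")              # result kept
    results["assigned_is_bytes"] = isinstance(s, bytes)
    results["bytes_round_trip"] = echo_command(s) == s
    return results


for word in ("frøânçïé", "控件"):
    # the UTF-8 bytes the host receives, then the check results
    print(word, word.encode("utf-8"), check(word))
```

Only `decoded_equal`, `assigned_is_bytes` and `bytes_round_trip` come out true; the discarded `encode()` call leaves `s` as text.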
And the latest results:
20: ---------------
20: Original string: frøânçïé
20: converting...
20: first worked: frøânçïé
20: second worked: frøânçïé
20: third worked: frøânçïé
20: ---------------
20: ---------------
20: Original string: 控件
20: converting...
20: first worked: 控件
20: second worked: 控件
20: third worked: 控件
20: ---------------
Now, given the C++ source code, this should NOT work, since I'm doing some crazy re-coding of the bytes.
So, you see, it does not matter whether I pass "unicode" strings or regular "strings": they all translate to the same weird MacRoman.
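One way the round trip could survive the re-coding is if the conversion is symmetric. If the host decodes the incoming bytes as MacRoman and encodes back to MacRoman before calling `PyString_FromString`, every byte comes back unchanged, because MacRoman assigns a character to all 256 byte values. That host behaviour is an assumption, not something the C++ code confirms, and `host_round_trip()` below is a hypothetical stand-in. The intermediate text is mojibake, but the bytes returned to the script are identical, so the UTF-8 decode on the Python side still succeeds.

```python
# HYPOTHESIS (not confirmed by the C++ code): the host interprets the
# incoming bytes as MacRoman and converts back to MacRoman on the way out.

def host_round_trip(raw):
    """Hypothetical host behaviour: bytes -> MacRoman text -> bytes."""
    as_text = raw.decode("mac_roman")   # never fails: all 256 bytes are mapped
    return as_text, as_text.encode("mac_roman")


for word in ("frøânçïé", "控件"):
    utf8 = word.encode("utf-8")
    garbled, back = host_round_trip(utf8)
    # garbled is mojibake, yet the bytes come back unchanged
    print(word, repr(garbled), back == utf8, back.decode("utf-8") == word)

# The same holds for every possible byte value, not just these strings
every_byte = bytes(range(256))
print(host_round_trip(every_byte)[1] == every_byte)
```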
For completeness, here is the C++ code that the script calls:
===================
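case kScriptCommand_Unicode_Test: {
    // fetch the next script argument (by its name, this reports a
    // syntax error if the argument is missing)
    pyArg = iterP.NextArg_OrSyntaxError();
    if (pyArg.get()) {
        // convert the Python argument (str or unicode) to kJams' SuperString
        SuperString str = pyArg.GetAs_String();
        // build a Python 2 byte string from the C string that SuperString
        // yields (presumably via an implicit conversion to const char*);
        // PyString_FromString copies bytes up to the NUL and attaches no
        // encoding, so the script sees whatever bytes SuperString produced
        resultObjP = PyString_FromString(str);
    }
    break;
}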
===================
More information about the Python-list mailing list