can't get utf8 / unicode strings from embedded python

Benjamin Kaplan benjamin.kaplan at case.edu
Sat Aug 24 15:45:37 EDT 2013


On Sat, Aug 24, 2013 at 9:47 AM, David M. Cotter <me at davecotter.com> wrote:
>
> > What _are_ you using?
> i have scripts in a file that I invoke in my embedded Python inside a C++ program; there is no terminal involved.  The "print" statement has been redirected (via sys.stdout) to my custom print class, which does not specify an "encoding", so I tried the suggestion above to set it:
>
> static const char *s_RedirectScript =
>         "import " kEmbeddedModuleName "\n"
>         "import sys\n"
>         "\n"
>         "class CustomPrintClass:\n"
>         "       def write(self, stuff):\n"
>         "               " kEmbeddedModuleName "." kCustomPrint "(stuff)\n"
>         "class CustomErrClass:\n"
>         "       def write(self, stuff):\n"
>         "               " kEmbeddedModuleName "." kCustomErr "(stuff)\n"
>         "sys.stdout = CustomPrintClass()\n"
>         "sys.stderr = CustomErrClass()\n"
>         "sys.stdout.encoding = 'UTF-8'\n"
>         "sys.stderr.encoding = 'UTF-8'\n";
>
>
> But it didn't help.
>
> I'm still getting back a string of UTF-8 characters that, if converted to "MacRoman" and then reinterpreted as UTF-8, shows the original, correct string.  Who is specifying MacRoman, and where, and how do I tell whoever that is that I really *really* want UTF-8?
> --
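The redirection in the quoted script can be reproduced in plain Python to see what the C++ side actually receives. This is a minimal sketch (names like CapturingWriter are illustrative, not from the original); it uses modern Python 3, where strings are text and the encode-to-bytes step is explicit, whereas in the 2013-era Python 2 above, write() would already be handed a byte string:

```python
import sys

class CapturingWriter:
    """Stand-in for the CustomPrintClass idea: collect everything written."""
    def __init__(self):
        self.chunks = []

    def write(self, stuff):
        self.chunks.append(stuff)

    def flush(self):
        pass

old_stdout = sys.stdout
sys.stdout = CapturingWriter()
try:
    print("héllo")            # non-ASCII text goes through the redirect
finally:
    captured = sys.stdout
    sys.stdout = old_stdout    # always restore the real stdout

text = "".join(captured.chunks)
data = text.encode("utf-8")    # the bytes a C++ host would see
# 'é' is U+00E9, which encodes to the two bytes 0xC3 0xA9 in UTF-8
```

The bytes in `data` are valid UTF-8 regardless of what any editor later guesses; the encoding confusion happens only when those bytes are decoded with the wrong charset.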

If you're running this from a C++ program, then you aren't getting
back characters; you're getting back bytes. If you treat them as
UTF-8, they'll work properly. The only thing wrong is the text editor
you're using to open the file afterwards: since you aren't specifying
an encoding, it assumes MacRoman. You can try putting the UTF-8 BOM
(it's not really a BOM) at the front of the file; some editors use the
bytes 0xEF 0xBB 0xBF to identify a file as UTF-8.


