Making IDLE3 ignore non-BMP characters instead of throwing an exception?

eryk sun eryksun at gmail.com
Mon Oct 17 19:23:02 EDT 2016


On Mon, Oct 17, 2016 at 8:35 PM, Random832 <random832 at fastmail.com> wrote:
> On Mon, Oct 17, 2016, at 14:20, eryk sun wrote:
>> You can patch print() to transcode non-BMP characters as surrogate
>> pairs. For example:
>>
>> On Windows this should allow printing non-BMP characters such as
>> emojis (e.g. U+0001F44C).
>
> I thought there was some reason this wouldn't work with tk, or else
> tkinter would do it already?

I don't know whether it causes problems elsewhere in Tk, but it has no
problem passing along a UTF-16 string to Windows. For example, see the
following with a breakpoint set on TextOut [1]:

    >>> import tkinter
    >>> root = tkinter.Tk()
    >>> w = tkinter.Label(root, text='test: \ud83d\udc4c')
    >>> w.pack()

    Breakpoint 0 hit
    GDI32!TextOutW:
    00007fff`6d6c61d0 ff2532a10200    jmp
        qword ptr [GDI32!_imp_TextOutW (00007fff`6d6f0308)]
        ds:00007fff`6d6f0308={gdi32full!TextOutW (00007fff`6a3143c0)}

    0:000> du @r9
    000000d6`dfdeea50  "test: .."

    0:000> dw @r9 l8
    000000d6`dfdeea50  0074 0065 0073 0074 003a 0020 d83d dc4c

The lpString parameter (x64 register r9) is the label's text,
including the surrogate pair "\ud83d\udc4c" (i.e. U+0001F44C).
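
For reference, the mapping from a non-BMP code point to its UTF-16
surrogate pair can be sketched as below. This is only a minimal
illustration of the arithmetic, not necessarily the patch discussed
earlier in the thread; the helper name to_surrogates is made up for
this example.

```python
import re

def to_surrogates(text):
    """Replace each non-BMP character with its UTF-16 surrogate pair.

    Hypothetical helper, for illustration only.
    """
    def pair(match):
        # Offset into the supplementary planes (a 20-bit value).
        cp = ord(match.group()) - 0x10000
        high = 0xD800 + (cp >> 10)    # lead surrogate: top 10 bits
        low = 0xDC00 + (cp & 0x3FF)   # trail surrogate: low 10 bits
        return chr(high) + chr(low)
    return re.sub('[\U00010000-\U0010FFFF]', pair, text)

# ascii() is used so the result can be shown without encoding the
# lone surrogates to the output stream.
print(ascii(to_surrogates('test: \U0001F44C')))
# -> 'test: \ud83d\udc4c'
```

The result matches the last two words (d83d dc4c) in the dump above.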

[1]: https://msdn.microsoft.com/en-us/library/dd145133


