[issue45105] Incorrect handling of unicode character \U00010900

Steven D'Aprano report at bugs.python.org
Sun Sep 5 09:08:09 EDT 2021


Steven D'Aprano <steve+python at pearwood.info> added the comment:

I'm afraid I cannot reproduce the problem.

>>> s = '000𐤀'  # \U00010900
>>> s
'000𐤀'
>>> s[0]
'0'
>>> s[1]
'0'
>>> s[2]
'0'
>>> s[3]
'𐤀'
>>> list(s)
['0', '0', '0', '𐤀']


That is using Python 3.9 in the xfce4-terminal. Which xterm are you using?

I am very confident that it is a bug in some external software, possibly the xterm, possibly the browser or other application where you copied the PHOENICIAN LETTER ALF character from in the first place. It looks like it is related to mishandling of the Right-To-Left character:

>>> import unicodedata
>>> unicodedata.bidirectional(s[3])
'R'


Using Firefox, when I attempt to select the text s = '000...' in Max's initial message with the mouse, the selection highlighting jumps around. See the attached screenshot (selection.png). Depending on how I copy the text, sometimes I get '000 ALF' and sometimes '0 ALF 00', which hints that something is getting confused by the RTL character: possibly the browser, possibly the copy/paste clipboard, possibly the terminal. But regardless, I cannot replicate the behaviour you show where list(s) is different from indexing the characters one by one.
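One way to take the browser, clipboard and terminal out of the picture is to build the string from an escape sequence instead of pasting the character, and to print it with ascii() so nothing right-to-left ever has to be rendered. This is only a minimal sketch, assuming the string in the original report really was three zeros followed by \U00010900; the names by_index and by_iteration are made up for the illustration.

```python
# Build the string from an escape, so no RTL character is ever pasted.
s = "000\U00010900"

# Collect the characters two ways: by explicit indexing and by iteration.
by_index = [s[i] for i in range(len(s))]
by_iteration = list(s)

# ascii() escapes every non-ASCII character, so the terminal never has
# to display (or visually reorder) the right-to-left letter.
print(ascii(s))
print(len(s), by_index == by_iteration)
```

If the two lists compare equal here but the pasted version behaves differently, the pasted text most likely does not contain the characters it appears to contain.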

It is very common for applications to mishandle mixed RTL and LTR characters, and that can have all sorts of odd display and copy/paste issues.
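To see what a suspicious string actually contains, independent of how an application chooses to display it, it can help to list each character's code point, bidirectional class and name. The following is a rough sketch using only the standard unicodedata module; describe() is a hypothetical helper written for this example, not something from the original report.

```python
import unicodedata


def describe(text):
    """Return (code point, bidi class, name) for each character in text."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.bidirectional(ch),
            # Fall back to a placeholder for characters without a name.
            unicodedata.name(ch, "<unnamed>"),
        )
        for ch in text
    ]


# The characters are listed in logical (storage) order, which is what
# indexing and list() use, regardless of the visual order on screen.
for row in describe("000\U00010900"):
    print(*row)
```

Running the same helper on the text as pasted from the browser should show whether the characters arrived in the expected order or whether something reordered them along the way.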

----------
nosy: +steven.daprano
Added file: https://bugs.python.org/file50260/selection.png

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45105>
_______________________________________

