[issue45105] Incorrect handling of unicode character \U00010900
Eryk Sun
report at bugs.python.org
Sun Sep 5 08:53:10 EDT 2021
Eryk Sun <eryksun at gmail.com> added the comment:
AFAICT, there is no bug here. It's just confusing that Unicode right-to-left characters in the repr() can change how the string is displayed in the console/terminal. Use the ascii() representation to avoid the problem.
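A minimal sketch of the difference, using an illustrative string rather than the one from the report: repr() keeps the printable right-to-left character as a literal character, while ascii() escapes every non-ASCII character, so the terminal has nothing to reorder.

```python
# Illustrative string (an assumption, not the reporter's data): any text
# containing a right-to-left character behaves the same way.
s = '123\U00010900456'

raw = repr(s)       # keeps U+10900 as a literal character
escaped = ascii(s)  # replaces it with a \U escape sequence

print(raw)      # display order depends on the terminal's bidi handling
print(escaped)  # pure ASCII, so it is shown in logical order
```

The escaped form is also the safer one to paste into a bug report, since it reads the same in any viewer.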
> The same behavior does not occur when directly using the unicode point
> ```
> >>> s='000\U00010900'
The original string has the Phoenician right-to-left character at index 1, not at index 3. The "0" digit characters in the original have weak directionality, so how they are displayed can be affected by the neighboring right-to-left character. You can see the reversal with a numeric sequence that's separated by spaces. For example:
>>> s = '123\U00010900456'
>>> print(*s, sep='\n')
1
2
3
𐤀
4
5
6
>>> print(*s)
1 2 3 𐤀 4 5 6
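The stored (logical) order is not affected by any of this; only the display is. A quick way to confirm that, assuming the same sample string, is to look at the index and the code points instead of the rendered text:

```python
# Same sample string as in the session above.
s = '123\U00010900456'

# Position of the Phoenician letter in the stored string.
idx = s.index('\U00010900')

# Code points in logical order; this output is pure ASCII, so the
# terminal cannot reorder it.
codepoints = [f'U+{ord(ch):04X}' for ch in s]

print(idx)
print(' '.join(codepoints))
```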
Latin letters have left-to-right directionality. For example:
>>> s = '123\U00010900abc'
>>> print(*s)
1 2 3 𐤀 a b c
You can check the bidirectional property [1] using the unicodedata module:
>>> import unicodedata as ud
>>> ud.bidirectional('\U00010900')
'R'
>>> ud.bidirectional('0')
'EN'
>>> ud.bidirectional('a')
'L'
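To inspect a whole string at once, something like the following could be used. This is only a sketch; bidi_classes is a hypothetical helper name, not part of unicodedata.

```python
import unicodedata as ud

def bidi_classes(text):
    """Return (code point, bidirectional class) pairs in logical order.

    Hypothetical helper for illustration; it only wraps
    unicodedata.bidirectional().
    """
    return [(f'U+{ord(ch):04X}', ud.bidirectional(ch)) for ch in text]

# A digit, a space, the Phoenician letter, and a Latin letter.
classes = bidi_classes('1 \U00010900a')
for codepoint, bidi in classes:
    print(codepoint, bidi)
```

Any character reported as 'R' (or 'AL' for Arabic letters) is a candidate for the kind of display reordering described above.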
---
[1] https://en.wikipedia.org/wiki/Unicode_character_property#Bidirectional_writing
----------
nosy: +eryksun
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45105>
_______________________________________