[issue45105] Incorrect handling of unicode character \U00010900

Sun Sep 5 09:15:48 EDT 2021

Steven D'Aprano <steve+python at pearwood.info> added the comment:

Eryk Sun said:

> The original string has the Phoenician right-to-left character at index 1, not at index 3.

I think you may be mistaken. In Max's original post, he has

    s = '000X'

where the X is actually the Phoenician ALF character. At least that is how it is displayed in my browser.

(But note that in the Windows terminal, Max has '0X00' instead.)

Max's demonstration code shows a discrepancy between extracting the characters one by one using indexing and extracting them with list(). Simulating his error:

    s = '000X'  # X stands in for the Phoenician ALF character
    list(s)
    # --> reported result: ['0', '0', '0', 'X']
    [s[i] for i in range(4)]  # indexing each char one at a time
    # --> reported result: ['0', 'X', '0', '0']

I have not yet been able to replicate that reported behaviour.
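
A minimal sketch of one way to check this without relying on how a browser or terminal renders right-to-left text: build the string from an escape and compare code points rather than glyphs. It assumes ALF sits at index 3, as displayed in the browser; the names ALF, by_list and by_index are only illustrative.

```python
import unicodedata

# Written as an escape so the source itself is not reordered by the display.
ALF = "\U00010900"  # PHOENICIAN LETTER ALF
s = "000" + ALF     # assumption: ALF is the last character (index 3)

by_list = list(s)
by_index = [s[i] for i in range(len(s))]  # indexing each char one at a time

# Show code points instead of glyphs, so bidi rendering cannot mislead us.
codes_list = [f"U+{ord(c):04X}" for c in by_list]
codes_index = [f"U+{ord(c):04X}" for c in by_index]

print(codes_list)
print(codes_index)
print("list and indexing agree:", by_list == by_index)

# The bidirectional class shows why a display may reorder the characters.
print(unicodedata.name(ALF), unicodedata.bidirectional(ALF))
```

If the two lists of code points match, any visible difference in order comes from the display layer and not from the string itself.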

I agree with Eryk Sun that this is probably not a Python bug. He thinks Python is displaying the correct behaviour. I think it is probably a browser or xterm bug.

But unless someone can replicate the mismatch between list and indexing, I doubt it is something we can do anything about.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45105>
_______________________________________