unicode direction control characters

Random832 random832 at fastmail.com
Tue Jan 2 16:24:39 EST 2018


On Tue, Jan 2, 2018, at 10:36, Robin Becker wrote:
> >> u'\u200e28\u200e/\u200e09\u200e/\u200e1962'
>
> I guess I'm really wondering whether the BIDI control characters have any 
> semantic meaning. Most numbers seem to be LTR.
> 
> If I saw u'\u200f12' it seems to imply that the characters should be displayed 
> '21', but I don't know whether the number is 12 or 21.

No, U+200F (RIGHT-TO-LEFT MARK) acts as a single right-to-left character, not as an override for adjacent characters (as U+202E RIGHT-TO-LEFT OVERRIDE would). LRM (U+200E) and RLM (U+200F) basically act like invisible letters of the relevant directionality.
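A quick way to see this from Python is to ask the standard `unicodedata` module for each character's bidirectional class. This is only a sketch; the helper name `bidi_classes` and the choice of sample characters are mine. LRM reports the same class as an ordinary Latin letter (`L`), and RLM the same class as a Hebrew letter (`R`). Only U+202E reports an override class (`RLO`).

```python
import unicodedata

def bidi_classes(chars):
    """Map each character to (Unicode name, Unicode bidirectional class)."""
    return {ch: (unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in chars}

# LRM, RLM, RIGHT-TO-LEFT OVERRIDE, Latin 'A', Hebrew alef
samples = "\u200e\u200f\u202eA\u05d0"
for ch, (name, bidi) in bidi_classes(samples).items():
    print(f"U+{ord(ch):04X}  {name:<26}  {bidi}")
```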

European numerals have "weak" LTR directionality (so that they can be used, e.g., as part of a list of numbers in a sentence written in an RTL language), and they don't affect some punctuation marks the same way letters do. I believe the purpose here is to ensure that the date displays as 28/09/1962 instead of 1962/09/28 when the surrounding context is right-to-left. For the slash in particular, this was apparently a bug that was fixed in a recent version of Unicode (it is mentioned briefly in UAX #9, the Unicode Bidirectional Algorithm, formerly UTR #9; search for "solidus"), so earlier or non-Unicode implementations may not handle it correctly.
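Because these marks only affect display, one option when you only need the value is to strip them before parsing. The sketch below is not from the original post. `BIDI_CONTROLS` is a hand-written set (the explicit marks, embeddings/overrides and isolates defined by the Unicode Bidirectional Algorithm), and `strip_bidi_controls` is a hypothetical helper name. The last line shows the weak class of the digits (`EN`, European Number) and the class of the slash, which is `CS` (common separator) in the Unicode data shipped with current Python versions.

```python
import unicodedata
from datetime import datetime

# Hand-picked set of explicit bidi formatting characters (assumption: this
# list covers the cases you care about).
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def strip_bidi_controls(s):
    """Remove display-only bidi formatting characters, keeping everything else."""
    return "".join(ch for ch in s if ch not in BIDI_CONTROLS)

raw = "\u200e28\u200e/\u200e09\u200e/\u200e1962"
clean = strip_bidi_controls(raw)
print(clean)                                          # 28/09/1962
print(datetime.strptime(clean, "%d/%m/%Y").date())    # 1962-09-28

# Bidi classes of a digit and of the solidus in this Python's Unicode database
print(unicodedata.unidata_version,
      unicodedata.bidirectional("2"), unicodedata.bidirectional("/"))
```

Stripping by an explicit set, rather than by general category `Cf`, keeps other format characters (such as zero-width joiners) intact; whether that is what you want depends on the data.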


