[issue18624] Add alias for iso-8859-8-i which is the same as iso-8859-8

Marc-Andre Lemburg report at bugs.python.org
Sat Aug 3 17:33:59 CEST 2013


Marc-Andre Lemburg added the comment:

On 02.08.2013 16:37, R. David Murray wrote:
> 
> I got the impression from what I read that -e included additional control sequences. Perhaps I misunderstood, and it only meant that the data stream was expected to *use* additional control sequences, while the control codes themselves are part of the base codec?
> 
> I'm specifically thinking of this statement from the linked reference:
> 
> "Because HTML uses the Unicode bidirectionality algorithm, conforming documents encoded using ISO 8859-8 must be labeled as "ISO-8859-8-i". Explicit directional control is also possible with HTML, but cannot be expressed with ISO 8859-8, so "ISO-8859-8-e" should not be used."
> 
> The "cannot be expressed" seems to imply there are differences in the codec.

No, not really. After some more research, I found that the -i and
-e suffixes are defined in RFC 1556:

http://tools.ietf.org/html/rfc1556

At the codec level, these encodings are all the same. The suffixes
only define how bidirectional text is to be handled when the text is
displayed: implicit directionality for -i, explicit directionality
(via control sequences in the data stream) for -e.
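
As a rough illustration of "the same at the codec level", here is a
minimal sketch that resolves the RFC 1556 labels to their base codecs
and decodes the same bytes with each. The helper name
lookup_bidi_label and the mapping table are hypothetical and only
serve this example; they are not the change proposed for the stdlib
alias table.

```python
import codecs

# Hypothetical mapping for illustration: RFC 1556 labels -> base charset.
RFC1556_LABELS = {
    "iso-8859-6-e": "iso-8859-6",
    "iso-8859-6-i": "iso-8859-6",
    "iso-8859-8-e": "iso-8859-8",
    "iso-8859-8-i": "iso-8859-8",
}


def lookup_bidi_label(label):
    """Return the CodecInfo for a charset label, ignoring a -i/-e suffix."""
    name = label.lower().replace("_", "-")
    # Fall back to the label itself when it is not an RFC 1556 name.
    return codecs.lookup(RFC1556_LABELS.get(name, name))


# Four Hebrew letters encoded in ISO 8859-8.
data = b"\xf9\xec\xe5\xed"

decoded = {
    label: lookup_bidi_label(label).decode(data)[0]
    for label in ("iso-8859-8", "ISO-8859-8-i", "ISO-8859-8-e")
}

# All three labels yield the same code points; only the display-time
# bidi handling is meant to differ.
print(len(set(decoded.values())) == 1)
```

The byte-to-code-point mapping is identical for all three labels; any
reordering for display is left to whatever renders the text.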

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18624>
_______________________________________

