Codecs for ISO 8859-11 (Thai) and 8859-16 (Romanian)

"Martin v. Löwis" martin at v.loewis.de
Wed Jul 28 18:08:41 EDT 2004


Richard Brodie wrote:
>>ISO-8859-11 is actually very difficult to implement, as it is unclear
>>whether the characters \x80..\x9F are assigned in this character set
>>or not. In fact, it is unclear whether the character set contains
>>even C0.
> 
> 
> That seems like a very fine distinction to me; the Unicode mapping tables
> are the same for those points as in ISO-8859-1, so what's the difference?

For ISO-8859-1, I believe the standard actually says that those code
points are C1. For ISO-8859-11, you can find various statements on the
net, some claiming that it includes C1, and some claiming that it
doesn't. Somebody would actually have to take a look at ISO-8859-11 to
find out what is the case.
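
The ISO-8859-1 half of this is easy to check at the mapping-table level.
A minimal sketch, assuming a Python 3 interpreter and its stock
"iso8859_1" codec; it only shows what the mapping table does with
\x80..\x9F, not what either ISO standard actually says:

```python
import unicodedata

# The 32 bytes in question, 0x80..0x9F.
c1_bytes = bytes(range(0x80, 0xA0))

# Decode them with the stock Latin-1 codec.
text = c1_bytes.decode("iso8859_1")

# Each byte maps to the code point with the same numeric value ...
same_code_points = all(ord(ch) == b for ch, b in zip(text, c1_bytes))

# ... and Unicode classifies every one of them as a control character (Cc).
all_controls = all(unicodedata.category(ch) == "Cc" for ch in text)

print(same_code_points, all_controls)
```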

The issue is complicated by two facts:
- many sources indicate that ISO-8859-11 is derived by taking TIS-620
   and adding NBSP at 0xA0. Now, it seems quite clear that TIS-620 does
   *not* include C1.
- some sources indicate certain restrictions wrt. control functions,
   e.g. in

    http://www.nectec.or.th/it-standards/iso8859-11/

   which says "control functions are not used to create composite graphic
   symbols from two or more graphic characters (see 6). "
   I don't know what this means, especially as section 6 does not talk
   about control functions. Section 7 says that any control functions
   are out of scope of ISO 8859, which I believe is factually incorrect.

Regards,
Martin


