Codecs for ISO 8859-11 (Thai) and 8859-16 (Romanian)
"Martin v. Löwis"
martin at v.loewis.de
Wed Jul 28 18:08:41 EDT 2004
Richard Brodie wrote:
>>ISO-8859-11 is actually very difficult to implement, as it is unclear
>>whether the characters \x80..\x9F are assigned in this character set
>>or not. In fact, it is unclear whether the character set contains
>>even C0.
>
>
> That seems like a very fine distinction to me; the Unicode mapping tables
> are the same for those points as in ISO-8859-1, so what's the difference?
For ISO-8859-1, I believe the standard actually says that those code
points are C1. For ISO-8859-11, you can find various statements on the
net, some claiming that it includes C1, and some claiming that it
doesn't. Somebody would actually have to take a look at ISO-8859-11 to
find out what is the case.
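Short of reading the standard, one can at least probe what a given
implementation decided. The following is a minimal sketch, assuming a
Python 3 installation that ships codecs under the names iso8859_11 and
tis_620 (which may not hold for every version); the helper name
probe_range is made up for illustration. It only reports what the codec
tables do, which says nothing about what ISO-8859-11 itself specifies.

```python
def probe_range(codec, lo=0x80, hi=0x9F):
    """Map each byte in lo..hi to its decoded character, or None if undefined."""
    result = {}
    for byte in range(lo, hi + 1):
        try:
            result[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            # The codec has no mapping for this byte.
            result[byte] = None
    return result


if __name__ == "__main__":
    # Codec names are assumptions; a missing codec raises LookupError.
    for name in ("iso8859_1", "iso8859_11", "tis_620"):
        try:
            table = probe_range(name)
        except LookupError:
            print(f"{name}: codec not available in this Python")
            continue
        assigned = sum(ch is not None for ch in table.values())
        print(f"{name}: {assigned} of {len(table)} bytes in 0x80..0x9F decode")
```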
The issue is complicated by two facts:
- many sources indicate that ISO-8859-11 is derived by taking TIS-620,
and adding NBSP into 0xa0. Now, it seems quite clear that TIS-620 does
*not* include C1.
- some sources indicate certain restrictions wrt. control functions,
e.g. in
http://www.nectec.or.th/it-standards/iso8859-11/
which says "control functions are not used to create composite graphic
symbols from two or more graphic characters (see 6)."
I don't know what this means, especially as section 6 does not talk
about control functions. Section 7 says that any control functions
are out of scope of ISO 8859, which I believe is factually incorrect.
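To make the first point above concrete, here is a rough sketch of what
"TIS-620 plus NBSP at 0xa0" would look like as a charmap table, with the
disputed C1 range as an explicit switch. The names thai_table and decode
are hypothetical, the Thai layout (0xA1..0xDA and 0xDF..0xFB mapped to
U+0E01..U+0E3A and U+0E3F..U+0E5B) follows the commonly published mapping
tables, and including C0 unconditionally is an assumption made for
convenience, not a claim about either standard.

```python
import codecs

UNDEFINED = "\ufffe"  # marker the charmap decoder treats as "no mapping"


def thai_table(with_nbsp, with_c1):
    """Build a 256-entry decoding table for a TIS-620-like character set."""
    # 0x00..0x7F as in ASCII (C0 included here as an assumption).
    table = [chr(b) for b in range(0x80)]
    # 0x80..0x9F: the disputed C1 range.
    for b in range(0x80, 0xA0):
        table.append(chr(b) if with_c1 else UNDEFINED)
    # 0xA0: NBSP in the ISO-8859-11 reading, unassigned otherwise.
    table.append("\u00a0" if with_nbsp else UNDEFINED)
    # 0xA1..0xFF: Thai characters, with two unassigned gaps.
    for b in range(0xA1, 0x100):
        is_thai = (0xA1 <= b <= 0xDA) or (0xDF <= b <= 0xFB)
        table.append(chr(0x0E00 + b - 0xA0) if is_thai else UNDEFINED)
    return "".join(table)


def decode(data, table):
    """Strictly decode bytes with the given table; undefined bytes raise."""
    text, _ = codecs.charmap_decode(data, "strict", table)
    return text
```

With thai_table(False, False) the byte 0x85 raises UnicodeDecodeError,
while thai_table(True, True) passes it through as U+0085; which of the
two matches ISO-8859-11 is exactly the open question.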
Regards,
Martin