[I18n-sig] Big5 Codecs

Tom Emerson tree@basistech.com
Tue, 31 Oct 2000 14:39:41 -0500 (EST)

Frank J.S. Chen writes:
 > > a) What source did you use for the mapping table?
 > It follows the proposal issued by M.A. Lemburg.
 > BIG5 can be mapped to Unicode and back again.

But the Unicode Consortium's mapping table does not round-trip Big 5
--- so where did you get the table?
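One way to test a table for this is sketched below, under stated assumptions. It walks a list of two-byte Big 5 code points, decodes each with a named Python codec, re-encodes the result, and reports codes that cannot be decoded or do not come back as the same bytes. Python's built-in `big5` codec is used only as a stand-in for the codec being discussed, and `check_round_trip` is a hypothetical helper. It checks only the Big 5 -> Unicode -> Big 5 direction. When two Big 5 code points decode to the same character, only one of them can be re-encoded, so the other shows up as a mismatch.

```python
def check_round_trip(codec, codes):
    """Decode each two-byte code with `codec`, re-encode it, and compare.

    Hypothetical helper. Assumes every entry in `codes` is a two-byte
    Big 5 code point (lead byte 0x81-0xFE).
    Returns (unmapped, mismatched): codes the codec cannot decode, and
    codes whose decoded text does not encode back to the same bytes.
    """
    unmapped = []
    mismatched = []
    for code in codes:
        raw = code.to_bytes(2, "big")
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            unmapped.append(code)
            continue
        try:
            back = text.encode(codec)
        except UnicodeEncodeError:
            # Decoded, but the codec cannot encode the result again.
            mismatched.append(code)
            continue
        if back != raw:
            # e.g. two Big 5 codes that decode to the same character
            mismatched.append(code)
    return unmapped, mismatched


if __name__ == "__main__":
    # Every code with a lead byte in 0xA1-0xF9 and a trail byte
    # in 0x40-0x7E or 0xA1-0xFE.
    trails = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    codes = [(lead << 8) | t for lead in range(0xA1, 0xFA) for t in trails]
    unmapped, mismatched = check_round_trip("big5", codes)
    print(len(unmapped), "unmapped;", len(mismatched), "do not round-trip")
```

Running the same loop over a few megabytes of real Big 5 text, instead of the full code space, is a complementary check.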

 > There are Level 1 and Level 2 in BIG5, so I define them separately.
 > This table is complete, but I have only run a small test; it is
 > really not well tested yet.

I have a few megabytes of Big Five encoded text --- I'll test it
out. ;-)
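The Level 1 / Level 2 split mentioned above can be made concrete with a small classifier, sketched here as an assumption. It uses the commonly cited ranges: Level 1 (frequently used hanzi) at 0xA440-0xC67E and Level 2 (less frequently used hanzi) at 0xC940-0xF9D5. It also requires a valid Big 5 trail byte (0x40-0x7E or 0xA1-0xFE). `big5_level` and `is_valid_trail` are hypothetical names and are not taken from the proposed codec.

```python
# Commonly cited Big 5 hanzi ranges (assumed, not from the codec itself).
LEVEL_1 = (0xA440, 0xC67E)  # frequently used characters
LEVEL_2 = (0xC940, 0xF9D5)  # less frequently used characters


def is_valid_trail(byte):
    """Big 5 trail bytes fall in 0x40-0x7E or 0xA1-0xFE."""
    return 0x40 <= byte <= 0x7E or 0xA1 <= byte <= 0xFE


def big5_level(code):
    """Return 1 or 2 for a Level 1/Level 2 hanzi code, else None."""
    if not is_valid_trail(code & 0xFF):
        return None
    for level, (low, high) in ((1, LEVEL_1), (2, LEVEL_2)):
        if low <= code <= high:
            return level
    return None


print(big5_level(0xA4A4))  # 1
print(big5_level(0xC940))  # 2
print(big5_level(0xA140))  # None (below the Level 1 range)
```

Keeping the two levels as separate ranges mirrors defining them apart in the table.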

 > > b) How do you handle EUDC code-points?
 > What is an EUDC code point? I cannot find this field name in
 > the BMP layout.

EUDC is the End-User Defined Character region, the 3rd level of Big
5. Several groups, including HKUST, the Hong Kong government, and the
Taiwan military define characters in the 3rd region. Other Big 5
extensions, such as ETen, also use this block.

EUDC is divided into three segments: 0xFA40 -- 0xFEFE, 0x8E40 --
0xA0FE, and 0x8140 -- 0x8DFE.
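A codec would presumably need to recognize these segments before consulting its Level 1/Level 2 tables. The sketch below tests whether a two-byte code falls in one of the three EUDC segments listed above. It also requires a valid Big 5 trail byte (0x40-0x7E or 0xA1-0xFE), a rule assumed from the usual description of Big 5 rather than stated in this message. `is_eudc` is a hypothetical name. How such code points should then map to Unicode is left open here.

```python
# The three EUDC segments listed above, as (low, high) code ranges.
EUDC_SEGMENTS = (
    (0xFA40, 0xFEFE),
    (0x8E40, 0xA0FE),
    (0x8140, 0x8DFE),
)


def is_eudc(code):
    """True if a two-byte code lies in an EUDC segment with a valid trail byte."""
    trail = code & 0xFF
    # Big 5 trail bytes: 0x40-0x7E or 0xA1-0xFE (assumed).
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        return False
    return any(low <= code <= high for low, high in EUDC_SEGMENTS)


print(is_eudc(0xFA40))  # True
print(is_eudc(0x8DFE))  # True
print(is_eudc(0xA440))  # False (a Level 1 hanzi)
```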


Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"