[I18n-sig] Big5 Codecs

Tom Emerson tree@basistech.com
Tue, 31 Oct 2000 23:33:59 -0500 (EST)


Frank J.S. Chen writes:
 > What do you mean "round-trip"? If a big5 code point is undefined, it 
 > still has a corresponding Unicode code point, but nothing in BIG5 encoding
 > string. This Python table is post-handled by myself to fit with the
 > proposal.

We're talking at cross purposes here. There are code points in Big 5
that, in the Unicode mapping table, map to U+FFFD. For example, what
does your table map Big 5 0xA2CE to in Unicode? The problem is that
0xA4CA and 0xA2CE can both map to U+5345. But if you see U+5345, which
of these Big 5 code points do you map it back to?

Round tripping means that I can convert 0xA2CE to Unicode and back to
Big 5 and get 0xA2CE out. Similarly, I can convert Big 5 0xA4CA to
Unicode and back to Big 5 and get 0xA4CA. With the Unicode
Consortium's mapping tables, you do not get that behavior. So how do
you handle it? And what is the source for your Big5 <-> Unicode
mapping values? CMEX? Microsoft? Where?
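
To make the round-trip problem concrete, here is a minimal Python
sketch. It is not taken from any real codec or mapping file: the
two-entry table and the names BIG5_TO_UNICODE, build_encode_table and
round_trip_failures are hypothetical, and the rule "keep the lowest
Big 5 code point" is only one arbitrary way to break the tie. It
assumes a table that maps both 0xA2CE and 0xA4CA to U+5345.

```python
# Toy decode table holding only the two Big 5 code points discussed above.
# Assumption: both are mapped to U+5345 (a table that keeps the duplicate).
BIG5_TO_UNICODE = {
    0xA2CE: 0x5345,
    0xA4CA: 0x5345,
}


def build_encode_table(decode_table):
    """Invert a decode table, keeping the lowest Big 5 value per Unicode value.

    The tie-break is arbitrary; any inversion has to discard one of the
    duplicates, which is exactly the round-trip problem.
    """
    encode_table = {}
    for big5, ucs in sorted(decode_table.items()):
        # setdefault leaves an existing entry alone, so the first
        # (lowest) Big 5 code point seen for each Unicode value wins.
        encode_table.setdefault(ucs, big5)
    return encode_table


def round_trip_failures(decode_table, encode_table):
    """Return the Big 5 code points that do not survive Big 5 -> Unicode -> Big 5."""
    return sorted(
        big5
        for big5, ucs in decode_table.items()
        if encode_table.get(ucs) != big5
    )


UNICODE_TO_BIG5 = build_encode_table(BIG5_TO_UNICODE)
failures = round_trip_failures(BIG5_TO_UNICODE, UNICODE_TO_BIG5)
print([hex(b) for b in failures])
```

Whichever way the tie is broken, one of the two code points fails to
round-trip. Running the same check over a full mapping table would
list every Big 5 code point that cannot be recovered.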

 > > EUDC are the End-User Defined Character region, the 3rd level of Big
 [...]
 > 
 > That's a problem!

Indeed it is, a serious one.

    -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Zenkaku Language Hacker                            http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"