[I18n-sig] Big5 Codecs
Tom Emerson
tree@basistech.com
Tue, 31 Oct 2000 23:33:59 -0500 (EST)
Frank J.S. Chen writes:
> What do you mean "round-trip"? If a big5 code point is undefined, it
> still has a corresponding Unicode code point, but nothing in BIG5 encoding
> string. This Python table is post-handled by myself to fit with the
> proposal.
We're talking at cross purposes here. There are code points in Big 5
that, in the Unicode mapping table, map to U+FFFD. For example, what
does your table map Big 5 0xA2CE to in Unicode? The problem is that
0xA4CA and 0xA2CE can both map to U+5345. But if you see U+5345, which
of these Big 5 code points do you map it to?
Round-tripping means that I can convert 0xA2CE to Unicode and back to
Big 5 and get 0xA2CE out. Similarly, I can convert Big 5 0xA4CA to
Unicode and back to Big 5 and get 0xA4CA. With the Unicode
Consortium's mapping tables, you do not get that behavior. So how do
you handle it? And what is the source for your Big5 <-> Unicode
mapping values? CMEX? Microsoft? Where?
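
To make the round-trip problem concrete, here is a minimal sketch in
Python. It is not a real codec: the forward table holds only the two
code points discussed above, and the names BIG5_TO_UNICODE,
build_reverse and round_trip_failures are made up for illustration.
The reverse table has to pick one Big 5 code point per Unicode code
point, and the "lowest Big 5 value wins" rule used here is an
arbitrary assumption, not anything taken from a standard.

```python
# Toy forward table (Big 5 -> Unicode) containing only the two code
# points from the discussion. Illustration only, not a full mapping.
BIG5_TO_UNICODE = {
    0xA2CE: 0x5345,
    0xA4CA: 0x5345,
}


def build_reverse(forward):
    """Invert a Big 5 -> Unicode table.

    When several Big 5 code points share one Unicode code point, the
    lowest Big 5 value is kept (an arbitrary policy for this sketch).
    """
    reverse = {}
    for big5, uni in sorted(forward.items()):
        reverse.setdefault(uni, big5)
    return reverse


def round_trip_failures(forward):
    """Return the Big 5 code points that do not survive
    Big 5 -> Unicode -> Big 5 with the tables above."""
    reverse = build_reverse(forward)
    return [big5 for big5, uni in sorted(forward.items())
            if reverse[uni] != big5]


if __name__ == "__main__":
    for big5 in round_trip_failures(BIG5_TO_UNICODE):
        print("0x%04X does not round-trip" % big5)
```

Whichever policy the reverse table uses, one of the two Big 5 code
points is lost on the way back, unless the forward table sends one of
them somewhere else (a private-use or compatibility code point, for
instance).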
> > EUDC are the End-User Defined Character region, the 3rd level of Big
[...]
>
> That's a problem!
Indeed it is, a serious one.
-tree
--
Tom Emerson Basis Technology Corp.
Zenkaku Language Hacker http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"