[I18n-sig] JapaneseCodecs 1.4.8 released

Tamito KAJIYAMA kajiyama@grad.sccs.chukyo-u.ac.jp
Fri, 6 Sep 2002 11:05:17 +0900


martin@v.loewis.de (Martin v. Loewis) writes:
| 
| > One addition: the mapping used in Java is also one-to-one so
| > that it may be another candidate.
| 
| That is not true (according to the ICU data). Java maps U+00A5 to
| 0x5c, which it maps back to U+005C.

A test program showed that Java's mapping works as follows:

  0x815f -> U+ff3c -> 0x815f
  0x5c   -> U+005c -> 0x5c
            U+00a5 -> 0x5c

So Java's mapping is not one-to-one after all.  But both 0x815f
and 0x5c round-trip, which is the property I want.  The mapping
of U+00a5 to 0x5c seems to be a fallback, i.e. a one-way entry
used only when encoding.
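Here is a minimal sketch that makes the round-trip vs. fallback distinction concrete. It models only the three observed mappings as two lookup tables. The class and method names are hypothetical, and this is not Sun's implementation.

A byte code round-trips if encode(decode(code)) gives the code back. An encode entry is a fallback if its target decodes to a different character.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the mappings observed above, as explicit tables.
// An illustration only, not how Sun's Shift_JIS converter is written.
public class MappingModel {
    // Decoding table: Shift_JIS code -> Unicode code point (as observed).
    public static final Map<Integer, Integer> DECODE = new LinkedHashMap<>();
    // Encoding table: Unicode code point -> Shift_JIS code (as observed).
    public static final Map<Integer, Integer> ENCODE = new LinkedHashMap<>();

    static {
        DECODE.put(0x815f, 0xff3c);
        DECODE.put(0x5c, 0x5c);
        ENCODE.put(0xff3c, 0x815f);
        ENCODE.put(0x5c, 0x5c);
        ENCODE.put(0xa5, 0x5c); // U+00A5 also encodes to 0x5c
    }

    /** True if encode(decode(code)) == code, i.e. the byte code round-trips. */
    public static boolean roundTrips(int code) {
        Integer u = DECODE.get(code);
        return u != null && Integer.valueOf(code).equals(ENCODE.get(u));
    }

    /** True if the code point encodes to a code that decodes to something else. */
    public static boolean isFallback(int codePoint) {
        Integer code = ENCODE.get(codePoint);
        return code != null && !Integer.valueOf(codePoint).equals(DECODE.get(code));
    }

    public static void main(String[] args) {
        for (int code : DECODE.keySet()) {
            System.out.printf("0x%x round-trips: %b%n", code, roundTrips(code));
        }
        for (int cp : ENCODE.keySet()) {
            System.out.printf("U+%04x is a fallback: %b%n", cp, isFallback(cp));
        }
    }
}
```

In this model both 0x815f and 0x5c round-trip. Only U+00a5 is reported as a fallback, because 0x5c decodes back to U+005c.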

The test program and its output are shown below.  I used Sun's
J2SE 1.3 on Linux.

$ cat UnicodeTest1.java
class UnicodeTest1 {
    public static void main(String args[]) {
        try {
            byte[] b = { -127, 95, 92 }; /* 0x815f, 0x5c */
            String s = new String(b, "Shift_JIS") + "\u00a5";
            System.out.print("Unicode:  "); dump(s.getBytes("UnicodeBig"));
            System.out.print("Shift_JIS:"); dump(s.getBytes("Shift_JIS"));
        } catch (java.io.UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
    public static void dump(byte[] b) {
        for (int i = 0; i < b.length ; i++) {
            String h = "0" + Integer.toHexString(b[i]);
            System.out.print(" " + h.substring(h.length()-2, h.length()));
        }
        System.out.println();
    }
}
$ javac UnicodeTest1.java
$ java UnicodeTest1
Unicode:   fe ff ff 3c 00 5c 00 a5
Shift_JIS: 81 5f 5c 5c
$ 
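A variant of the test could check round-trips directly instead of relying on reading hex dumps. The sketch below assumes Java 6 or later, for the Charset overloads of the String constructor and getBytes. The class and method names are made up.

The results depend on the JDK's Shift_JIS tables, so no expected output is given here.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper, not part of any library: checks round-trips
// through a charset instead of reading hex dumps by eye.
public class RoundTripCheck {
    // Byte side: decode with the charset, re-encode, compare with the input.
    // Malformed or unmappable input is replaced by the String/getBytes
    // defaults, so it shows up as a mismatch rather than an exception.
    public static boolean bytesRoundTrip(byte[] b, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] again = new String(b, cs).getBytes(cs);
        return Arrays.equals(b, again);
    }

    // Character side: encode, decode, compare with the original string.
    public static boolean charsRoundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return s.equals(new String(s.getBytes(cs), cs));
    }

    // Two-digit lowercase hex per byte, like dump() in UnicodeTest1.
    public static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02x", x & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[][] codes = { { (byte) 0x81, (byte) 0x5f }, { (byte) 0x5c } };
        for (byte[] b : codes) {
            System.out.println("0x" + toHex(b) + " round-trips: "
                    + bytesRoundTrip(b, "Shift_JIS"));
        }
        // False if U+00A5 is encoded only via a fallback (or not at all).
        System.out.println("U+00a5 round-trips: "
                + charsRoundTrip("\u00a5", "Shift_JIS"));
    }
}
```

Checking the byte side and the character side separately matches the distinction above. A codec can round-trip every byte code while still accepting extra one-way entries on the Unicode side.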

Regards,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>