[I18n-sig] JapaneseCodecs 1.4.8 released

Tom Emerson tree@basistech.com
Thu, 5 Sep 2002 20:30:46 -0400


Martin v. Loewis writes:
> I can agree on the mapping of 0x815f; it maps to U+FF3C on glibc. I'm
> confused about 0x5c; glibc maps it to U+00A5 (YEN SIGN).

This is a complex topic:

In JIS-Roman and pure ShiftJIS, 0x5C encodes the Yen sign, so
transcoding from pure ShiftJIS to Unicode means that 0x5C maps to
U+00A5.

On Windows, 0x5C leads a double life as both the pathname separator
*and* the Yen sign in Microsoft's version of ShiftJIS, CP932. This
means that the price of Murakami Haruki's 'Noruwei no Mori' (part 1)
on Amazon.co.jp right now is \467 (i.e., 0x5C 0x34 0x36 0x37).

It also means that 'C:\foo\bar' displays with Yen signs instead of
backslashes.

Hence the mapping from CP932 to Unicode is ambiguous: do you map 0x5C
to U+005C or U+00A5? It depends on context: the transcoder doesn't know.

You also need to know whether the file came from a "pure" ShiftJIS
system, such as earlier versions of Mac OS, or a CP932 system, since
the interpretation of 0x5C may or may not be ambiguous.

The "usual" recommendation is to map 0x5C to U+00A5 when dealing with
pure ShiftJIS and to U+005C when dealing with CP932.

There is a similar problem with 0x7E: it is the overline (U+203E) in
JIS-Roman and pure ShiftJIS, but the tilde (U+007E) in CP932.

The same problem also occurs in the Microsoft Korean code page, where
0x5C is either a path separator (mapping to U+005C) or the Won sign
(mapping to U+20A9).
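
To make the recommendation concrete, here is a minimal Python sketch of
a decoder that takes the source system as an explicit argument, since
the bytes alone cannot tell you. The names SOURCES and
decode_with_policy and the source labels are hypothetical, made up for
this illustration. It also assumes that Python's cp932 and cp949 codecs
decode the single bytes 0x5C and 0x7E as plain ASCII, and that using
cp932 for the bulk of "pure" ShiftJIS data is acceptable, which may not
hold for every vendor extension.

```python
# Hypothetical table: source label -> (codec for the bulk decode,
# overrides applied to the decoded text, keyed by code point).
SOURCES = {
    "cp932": ("cp932", {}),                               # 0x5C -> U+005C
    "shiftjis": ("cp932", {0x5C: 0x00A5, 0x7E: 0x203E}),  # JIS-Roman reading
    "cp949": ("cp949", {}),                               # 0x5C -> U+005C
    "cp949-won": ("cp949", {0x5C: 0x20A9}),               # 0x5C -> WON SIGN
}


def decode_with_policy(data, source):
    """Decode bytes, resolving 0x5C/0x7E according to the declared source."""
    codec, overrides = SOURCES[source]
    # Remapping after the decode is safe for double-byte characters: a
    # trail byte of 0x5C (as in 0x83 0x5C) is consumed by the decoder and
    # never surfaces as U+005C in the resulting string.
    return data.decode(codec).translate(overrides)


price = bytes([0x5C, 0x34, 0x36, 0x37])
print(decode_with_policy(price, "shiftjis"))   # Yen sign followed by 467
print(decode_with_policy(price, "cp932"))      # backslash followed by 467
```

The policy is chosen by the caller, not guessed from the data: a price
and a pathname are byte-for-byte indistinguishable at this level.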

> Also, where did you get the mapping from the Consortium? I can't find
> a current table, but
>
> http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT

You answer your own question, sort of. The Consortium no longer
maintains the East Asian mapping tables (with the exception of JIS X
0213, GB 18030, and HKSCS, where mappings are supplied by the
Japanese, Chinese, and Hong Kong SAR governments, respectively). This
has been a point of contention between me and the UTC, but they don't
want to and I don't have time.

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"