[XML-SIG] Character encodings and expat

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Mon, 30 Oct 2000 23:38:47 +0100


> But there is a private use area in the BMP as well... and if you
> plan to write round-trip safe codecs for corporate character sets,
> then you'll have to use these to make the transfer safe.

Well, you can't make round-trip encoding safe for them; that is the
very nature of the private use area. If you convert set A to Unicode
using A's private map, then convert to set B and back from there, you
will likely lose information.
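To make the failure concrete, here is a small sketch in Python. The two "vendor" tables, their byte values, and the character descriptions are all hypothetical, made up for illustration; they are not real corporate mappings. Each vendor independently puts its extra characters into the BMP private use area, so the same code point can mean different things, or have no slot at all, in the other set.

```python
# Hypothetical private maps (NOT real vendor tables): each vendor
# independently places its extra characters in the BMP private use
# area (U+E000..U+F8FF) and records what each one means to it.
VENDOR_A = {
    0xA1: ("\uE000", "A: company logo"),
    0xA2: ("\uE001", "A: special ideograph"),
}
VENDOR_B = {
    0xC1: ("\uE000", "B: circled number fifty"),  # same code point, different character
}


def to_unicode(data, vendor):
    """Decode bytes with a vendor's private map; other bytes are treated as ASCII."""
    return "".join(vendor[b][0] if b in vendor else chr(b) for b in data)


def from_unicode(text, vendor, fallback=b"?"):
    """Encode text with the reverse of a vendor's private map."""
    reverse = {cp: b for b, (cp, _) in vendor.items()}
    out = bytearray()
    for ch in text:
        if ch in reverse:
            out.append(reverse[ch])
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            out += fallback  # no slot in this vendor's table: the character is lost
    return bytes(out)


def describe(data, vendor):
    """What each byte means to the given vendor."""
    return [vendor[b][1] if b in vendor else chr(b) for b in data]


original = bytes([0xA1, 0xA2])                              # A's logo and ideograph
text = to_unicode(original, VENDOR_A)                       # '\ue000\ue001'
in_b = from_unicode(text, VENDOR_B)                         # b'\xc1?'
back = from_unicode(to_unicode(in_b, VENDOR_B), VENDOR_A)   # b'\xa1?'

print(describe(original, VENDOR_A))  # ['A: company logo', 'A: special ideograph']
print(describe(in_b, VENDOR_B))      # ['B: circled number fifty', '?']
print(back == original)              # False
```

Both kinds of loss show up: U+E000 silently turns A's logo into an unrelated B character, and U+E001 has no slot in B at all, so it comes back as "?".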

If there are "official" mappings between a corporation's character
set and Unicode, then I'd expect all converters that support that
character set to treat the private use area in the same way.

If the corporation publishes no official mappings, then you are
better off using the platform converters on the corporation's own
operating system. Those will definitely get the private use area
right; the ones provided by Python in a cross-platform, cross-vendor
way might not.

Regards,
Martin