[XML-SIG] Character encodings and expat
M.-A. Lemburg
mal@lemburg.com
Mon, 30 Oct 2000 23:57:10 +0100
"Martin v. Loewis" wrote:
>
> > But there is a private use area in the BMP as well... and if you
> > plan to write round-trip safe codecs for corporate character sets,
> > then you'll have to use these to make the transfer safe.
>
> Well, you can't make round-trip encoding safe for them - that is the
> very nature of the private use area. If you convert set A to Unicode,
> using the private map, then convert to set B, and back from there, you
> likely lose.
True. By "round trip" I meant encoding A -> Unicode -> encoding A.
This is often needed in order to do processing on the data and
should be a 1-1 mapping if possible.
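
A minimal sketch of such a round-trip check, assuming a current
Python 3 (bytes.decode() / str.encode()); the helper name
is_round_trip_safe is hypothetical and only for illustration:

```python
def is_round_trip_safe(data: bytes, encoding: str) -> bool:
    """Return True if encoding A -> Unicode -> encoding A reproduces data."""
    try:
        # Decode to Unicode, then encode back with the same codec.
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Bytes the codec cannot map at all are not round-trip safe.
        return False


if __name__ == "__main__":
    print(is_round_trip_safe(b"caf\xe9", "latin-1"))
    print(is_round_trip_safe(b"\xff", "ascii"))
```

A codec for a corporate character set that maps vendor-specific
characters into the private use area would pass this check as long as
its decoding and encoding tables are inverses of each other.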
> If there are "official" mappings between some corporation's character
> set and Unicode, then I'd expect all converters that support the
> corporate character set also to treat the private use area in the same
> way.
>
> If there are no official mappings published by the corporation, then
> you are better off using the platform converters on the corporation's
> operating system. Those will definitely get the private use area
> right; the ones provided by Python in a cross-platform cross-vendor
> way might not.
Right.
Perhaps the codecs should flag these conversions by applying the usual
error handling to them (raise an exception, ignore, replace, etc.)?
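
As a rough sketch of what that could look like, the hypothetical
helper below (not an existing codec API) scans a string for private
use characters (Unicode general category "Co") and applies
strict / ignore / replace semantics modeled on the standard codec
error modes. It assumes a current Python 3:

```python
import unicodedata


def filter_private_use(text: str, errors: str = "strict") -> str:
    """Apply codec-style error handling to private use characters.

    Hypothetical helper for illustration; 'errors' mimics the usual
    codec error modes: 'strict', 'ignore' or 'replace'.
    """
    out = []
    for pos, ch in enumerate(text):
        # General category "Co" covers the BMP private use area
        # (U+E000..U+F8FF) and the two supplementary private use planes.
        if unicodedata.category(ch) != "Co":
            out.append(ch)
        elif errors == "strict":
            raise ValueError(
                "private use character U+%04X at position %d" % (ord(ch), pos)
            )
        elif errors == "ignore":
            continue
        elif errors == "replace":
            out.append("\ufffd")  # U+FFFD REPLACEMENT CHARACTER
        else:
            raise LookupError("unknown error handling mode: %r" % errors)
    return "".join(out)
```

A caller that trusts its private mapping would skip this step; one
that exchanges data across vendors could run it with "strict" to find
out early that the text is not portable.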
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/