[Python-Dev] RE: Ill-defined encoding for CP875?

M.-A. Lemburg mal@lemburg.com
Mon, 14 May 2001 11:02:19 +0200


Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > The "right" thing to do here, is to simply remove cp875
> > from the test for round-tripping.
> 
> I'm relieved you think so, since that's what I already did <wink>.
> 
> > It is not the only encoding which fails this test, but it's not
> > our fault: the codecs were all generated from the original codec
> > maps at the Unicode.org site.
> >
> > If their mappings are broken, we can't do much about it... other
> > than to ignore the error or remove the codec altogether.
> 
> On general principle I don't like either of those -- "in the face of
> ambiguity, refuse the temptation to guess".  It's at least surprising to see
> 
> >>> unicode("?", "cp875").encode("cp875")
> '\xfd'
> >>>
> 
> now, yes?  Would it be better if an ambiguous encoding raised an exception in
> "strict" mode?  That is, a third choice is to alert users when they're
> relying on a broken part of a mapping.

The problem is: which part would raise the exception -- the
encoder or the decoder?

Here are some more options:

* sort the items before creating the encoding table from the
  decoding one (makes the mapping stable)

* map keys which have multiple mappings in the encoding table
  to None -- this causes their usage to raise an exception
  (undefined mapping)
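
As a rough sketch of these two options (this is not the actual codec
generator code; the function names and the toy table below are made up
for illustration), inverting a decoding map could look like this:

```python
def invert_sorted(decoding_map):
    """Option 1: walk the items in sorted order so the result is stable."""
    encoding_map = {}
    for byte, codepoint in sorted(decoding_map.items()):
        # On a collision the highest byte value wins, on every run.
        encoding_map[codepoint] = byte
    return encoding_map


def invert_strict(decoding_map):
    """Option 2: map ambiguous code points to None (undefined mapping)."""
    encoding_map = {}
    for byte, codepoint in decoding_map.items():
        if codepoint in encoding_map:
            # Seen before: more than one byte decodes to this code point.
            encoding_map[codepoint] = None
        else:
            encoding_map[codepoint] = byte
    return encoding_map


# Hypothetical toy table: bytes 0x41 and 0xFD both decode to U+001A.
toy_decoding_map = {0x40: 0x0020, 0x41: 0x001A, 0xFD: 0x001A}

stable = invert_sorted(toy_decoding_map)
strict = invert_strict(toy_decoding_map)
```

With the first variant the encoder still picks one of the candidates,
but always the same one; with the second, the ambiguous code point has
no encoding at all, so only the encoder would complain and the decoder
stays untouched.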

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/