[I18n-sig] error handling in charmap-based codecs

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 22 Dec 2000 15:18:49 +0100


> Date: Thu, 21 Dec 2000 19:46:56 +0100
> A mapping to None means: this mapping is undefined, so raise an
> exception. If this were the default, then all cpXXX.py would have
> to include all 1-1 mappings explicitly, e.g. 0x0020: 0x0020.
> This would cause the tables to enlarge substantially.

I'm not sure what you mean by "tables". Please have a look at patch
#103002; the actual increase in source code bytes for the Python core
is quite minimal. Some overhead occurs when a codec is imported - it
then adds at most 512 additional keys to the dictionaries that are
actually used.

> To explicitly declare a mapping undefined, you'd have to add
> mappings to None. This is what causes the bug you reported on SF.
> A proper fix would involve adding the relevant mappings to all
> decode maps in the standard codecs.

Why would that be smaller than adding the identity mappings? I hope
you'd fill in the Nones using a loop, not by placing them in source
code. Then the source code change is the same size in both solutions.
At run-time, the identity mapping is smaller than the mapping to None,
since with None you'd need more than 65000 additional entries in each
encoding_map.
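
A rough sketch of the counting argument above, in present-day Python.
The two-entry table is hypothetical and not taken from any real codec,
and the names (sparse_decoding, added_a, added_b) are made up for
illustration. It assumes 16-bit code points, as in the message, and
only counts dictionary entries; it does not reproduce patch #103002.

```python
# Hypothetical sparse decoding map (byte -> code point); not a real codec table.
sparse_decoding = {0x80: 0x2500, 0x81: 0x2502}

# Approach A: a missing key means "undefined", so identity entries are
# added by a loop when the codec is imported.
decoding_a = {byte: byte for byte in range(256)}
decoding_a.update(sparse_decoding)
encoding_a = {cp: byte for byte, cp in decoding_a.items()}
# The same number of identity keys lands in the decoding and encoding maps.
added_a = (len(decoding_a) - len(sparse_decoding)) * 2

# Approach B: a missing key means "identity", so every code point that
# must NOT be encodable needs an explicit None entry.
encoding_b = {cp: byte for byte, cp in sparse_decoding.items()}
taken_bytes = set(sparse_decoding)  # bytes whose identity meaning is overridden
for cp in range(0x10000):
    if cp in encoding_b:
        continue
    if cp > 0xFF or cp in taken_bytes:
        encoding_b[cp] = None
added_b = sum(1 for value in encoding_b.values() if value is None)

print("identity entries added:", added_a)
print("None entries added:", added_b)
```

Under these assumptions, approach A stays within the 512-key bound,
while approach B needs more than 65000 None entries in the encoding map
alone.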

[how to correct the koi8-r codec]
> Simple: add the missing mappings to None for the range 0..255.
> The mapping lives in the Python module koi8_r.py -- there's
> no need to touch any C code.

That would be incorrect. u" ".encode("koi8-r") would then give a
UnicodeError, when the result should be " ".
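
To illustrate the point, here is a small sketch using
codecs.charmap_encode as it behaves in current Python 3, which is not
necessarily how the 2000-era C function behaved. The two one-entry
mappings are hypothetical and are not the real koi8_r.py tables.

```python
import codecs

# Hypothetical encoding map that marks the space character as undefined.
space_undefined = {0x20: None}

raised_for_none = False
try:
    codecs.charmap_encode(" ", "strict", space_undefined)
except UnicodeError:
    # In Python 3 this is a UnicodeEncodeError, a subclass of UnicodeError.
    raised_for_none = True

# With an explicit identity entry the space encodes to itself.
encoded, consumed = codecs.charmap_encode(" ", "strict", {0x20: 0x20})

# The shipped koi8-r codec agrees: a space encodes to a single space byte.
koi8_space = " ".encode("koi8-r")

print(raised_for_none, encoded, koi8_space)
```

So a None entry for 0x20 would break exactly the case that has to keep
working.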

I'm not sure why touching C code is a bad thing - especially when it
is such a small change. There is clearly an error in the C function;
at least three different people have independently noticed the
misbehaviour, and identified that function as the cause. Besides
yourself, I have not seen anybody defending the "feature".

Regards,
Martin