[I18n-sig] error handling in charmap-based codecs

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 20 Dec 2000 22:22:17 +0100


> This is because I wanted to avoid having to put a huge number of 
> mappings to None into the codec dictionaries. This would have
> caused the codec modules and dictionaries to become much larger
> than acceptable for the standard distribution. 

I can't see the problem. If KeyError means "character not in the
target character set", then why exactly would you have to put mappings
to None into the codec dictionaries? Can you please give an example of
a mapping that would need to be changed?

> > I can't see any reason for defaulting to *Latin-1*.
> 
> See above. The encodings using the charmap codec are usually
> only minor modifications of Latin-1.

I understand the rationale, but I still don't see the problem. Let's
take koi8_r.py as an example. It has a complete mapping for the range
128..255; the rest (0..127) is intended as a 1:1 mapping. I can't see a
problem with writing

decoding_map = codecs.identity_dictionary(range(0,128))
decoding_map.update({
    0x0080: 0x2500,  # BOX DRAWINGS LIGHT HORIZONTAL
    0x0081: 0x2502,  # BOX DRAWINGS LIGHT VERTICAL
...
})

where codecs.identity_dictionary is defined as

def identity_dictionary(rng):
    res = {}
    for i in rng:
        res[i] = i
    return res

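That will produce somewhat larger dictionaries in memory once a codec
is *used*, but it won't significantly change the size of the
distribution, since the identity entries are generated at import time
rather than written out in the codec source.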
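
To make the intended semantics concrete, here is a minimal sketch in
current Python. It assumes that a missing key means "undefined" rather
than "fall back to Latin-1". The function decode_strict is hypothetical
and written only for illustration; it is not the C charmap codec. Only
the two KOI8-R entries quoted above are included.

```python
def identity_dictionary(rng):
    """Map every code in rng to itself (the helper proposed above)."""
    return {i: i for i in rng}


# The ASCII range maps 1:1; only two KOI8-R entries are shown here.
decoding_map = identity_dictionary(range(0, 128))
decoding_map.update({
    0x0080: 0x2500,  # BOX DRAWINGS LIGHT HORIZONTAL
    0x0081: 0x2502,  # BOX DRAWINGS LIGHT VERTICAL
})


def decode_strict(data, mapping):
    """Hypothetical decoder: a byte missing from mapping is an error."""
    out = []
    for pos, byte in enumerate(data):
        try:
            out.append(chr(mapping[byte]))
        except KeyError:
            # No implicit Latin-1 default: report the unmapped byte.
            raise UnicodeDecodeError(
                "charmap-sketch", data, pos, pos + 1, "byte not in mapping"
            ) from None
    return "".join(out)
```

With this reading, no mappings to None are needed: a byte that is
absent from the dictionary is simply reported as an error, and the
identity entries cost nothing in the source file.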

> Huh ? The solution is simple: you only have to add mappings to None
> as appropriate. There's no need to change the codec.

So how can I correct the koi8_r codec without changing the C code?

Regards,
Martin