[issue24339] iso6937 encoding missing

Julien report at bugs.python.org
Sun Nov 13 17:11:01 EST 2016


Julien added the comment:

Hi John, thanks for your contribution,

Looks like your implementation is missing some codepoints, like "\t":

    >>> print("\t".encode(encoding='iso6937'))
    [...]
    UnicodeError: encoding with 'iso6937' codec failed (UnicodeError: Unacceptable utf-8 character)

This is probably due to the `range(0x20, …`: why start at `0x20`?
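
To illustrate the suspicion, here is a minimal sketch, not your actual code. The upper bound of the range is elided above, so `0x7F` is an assumption, and both table names are hypothetical. It also assumes control characters are meant to map to themselves, as in ASCII.

```python
# Assumption: the real upper bound is not shown in the report; 0x7F is a stand-in.
UPPER = 0x7F

# Hypothetical encoding tables: one starting at 0x20, one starting at 0x00.
table_from_0x20 = {chr(c): bytes([c]) for c in range(0x20, UPPER)}
table_from_0x00 = {chr(c): bytes([c]) for c in range(0x00, UPPER)}

# "\t" is U+0009, which is below 0x20, so the first table cannot encode it.
print("\t" in table_from_0x20)  # False
print("\t" in table_from_0x00)  # True
```

Starting the range at `0x00` is the simplest way to keep the control characters encodable in such a table.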

You also have trouble decoding multibyte sequences, because there is no `else: … result += chr(c[0])` branch for this case. Typically, decoding `\xc2\x20` raises a `KeyError`, as `\x20` is _not_ in your decoding table.
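
A rough sketch of the guarded lookup, under stated assumptions: `TWO_BYTE`, `LEAD_BYTES` and `decode_sketch` are hypothetical names, and the table is a one-entry toy, not the real ISO 6937 mapping. The fallback simply mirrors the `else` branch mentioned above; whether `chr()` of the byte is the right result for a given sequence is a separate question.

```python
# Toy table for illustration only; NOT the full ISO 6937 mapping.
TWO_BYTE = {b"\xc2e": "\u00e9"}  # acute accent + "e" -> "é"
LEAD_BYTES = {0xC2}              # hypothetical set of non-spacing lead bytes


def decode_sketch(data: bytes) -> str:
    """Decode bytes, falling back to chr() when no two-byte entry exists."""
    result = ""
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if data[i] in LEAD_BYTES and pair in TWO_BYTE:
            result += TWO_BYTE[pair]
            i += 2
        else:
            # Fallback branch: without it, an unknown pair such as
            # b"\xc2\x20" would reach TWO_BYTE[pair] and raise KeyError.
            result += chr(data[i])
            i += 1
    return result


print(repr(decode_sketch(b"\xc2e")))     # known pair, decoded as one character
print(repr(decode_sketch(b"\xc2\x20")))  # unknown pair, no KeyError
```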

Also, please make your contribution conform to PEP 8: you're missing spaces after commas, and you sometimes indent with 8 spaces instead of 4.

I implemented a simple checker based on the glibc localedata. It clearly shows your decoding problems step by step, and should be easy to extend to check your encoding function too; see the attachment. It uses the ISO6937 file, typically found in the Debian locales package or via 'apt-get source glibc'.
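
For readers without the attachment, the general idea can be sketched as follows. This is not the attached script: `parse_charmap`, `check_decoder` and `SAMPLE_CHARMAP` are hypothetical names, and the sample lines only imitate the `<Uxxxx> /xNN` layout of glibc charmap files rather than being copied from the real one.

```python
import re

# Illustrative lines in the glibc charmap layout; not copied from the real file.
SAMPLE_CHARMAP = """\
<U0009>     /x09         CHARACTER TABULATION
<U0041>     /x41         LATIN CAPITAL LETTER A
<U00E9>     /xc2/x65     LATIN SMALL LETTER E WITH ACUTE
"""

LINE_RE = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s+((?:/x[0-9A-Fa-f]{2})+)")


def parse_charmap(text):
    """Yield (encoded_bytes, expected_character) pairs from charmap text."""
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip comments, headers, and blank lines
        char = chr(int(match.group(1), 16))
        encoded = bytes(int(h, 16) for h in match.group(2).split("/x")[1:])
        yield encoded, char


def check_decoder(decode, text):
    """Return (encoded, expected, got) for every entry the decoder gets wrong."""
    failures = []
    for encoded, expected in parse_charmap(text):
        try:
            got = decode(encoded)
        except Exception as exc:  # record the error instead of stopping
            got = repr(exc)
        if got != expected:
            failures.append((encoded, expected, got))
    return failures


# Latin-1 stands in for the decoder under test; it fails on the two-byte entry.
for failure in check_decoder(lambda b: b.decode("latin-1"), SAMPLE_CHARMAP):
    print(failure)
```

Passing a function that wraps the codec under review in place of the Latin-1 lambda would list each sequence it decodes incorrectly, and the same pairs can be reversed to check an encoder.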

----------
nosy: +sizeof
Added file: http://bugs.python.org/file45478/check_iso6937.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24339>
_______________________________________

