[issue14850] The inconsistency of codecs.charmap_decode

Fri May 18 16:46:36 CEST 2012

New submission from Serhiy Storchaka <storchaka at gmail.com>:

codecs.charmap_decode behaves differently depending on whether the decoding table is an exact str or an instance of a user-defined str subclass.

>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
... 
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)

This happens because the charmap decoder (function PyUnicode_DecodeCharmap in Objects/unicodeobject.c) uses different algorithms for exact strings and for all other objects.
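
As a rough Python-level illustration of the "exact string" distinction (this is not the C code itself, and is_exact_str is a hypothetical helper introduced only for this sketch), an exact-type check rejects the subclass even though isinstance accepts it:

```python
class S(str):
    pass


def is_exact_str(obj):
    """Hypothetical helper: True only for str itself, not for subclasses."""
    return type(obj) is str


table = "\uFFFE"
print(is_exact_str(table), isinstance(table, str))        # True True
print(is_exact_str(S(table)), isinstance(S(table), str))  # False True
```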

Do we need to fix it? If so, what should `codecs.charmap_decode(b'\x00', 'replace', {0:'\uFFFE'})` return? And what should `codecs.charmap_decode(b'\x00', 'replace', {0:0xFFFE})` return?
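
To compare all four table forms side by side, something like the following sketch could be used. The names S, TABLES and compare_tables are only illustrative. No expected output is given for the subclass and dict cases, since the results may depend on the Python version.

```python
import codecs


class S(str):
    """Trivial str subclass, as in the session above."""


# Four ways of mapping byte 0x00 to U+FFFE.
TABLES = {
    "exact str": "\uFFFE",
    "str subclass": S("\uFFFE"),
    "dict of str": {0: "\uFFFE"},
    "dict of int": {0: 0xFFFE},
}


def compare_tables(data=b"\x00", errors="replace"):
    """Decode data with each table and return {label: (text, consumed)}."""
    results = {}
    for label, table in TABLES.items():
        results[label] = codecs.charmap_decode(data, errors, table)
    return results


if __name__ == "__main__":
    for label, result in compare_tables().items():
        print("{:13} {}".format(label, ascii(result)))
```

If the four lines printed are not identical, the decoder treats U+FFFE differently depending on the kind of table it is given.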

----------
components: Interpreter Core
messages: 161054
nosy: storchaka
priority: normal
severity: normal
status: open
title: The inconsistency of codecs.charmap_decode
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14850>
_______________________________________