[issue25937] Difference between utf8 and utf-8 when I define Python source code encoding.

Sat Dec 26 17:05:17 EST 2015

Marc-Andre Lemburg added the comment:

On 26.12.2015 22:46, STINNER Victor wrote:
> 
> In Python, there are multiple implementations of the utf-8 codec with many
> shortcuts. I'm not surprised to see bugs depending on the exact syntax of
> the utf-8 codec name. Maybe we need to share even more code to normalize
> and compare codec names. (I think that py3 is better than py2 on this part.)

There's only one implementation (the one in unicodeobject.c), which is used
either directly or via the wrapper in the encodings package. However, because
UTF-8 is such a commonly used codec, a few shortcuts that bypass the codec
registry are scattered around the code.
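
As a rough illustration of the registry path described above, here is a
minimal sketch assuming a Python 3 interpreter. The list of spellings and the
sample bytes are chosen only for the example; they are not taken from the
issue.

```python
import codecs

# Several spellings of the codec name, chosen for illustration only.
spellings = ["utf8", "utf-8", "UTF-8", "utf_8"]

# codecs.lookup() goes through the codec registry, which normalizes the
# name and returns the CodecInfo provided by the encodings package.
canonical_names = {s: codecs.lookup(s).name for s in spellings}

# The decode function in that CodecInfo returns (text, bytes_consumed).
sample = b"caf\xc3\xa9"  # "café" encoded as UTF-8 (5 bytes)
decoded = {s: codecs.lookup(s).decode(sample) for s in spellings}

for s in spellings:
    print(s, "->", canonical_names[s], decoded[s])
```

Every spelling should resolve to the same registry entry, which is the
wrapper in the encodings package around the shared C implementation.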

In the case in question, the codec registry should trigger decoding
via the encodings package (rather than going directly to the C APIs),
so it will eventually end up using the same code. I wonder why this does
not trigger the exception.
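
For comparison, a small sketch (again assuming Python 3) of what happens when
invalid bytes are decoded explicitly through the registry entry for each
spelling. The helper name registry_decode_error and the sample bytes are
hypothetical. This exercises only the codec itself, not the source-file
decoding path discussed in the issue.

```python
import codecs


def registry_decode_error(codec_name, data=b"abc\xff"):
    """Decode via the codec registry entry; return error details or None."""
    info = codecs.lookup(codec_name)
    try:
        info.decode(data)
    except UnicodeDecodeError as exc:
        # Report which codec failed, where, and why.
        return (exc.encoding, exc.start, exc.reason)
    return None


results = {name: registry_decode_error(name) for name in ("utf8", "utf-8")}
for name, result in results.items():
    print(name, "->", result)
```

At this level both spellings are expected to fail in the same way on the
invalid byte, which would suggest that any difference comes from code that
runs before the codec is reached.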

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25937>
_______________________________________