[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

Ezio Melotti report at bugs.python.org
Thu Feb 24 22:17:42 CET 2011


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

Probably not, but that part should be changed if possible, because it is less efficient than the previous version, which allocated only 11 bytes.

The problem here is that the previous version only changed or removed chars, whereas this one might add spaces too, so the string might get longer. E.g. 'utf8' -> 'utf 8'. The worst case is 'a1a1a1' -> 'a 1 a 1 a 1', and including the trailing \0, the result might end up being twice as long as the original encoding string. It can be fixed by returning 0 as soon as the normalized string reaches a fixed threshold (something like 15 chars, depending on the longest normalized encoding name).
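A rough Python sketch of the normalization behavior described above (this is an illustration, not CPython's actual C implementation; the function name, the space-insertion rule, and the 15-char threshold are assumptions based on this comment):

```python
def normalize_encoding(name, max_len=15):
    """Sketch: lowercase the name, treat separators like '-' and '_'
    as spaces, and insert a space at every letter<->digit boundary,
    so 'utf8', 'utf-8' and 'utf 8' all normalize the same way.
    Give up (return None, analogous to returning 0 in the C code)
    once the result exceeds max_len -- the fix suggested above."""
    out = []
    prev = ''  # class of the previous char: 'alpha', 'digit', or ''
    for ch in name:
        if ch.isalpha():
            if prev == 'digit':
                out.append(' ')
            out.append(ch.lower())
            prev = 'alpha'
        elif ch.isdigit():
            if prev == 'alpha':
                out.append(' ')  # e.g. 'utf8' -> 'utf 8'
            out.append(ch)
            prev = 'digit'
        else:
            out.append(' ')  # separator chars become spaces
            prev = ''
        if len(out) > max_len:
            return None  # normalized name too long: bail out early
    return ''.join(out)
```

With this rule, 'latin1' and 'latin-1' both normalize to 'latin 1', and the 'a1a1a1' worst case grows to 'a 1 a 1 a 1' (11 chars), which is why a fixed threshold bounds the allocation.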

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11303>
_______________________________________
