[Python-Dev] Unicode charmap decoders slow

Hye-Shik Chang hyeshik at gmail.com
Thu Oct 6 05:11:06 CEST 2005


On 10/6/05, M.-A. Lemburg <mal at egenix.com> wrote:
> Hye-Shik, could you please provide some timeit figures for
> the fastmap encoding ?
>

(before applying Walter's patch, charmap decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10';
u=unicode(s, e)" "s.decode(e)"
100 loops, best of 3: 3.35 msec per loop

(applied the patch, improved charmap decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10';
u=unicode(s, e)" "s.decode(e)"
1000 loops, best of 3: 1.11 msec per loop

(the fastmap decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10_fc';
u=unicode(s, e)" "s.decode(e)"
1000 loops, best of 3: 1.04 msec per loop

(utf-8 decoder)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='utf_8'; u=unicode(s,
e)" "s.decode(e)"
1000 loops, best of 3: 851 usec per loop

Walter's decoder and the fastmap decoder work in mostly the same way,
so the performance difference is quite minor (1.11 msec vs. 1.04 msec).
The small difference probably comes from the wrapper function that each
charmap codec needs; the fastmap codec provides functions usable as
Codecs.{en,de}code directly.
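
The wrapper overhead mentioned above can be illustrated with a minimal
sketch. It is written for current Python 3 rather than the interpreter
measured in this post, and the identity DECODING_TABLE is a hypothetical
stand-in for a real codec's table. It only shows the extra Python-level
call that sits in front of the C function in a charmap codec module.

```python
import codecs

# Hypothetical 256-entry decoding table (identity map onto U+0000..U+00FF).
# A real charmap codec module ships its own table.
DECODING_TABLE = "".join(chr(i) for i in range(256))


class Codec(codecs.Codec):
    def decode(self, input, errors="strict"):
        # Python-level wrapper: one extra function call per decode()
        # before the C implementation does the actual work.
        return codecs.charmap_decode(input, errors, DECODING_TABLE)


data = b"a" * 53 * 1024

# Through the wrapper, as a charmap codec does it.
wrapped = Codec().decode(data)

# Calling the C function directly, without the wrapper.
direct = codecs.charmap_decode(data, "strict", DECODING_TABLE)
```

Both calls return the same (text, bytes consumed) pair; the only
difference is the additional Python frame in the wrapped case.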

(encoding, charmap codec)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10';
u=unicode(s, e)" "u.encode(e)"
100 loops, best of 3: 3.51 msec per loop

(encoding, fastmap codec)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10_fc';
u=unicode(s, e)" "u.encode(e)"
1000 loops, best of 3: 536 usec per loop

(encoding, utf-8 codec)

% ./python Lib/timeit.py -s "s='a'*53*1024; e='utf_8'; u=unicode(s,
e)" "u.encode(e)"
1000 loops, best of 3: 1.5 msec per loop
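
For anyone who wants to repeat these measurements from Python code
instead of the command line, here is a rough sketch using the timeit
module. It assumes Python 3 (bytes and str instead of str and unicode),
the helper name best_time is made up for this example, and it only
covers codecs that ship with the standard library, so the fastmap codec
is not included. Absolute numbers will differ from the ones above.

```python
import timeit


def best_time(stmt, encoding, number=100, repeat=3):
    """Return the best per-loop time in seconds for stmt."""
    # Same 53 KiB input as in the command-line runs above.
    setup = "s = b'a' * 53 * 1024; e = %r; u = s.decode(e)" % encoding
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number


for enc in ("iso8859_10", "utf_8"):
    print("%-12s decode: %.1f usec" % (enc, best_time("s.decode(e)", enc) * 1e6))
    print("%-12s encode: %.1f usec" % (enc, best_time("u.encode(e)", enc) * 1e6))
```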

If the encoding optimization can easily be done with Walter's approach,
the fastmap codec would be too expensive a way to reach that goal,
because we would have to maintain not only fastmap but also charmap for
backward compatibility.
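
As a sketch of what an encoding optimization inside the charmap codec
itself can look like: in current CPython the charmap codec modules
derive their encoding table from the decoding table with
codecs.charmap_build, so only one table has to be maintained. The
example below assumes Python 3, and the identity decoding_table is a
hypothetical placeholder, not the table of any real codec.

```python
import codecs

# Hypothetical decoding table (identity map onto U+0000..U+00FF).
decoding_table = "".join(chr(i) for i in range(256))

# Build the encoding lookup structure from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)

# Encode through the charmap machinery using the derived table.
encoded, consumed = codecs.charmap_encode("abc", "strict", encoding_table)
```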

Hye-Shik

