[Python-Dev] Unicode charmap decoders slow

M.-A. Lemburg mal at egenix.com
Tue Oct 4 22:29:36 CEST 2005


Walter Dörwald wrote:
> Am 04.10.2005 um 04:25 schrieb jepler at unpythonic.net:
> 
> 
>>As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is
>>very slow compared to encoding or decoding with utf-8.  Here I'm working
>>with 53k of data instead of 53 megs.  (Note: this is a laptop, so it's
>>possible that thermal or battery management features affected these
>>numbers a bit, but by a factor of 3 at most)
>>
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
>>1000 loops, best of 3: 591 usec per loop
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
>>1000 loops, best of 3: 1.25 msec per loop
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
>>100 loops, best of 3: 13.5 msec per loop
>>$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
>>100 loops, best of 3: 13.6 msec per loop
>>
>>With utf-8 encoding as the baseline, we have
>>    decode('utf-8')      2.1x as long
>>    decode('mac-roman') 22.8x as long
>>    decode('iso8859-1') 23.0x as long
>>
>>Perhaps this is an area that is ripe for optimization.
> 
> 
> For charmap decoding we might be able to use an array (e.g. a tuple
> or an array.array?) of codepoints instead of a dictionary.
> 
> Or we could implement this array as a C array (i.e. gencodec.py would  
> generate C code).

That would be a possibility, yes.
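
To make the array idea concrete, here is a rough sketch in present-day
Python 3 rather than the actual codec machinery. The names decoding_map,
decoding_table and the two helper functions are invented for illustration;
the only point is that an indexable 256-entry table replaces one dictionary
lookup per byte with a plain index operation.

```python
# Hypothetical sketch; none of these names come from the codecs module.

# Dictionary-style map: byte value -> code point.
decoding_map = {i: i for i in range(256)}   # identity mapping as a starting point
decoding_map[0x80] = 0x00C4                 # illustrative override for one byte

# Array-style table: position = byte value, entry = decoded character.
# Bytes without a mapping would get U+FFFD (none are missing in this example).
decoding_table = tuple(chr(decoding_map.get(i, 0xFFFD)) for i in range(256))


def decode_with_dict(data, mapping):
    """Decode bytes with one dictionary lookup per byte."""
    return "".join(chr(mapping[b]) for b in data)


def decode_with_table(data, table):
    """Decode bytes with one tuple index per byte."""
    return "".join(table[b] for b in data)


sample = bytes([0x61, 0x80, 0x62])
print(decode_with_dict(sample, decoding_map) == decode_with_table(sample, decoding_table))
```

A decoding table is always small (256 entries for an 8-bit charset), so the
memory argument for dictionaries applies mainly to the encoding direction.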

Note that the charmap codec was meant as a faster replacement
for the old string transpose function. Dictionaries are used
for the mapping to avoid having to store huge (largely empty)
mapping tables - it's a memory-speed tradeoff.

Of course, a C version could use the same approach as
the unicodedatabase module: that of compressed lookup
tables...

	http://aggregate.org/TechPub/lcpc2002.pdf
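
For readers unfamiliar with the technique, a two-stage lookup table can be
sketched as follows. This is a simplified Python 3 illustration, not the
code used for the Unicode database and not the scheme from the paper above;
SHIFT, MISSING, build_two_stage and lookup are made-up names, and the block
size is an arbitrary choice. The code point range is cut into fixed-size
blocks, identical blocks are stored once, and a small index says which
block covers each range.

```python
# Hypothetical sketch of a two-stage (compressed) lookup table.
SHIFT = 7                 # 2**7 = 128 entries per block (arbitrary choice)
BLOCK = 1 << SHIFT
MISSING = -1              # marker for "no mapping defined"


def build_two_stage(mapping, limit=0x10000):
    """Compress a sparse {code point: byte} mapping into (index, blocks)."""
    blocks = []           # unique blocks, each a tuple of BLOCK entries
    seen = {}             # block contents -> position in blocks
    index = []            # one entry per BLOCK-sized range of code points
    for start in range(0, limit, BLOCK):
        block = tuple(mapping.get(cp, MISSING)
                      for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, cp):
    """Return the byte for code point cp, or MISSING if unmapped."""
    high = cp >> SHIFT
    if high >= len(index):
        return MISSING
    return blocks[index[high]][cp & (BLOCK - 1)]


# Illustrative encoding map: ASCII plus two non-ASCII characters.
encoding_map = {cp: cp for cp in range(128)}
encoding_map[0x00C4] = 0x80
encoding_map[0x20AC] = 0xDB

index, blocks = build_two_stage(encoding_map)
print(len(index), len(blocks))
```

All the empty ranges share a single block, so the storage is the index plus
a handful of blocks instead of one entry per code point, while a lookup
stays at two index operations.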

genccodec.py, anyone?

-- 
Marc-Andre Lemburg
eGenix.com


