[I18n-sig] CJKCodecs 1.0b1 is released

Hye-Shik Chang perky@i18n.org
Sun, 13 Jul 2003 04:33:35 +0900


On Sat, Jul 12, 2003 at 09:14:11PM +0200, M.-A. Lemburg wrote:
> Hye-Shik Chang wrote:
> >And, I created utf-8 and utf-16 codec for cjkcodecs just for fun.
> >I shipped them because they are somewhat faster than Python's equivalents.
> 
> That's interesting. How did you achieve the speedups ? The
> Python codecs for these are already rather well optimized.
> 

Ahh. Sorry for the incorrect statement. After running some tests, I found
that Python's codecs are much faster than CJKCodecs's for the .encode() and
.decode() functions (2x ~ 4x). CJKCodecs's codecs were faster than
Python's only for StreamReader/Writers (by a similar ratio).

> >(StreamReader benchmarks with a typical 10 KB Chinese text)
> >(all values are in iterations/sec)
> >
> >            Python  CJKCodecs
> >read(16)    14      187
> >read(256)   221     1645
> >read(512)   468     1990
> >readline    361     921
> >readlines   785     1193
> >
> >They are not very big and don't replace Python's codecs by default.
> >(They are distributed commented out in cjkcodecs/aliases.py.)
> >So, I think they are reasonably useful compared to their size.
> 
> Ah, I think I know what's causing this: you are measuring
> Python function calls (.read() and readlines() for UTF-8/16
> are Python functions implemented in codecs.py) against
> C type methods.

Agreed.
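
For anyone who wants to repeat the comparison on the Python side, here is
a minimal sketch. It assumes a current Python 3 interpreter, so the numbers
will not match the table above, and the sample text and helper names
(SAMPLE, decode_once, read_chunks, iterations_per_sec) are made up for
illustration. It only times the standard library's utf-8 codec: a one-shot
decode against a StreamReader read in small chunks.

```python
import codecs
import io
import timeit

# Hypothetical sample: roughly 10 KB of Chinese text encoded as UTF-8.
SAMPLE = ("汉字编码测试。\n" * 500).encode("utf-8")


def decode_once(data=SAMPLE):
    """Decode the whole buffer with a single codec call."""
    return codecs.decode(data, "utf-8")


def read_chunks(size, data=SAMPLE):
    """Decode through a StreamReader, calling read(size) until EOF."""
    reader = codecs.getreader("utf-8")(io.BytesIO(data))
    parts = []
    while True:
        chunk = reader.read(size)
        if not chunk:  # an empty string signals end of stream
            break
        parts.append(chunk)
    return "".join(parts)


def iterations_per_sec(func, *args, number=20):
    """Return how many times per second func(*args) completes."""
    seconds = timeit.timeit(lambda: func(*args), number=number)
    return number / seconds


if __name__ == "__main__":
    print("decode()   %10.0f" % iterations_per_sec(decode_once))
    for size in (16, 256, 512):
        rate = iterations_per_sec(read_chunks, size)
        print("read(%-4d) %10.0f" % (size, rate))
```

The smaller the read size, the more StreamReader.read() calls are needed
per pass, so the per-call overhead should dominate in the read(16) row.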

I'm considering removing utf-{8,16} from the 1.0 release and leaving
utf-7 only. :)


Regards,
    Hye-Shik =)