[Python-Dev] Adding Japanese Codecs to the distro

Hye-Shik Chang perky@fallin.lv
Thu, 16 Jan 2003 20:38:55 +0900


On Thu, Jan 16, 2003 at 11:05:55AM +0100, Martin v. Löwis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
> 
> > Thoughts ?
> 
> I'm in favour of adding support for Japanese codecs, but I wonder
> whether we shouldn't incorporate the C version of the Japanese codecs
> package instead, despite its size.

And the most important advantage the C version has over the pure-Python
version is that the text segment of its shared library can be shared
between processes. Most modern OSes can share it, and in the case of
KoreanCodecs 2.1.x (in CVS) the C version is even smaller than the
Python version.

Here is the process status on a FreeBSD 5.0/i386 system with Python 2.3a1
(as of 2003-01-15):

USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
perky 56713  0.0  1.2  3740 3056  p3  S+    8:11PM   0:00.08 python
   : python without any codecs
perky 56739  6.3  5.7 15376 14728  p3  S+    8:17PM   0:04.02 python
   : python with python.cp949 codec
perky 56749  0.0  1.2  3884 3196  p3  S+    8:20PM   0:00.06 python
   : python with c.cp949 codec
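A rough way to repeat this per-process comparison from inside the
interpreter is sketched below. It is an illustration under stated
assumptions, not how the figures above were obtained (those came from
ps): the helper name peak_rss_growth is hypothetical, it relies on the
Unix-only resource module, and it reports growth of the process's peak
RSS, which is in kilobytes on FreeBSD and Linux. The name "cp949"
resolves to whichever cp949 codec the running Python provides, which
need not be either KoreanCodecs implementation measured above.

```python
import codecs
import resource  # Unix-only


def peak_rss_growth(encoding):
    """Hypothetical helper: growth of this process's peak RSS after
    looking up `encoding`.

    ru_maxrss is reported in kilobytes on Linux and FreeBSD
    (bytes on macOS). It is a high-water mark, so it never decreases.
    """
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    codecs.lookup(encoding)  # imports the module that implements the codec
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return after - before


print(peak_rss_growth("cp949"))
```

Run it in a fresh interpreter: a codec that is already imported adds
nothing, so a second call reports roughly zero.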


alice(perky):/usr/pkg/lib/python2.3/site-packages/korean% size _koco.so
   text    data     bss     dec     hex filename
 122861    1844      32  124737   1e741 _koco.so

With the C codec, the 122861 bytes of text are shared system-wide and
each process consumes only 1844 bytes of its own, whereas the pure-Python
codec consumes about 12 MB in every process. This must be taken very
seriously, because it affects the start-up time of every script that
declares "# encoding: euc-jp" or another CJK encoding.
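To make the trade-off concrete, the sketch below extends the figures
above to n concurrent processes. It is a simplified model assumed for
illustration: it counts the C codec's text segment once and its data
segment once per process, ignores bss, page rounding and heap
allocations, and takes the pure-Python cost as the RSS difference
reported by ps (in KB). The function names are hypothetical.

```python
# Figures from the measurements above, in bytes.
C_TEXT_SHARED = 122861    # "text" column of `size _koco.so`, shared by all processes
C_DATA_PRIVATE = 1844     # "data" column, private to each process
# RSS with python.cp949 minus RSS without codecs; ps reports KB.
PURE_RSS_DELTA = (14728 - 3056) * 1024


def c_codec_footprint(n):
    """Approximate bytes used system-wide by n processes using the C codec."""
    return C_TEXT_SHARED + n * C_DATA_PRIVATE


def pure_codec_footprint(n):
    """Approximate bytes used system-wide by n processes using the Python codec."""
    return n * PURE_RSS_DELTA


for n in (1, 10, 100):
    print(n, c_codec_footprint(n), pure_codec_footprint(n))
```

In this model the C codec's cost grows by under 2 KB per extra process,
while the pure-Python codec's cost grows by roughly 12 MB per process.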

> I would also suggest that it might be more worthwhile to expose
> platform codecs, which would give us all CJK codecs on a number of
> major platforms, with a minimum increase in the size of the Python
> distribution, and with very good performance.

KoreanCodecs has been tested on {Free,Net,Open}BSD, Linux, Solaris, HP-UX,
Windows {95,98,NT,2000,XP} and Cygwin without any platform-specific
#ifdefs. I am sure that any CJK codec can be ported to every platform
Python has been ported to.

Regards,

    Hye-Shik =)