From Misha.Wolf@reuters.com  Fri Sep  7 19:12:01 2001
From: Misha.Wolf@reuters.com (Misha.Wolf@reuters.com)
Date: Fri, 07 Sep 2001 19:12:01 +0100
Subject: [I18n-sig] Last Call for Papers - 20th Unicode Conference - Jan/Feb 2002 -
 Washington DC
Message-ID: <T55da80f39bc407b7067e4@reuters.com>

>>>>>>>>>>>>>>>>>>>>>>>  Last Call for Papers!  <<<<<<<<<<<<<<<<<<<<<<<

           Twentieth International Unicode Conference (IUC20)
               Unicode and the Web: The Global Connection
                    http://www.unicode.org/iuc/iuc20
                     January 28 - February 1, 2002
                          Washington, DC, USA

>>>>>>>>>>>>>>>>>>>>>>>>  Just 2 weeks to go!  <<<<<<<<<<<<<<<<<<<<<<<<

                  Submissions due: September 21, 2001
                  Notification date: October 12, 2001
                Completed papers due: November 2, 2001
            (in electronic form and camera-ready paper form)

>>>>>>>>>>>>>>>>>>>  Send in your submission now!  <<<<<<<<<<<<<<<<<<<<

The Unicode Standard has become the foundation for all modern text
processing.  It is used on large machines, tiny portable devices, and
for distributed processing across the Internet.  The standard brings
cost-reducing efficiency to international applications and enables the
exchange of text in an ever-increasing list of natural languages.

New technologies and innovative Internet applications, as well as the
evolving Unicode Standard, bring new challenges along with their new
capabilities.  This technical conference will explore the opportunities
created by the latest advances and how to leverage them, as well as
potential pitfalls to be aware of, and problem areas that need further
research.

We invite you to submit papers which either define the software of
tomorrow, demonstrate best practice with today's software, or articulate
problems that must be solved before further advances can occur.  Papers
should discuss subjects in the context of Unicode, internationalization
or localization. You can view the programs of previous conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

Conference attendees are generally involved in either the development,
deployment or use of Unicode software or content, or the globalization
of software and the Internet.  They include managers, software
engineers, systems analysts, font designers, graphic designers, content
developers, technical writers, and product marketing personnel.

THEME & TOPICS

Computing with Unicode is the overall theme of the Conference.
Presentations should be geared towards a technical audience.  Topics of
interest include, but are not limited to, the following (within the
context of Unicode, internationalization or localization):

- UTFs: Not enough or too many?
- Security concerns (e.g. avoiding the spoofing of UTF-8 data)
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Portable devices
- Implementing new features of recent versions of Unicode
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al.)
- The World Wide Web (WWW)
- Search engines
- Library and archival concerns
- Operating systems
- Databases
- Large scale networks
- Government applications
- Evaluations (case studies, usability studies)
- Natural language processing
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Optimizing performance of systems and applications
- Testing applications
- XML and Web protocols
- Business models for software development (e.g. Open source)

SESSIONS

The Conference Program will provide a wide range of sessions including:
- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will last 40 minutes.
In some cases, two consecutive 40-minute program slots may be
devoted to a single session.

The Workshops/Tutorials will each last approximately three hours.  They
should be designed to stimulate discussion and participation, using
slides and demonstrations.

PUBLICITY

If your paper is accepted, your details will be included in the
Conference brochure and Web pages and the paper itself will appear on a
Conference CD, with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE

The Conference language is English.  All submissions, papers and
presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of a statement of purpose,
   a paper description, and your conclusions or final summary.

2. A brief biography.

3. The details listed below:

   SESSION TITLE:             _________________________________________

                              _________________________________________

   YOUR TITLE (e.g. Prof):    _________________________________________

   YOUR NAME:                 _________________________________________

   YOUR JOB TITLE:            _________________________________________

   ORGANIZATION/AFFILIATION:  _________________________________________

   ORGANIZATION'S WWW URL:    _________________________________________

   YOUR WWW URL:              _________________________________________

   ADDRESS FOR PAPER MAIL:    _________________________________________

                              _________________________________________

                              _________________________________________

   TELEPHONE:                 _________________________________________

   FAX:                       _________________________________________

   E-MAIL ADDRESS:            _________________________________________

   TYPE OF SESSION:           [ ] Keynote presentation

                              [ ] Workshop/Tutorial

                              [ ] Technical presentation

                              [ ] Panel

   PANELISTS (if Panel):      _________________________________________

                              _________________________________________

                              _________________________________________

                              _________________________________________

                              _________________________________________

                              _________________________________________

                              _________________________________________

                              _________________________________________

   TARGET AUDIENCE (you may select more than one category):

                              [ ] Content Developers

                              [ ] Font Designers

                              [ ] Graphic Designers

                              [ ] Managers

                              [ ] Marketers

                              [ ] Software Engineers

                              [ ] Systems Analysts

                              [ ] Technical Writers

                              [ ] Others (please specify):

                              _________________________________________

                              _________________________________________

   LEVEL OF SESSION (you may select more than one category):

                              [ ] Beginner

                              [ ] Intermediate

                              [ ] Advanced

Submissions should be sent by e-mail to either of the following
addresses:

   papers@unicode.org

   info@global-conference.com

They should use ASCII, non-compressed text and the following subject
line:

   Proposal for IUC 20

If desired, a copy of the submission may also be sent by post to:

   Twentieth International Unicode Conference
   c/o Global Meeting Services, Inc.
   4360 Benhurst Avenue
   San Diego, CA  92122  USA
   Tel: +1 858 638 0206
   Fax: +1 858 638 0504

CONFERENCE PROCEEDINGS

All Conference papers will be published on CD.  Printed proceedings will
be offered as an option.

EXHIBIT OPPORTUNITIES

The Conference will have an Exhibition area for corporations or
individuals who wish to display and promote their products, technology
and/or services.

Every effort will be made to provide maximum exposure and advertising.

Exhibit space is limited.  For further information or to reserve a
place, please contact Global Meeting Services at the above location.

CONFERENCE VENUE

   Omni Shoreham Hotel
   2500 Calvert Street, NW
   Washington, DC  20008
   USA

   Tel: +1 202 234 0700
   Fax: +1 202 265 7972

THE UNICODE CONSORTIUM

The Unicode Consortium was founded as a non-profit organization in 1991.
It is dedicated to the development, maintenance and promotion of The
Unicode Standard, a worldwide character encoding.  The Unicode Standard
encodes the characters of the world's principal scripts and languages,
and is code-for-code identical to the international standard ISO/IEC
10646.  In addition to cooperating with ISO on the future development of
ISO/IEC 10646, the Consortium is responsible for providing character
properties and algorithms for use in implementations.  Today the
membership base of the Unicode Consortium includes major computer
corporations, software producers, database vendors, research
institutions, international agencies and various user groups.

For further information on the Unicode Standard, visit the Unicode Web
site at http://www.unicode.org or e-mail <info@unicode.org>

                           *  *  *  *  *

Unicode(r) and the Unicode logo are registered trademarks of Unicode,
Inc.  Used with permission.


-----------------------------------------------------------------
        Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of  the  individual
sender,  except  where  the sender specifically states them to be
the views of Reuters Ltd.


From Misha.Wolf@reuters.com  Wed Sep 12 16:51:49 2001
From: Misha.Wolf@reuters.com (Misha.Wolf@reuters.com)
Date: Wed, 12 Sep 2001 16:51:49 +0100
Subject: [I18n-sig] Status of the Unicode Conference
Message-ID: <T55f3b325f0c407b707480@reuters.com>

There follows a message from Lisa Moore, Unicode Conference co-chair.
For Conference details, see http://www.unicode.org/iuc/iuc19 .

Misha

~~~

From the Unicode conference, let me say that yes, there is a conference
underway.  Certainly many people who planned to attend are no longer able to.
So far, about ten of our speakers are unable to make travel arrangements.

To the best of my knowledge, no one involved in the conference was in New
York or on one of the flights involved in today's tragedies.

We have contacted most of the speakers who have not been able to travel,
and they are well, and many are still planning on coming.

So, if you are in the Bay Area, and wish to attend, please do so - we would
very much like to see you.

Lisa




From rnd@onego.ru  Fri Sep 14 20:38:00 2001
From: rnd@onego.ru (Roman Suzi)
Date: Fri, 14 Sep 2001 23:38:00 +0400 (MSD)
Subject: [I18n-sig] pygettext and PEP #?#
Message-ID: <Pine.LNX.4.30.0109142330430.2985-100000@rnd.onego.ru>

Hello!

I remember we had a heated discussion about how to tell
Python which encoding its code is in.

po-files use the following convention:

"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: Fri Sep 14 21:32:52 2001\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.3\n"

Perhaps Python's module-level doc string could also adopt an
RFC-822-style header to provide such meta-information?
(Right now doc strings do not concatenate.)

That is, make __doc__ an RFC 822 message whose header holds the
meta-information and whose body holds the usual comments.
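
As a rough illustration of the idea (a sketch only, not an existing
Python convention), the standard email package can split such a doc
string into headers and body. The sample doc string, its header values
and the helper name split_doc are made up for this example.

```python
from email import message_from_string

# Hypothetical module doc string laid out as an RFC 822 message:
# header fields, a blank line, then the ordinary documentation text.
SAMPLE_DOC = """\
Content-Type: text/plain; charset=koi8-r
Generated-By: hand

Usual module documentation goes here.
"""


def split_doc(doc):
    """Return (headers, body, charset) parsed from an RFC 822 style doc string."""
    msg = message_from_string(doc)
    headers = dict(msg.items())
    body = msg.get_payload()
    # get_content_charset() reads the charset parameter of Content-Type.
    charset = msg.get_content_charset()
    return headers, body, charset


headers, body, charset = split_doc(SAMPLE_DOC)
print(charset)
print(body.strip())
```

A tool such as pygettext could read the charset this way, while
help(module) would only need to show the body.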

Sincerely yours, Roman Suzi
-- 
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd@onego.ru _/
_/ Friday, September 14, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "URA Redneck if you own a homemade fur coat." _/


From martin@loewis.home.cs.tu-berlin.de  Fri Sep 14 21:49:08 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 14 Sep 2001 22:49:08 +0200
Subject: [I18n-sig] pygettext and PEP #?#
In-Reply-To: <Pine.LNX.4.30.0109142330430.2985-100000@rnd.onego.ru> (message
 from Roman Suzi on Fri, 14 Sep 2001 23:38:00 +0400 (MSD))
References: <Pine.LNX.4.30.0109142330430.2985-100000@rnd.onego.ru>
Message-ID: <200109142049.f8EKn8B02520@mira.informatik.hu-berlin.de>

> Perhaps Python's module-level doc string could also adopt an
> RFC-822-style header to provide such meta-information?
> (Right now doc strings do not concatenate.)
> 
> That is, make __doc__ an RFC 822 message whose header holds the
> meta-information and whose body holds the usual comments.

It depends on what you want to use this information for. If you want
the interpreter to automatically react in some way (e.g. convert
strings to Unicode objects automatically based on the module
encoding), then I suggest that (ab-)using the doc string for that is a
bad idea.

Furthermore, I doubt that many users of doc strings are interested in
the encoding of the module doc string (which they'd get when doing
help(module)); instead, they only care that it prints right even if it
is not ASCII.

There are many ways to signal languages, and RFC822 headers are surely
one of them (the application in GNU message catalogs originated from
MIME, which is also the foundation for indicating languages in HTTP).

So the problem is not so much the format of the meta information, but
where to place it and how to process it.

Regards,
Martin


From Alexandre.Fayolle@logilab.fr  Sat Sep 15 18:51:26 2001
From: Alexandre.Fayolle@logilab.fr (Alexandre Fayolle)
Date: Sat, 15 Sep 2001 19:51:26 +0200 (CEST)
Subject: [I18n-sig] gettext and windows
Message-ID: <Pine.LNX.4.21.0109151947290.4578-100000@orion.logilab.fr>

Hello,

I'm testing an app that runs fine under Linux, but whose l10n fails under
Windows (Win98). By tracking down the code in gettext.py, I saw that
this module uses environment variables to get the current locale. However,
this is not the correct way of doing things on Windows systems, since the
LC_ALL variable is generally not set, resulting in the C locale being
used.

I have a patch which uses locale.getdefaultlocale()[0] to get information,
but I wanted to know if there was a reason why this had not been used in
the first place. 
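
A minimal sketch of such a fallback follows. It is not the actual
patch, which is not shown here. It checks the environment variables
that gettext.find consults first and only then uses a default locale
supplied by the caller. The function name preferred_languages and its
parameters are hypothetical. On a real system the default would come
from locale.getdefaultlocale()[0], which recent Python releases
deprecate, so the sketch takes it as an argument instead.

```python
import os

# Environment variables checked, in priority order, by gettext.find.
ENV_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


def preferred_languages(environ, default_locale=None):
    """Return language codes in priority order.

    environ is a mapping such as os.environ; default_locale is the
    platform default (for example "fr_FR"), used only when none of
    the environment variables is set.
    """
    for name in ENV_VARS:
        value = environ.get(name)
        if value:
            # The value may hold a colon-separated list such as "fr:es".
            return value.split(":")
    if default_locale:
        return [default_locale]
    return ["C"]


# On Windows the variables are usually unset, so the default is used.
print(preferred_languages({}, "fr_FR"))
print(preferred_languages(dict(os.environ), "fr_FR"))
```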

Thanks

Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).


From martin@loewis.home.cs.tu-berlin.de  Mon Sep 17 07:09:32 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 17 Sep 2001 08:09:32 +0200
Subject: [I18n-sig] gettext and windows
In-Reply-To: <Pine.LNX.4.21.0109151947290.4578-100000@orion.logilab.fr>
 (message from Alexandre Fayolle on Sat, 15 Sep 2001 19:51:26 +0200
 (CEST))
References: <Pine.LNX.4.21.0109151947290.4578-100000@orion.logilab.fr>
Message-ID: <200109170609.f8H69Wb01007@mira.informatik.hu-berlin.de>

> I have a patch which uses locale.getdefaultlocale()[0] to get information,
> but I wanted to know if there was a reason why this had not been used in
> the first place. 

gettext.find uses the GNU gettext strategy for locating catalogs. I
think there should be a routine that models GNU gettext as closely as
possible, even on Windows - Windows also supports environment
variables, after all. That routine does not need to be called
gettext.find, though.

I'd agree that this algorithm is not optimal. However, just
considering the default locale is not appropriate, either:
- On Windows, there is a user locale and a system locale. I don't
  know what the chance is that they ever differ, but if they do,
  this might need consideration.
- Currently, catalogs are located in 
  <sys.prefix>/share/locale/<lang>/LC_MESSAGES. This is appropriate
  on Unix, since each of the directories on the path will contain
  a lot of other stuff. It is less appropriate on Windows; we should
  consider placing the catalogs into a location nearer to the root
  of the Python installation.
- GNU gettext has the feature of fallback languages, e.g. setting
  LANGUAGE to "fr:es" indicates that you prefer French translations,
  but if none are available, you'd prefer Spanish ones over the default
  text (which typically is English). That may be worth exposing
  as well (*).

So there is something to be fixed, but it appears that more is
involved than just looking at the default locale.

Regards,
Martin

(*) Of course, the fallback mechanism is not fully implemented in
gettext.py yet: it will fall back on a per-catalog basis, but not on a
per-message basis.
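
To make the per-message case concrete, here is a small sketch, assuming
today's gettext module, where NullTranslations.add_fallback chains
catalogs together. The DictTranslations class and its sample messages
are hypothetical stand-ins for real .mo catalogs.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Toy catalog backed by a dict (hypothetical helper, no .mo file needed)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # NullTranslations.gettext asks the fallback catalog if one is
        # set, and otherwise returns the untranslated message.
        return super().gettext(message)


# Preference order "fr:es": French first, Spanish as the fallback.
french = DictTranslations({"Hello": "Bonjour"})
spanish = DictTranslations({"Hello": "Hola", "Goodbye": "Adios"})
french.add_fallback(spanish)

print(french.gettext("Hello"))    # found in the French catalog
print(french.gettext("Goodbye"))  # missing in French, taken from Spanish
print(french.gettext("Thanks"))   # in neither catalog, returned unchanged
```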


From Misha.Wolf@reuters.com  Wed Sep 19 06:23:07 2001
From: Misha.Wolf@reuters.com (Misha.Wolf@reuters.com)
Date: Wed, 19 Sep 2001 06:23:07 +0100
Subject: [I18n-sig] Last Call for Papers - 20th Unicode Conference - Jan/Feb 2002 -
 Washington DC
Message-ID: <T56157fea13c407b707480@reuters.com>

Because of the recent tragic events and the resulting disruption, we are
sending you a reminder that this is the final week for submissions for
the Twentieth International Unicode Conference (IUC20).




From kajiyama@grad.sccs.chukyo-u.ac.jp  Tue Sep 25 16:38:13 2001
From: kajiyama@grad.sccs.chukyo-u.ac.jp (Tamito KAJIYAMA)
Date: Wed, 26 Sep 2001 00:38:13 +0900
Subject: [I18n-sig] JapaneseCodecs 1.4 released
Message-ID: <200109251538.AAA30063@dhcp209.grad.sccs.chukyo-u.ac.jp>

Hi all,

I released JapaneseCodecs version 1.4.  The source tarball is
available at the following location:

  http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/

The major enhancement of this release is the set of new codecs
written in C.  The improvements in both speed and memory use
are substantial, as described below.  Please check it out!

Here is the result of a simple benchmark test that encodes a
Unicode string and then decodes it back.  The new codecs written
in C are far faster than the old codecs written in Python
(time is shown in seconds).

  a Unicode string of 10,000 chars
                          in Python   in C
  japanese.euc-jp         1.074       0.003859
  japanese.shift_jis      1.059       0.003981
  japanese.iso-2022-jp    0.842       0.007737
  
  a Unicode string of 100,000 chars
                          in Python   in C
  japanese.euc-jp         11.54       0.02978
  japanese.shift_jis      11.55       0.03047
  japanese.iso-2022-jp    8.345       0.06522

  a Unicode string of 1,000,000 chars
                          in Python   in C
  japanese.euc-jp         126.7       0.2259
  japanese.shift_jis      125.9       0.2276
  japanese.iso-2022-jp    82.87       0.5892
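
A round-trip timing of this kind can be sketched as follows. This is
not the script that produced the figures above. It is a minimal
example that assumes the codec names euc_jp, shift_jis and iso2022_jp
from Python's standard library instead of the japanese.* names of
this package, and the sample text and the function name
round_trip_seconds are made up.

```python
import time


def round_trip_seconds(text, encoding):
    """Encode text, decode it back, and return the elapsed seconds."""
    start = time.perf_counter()
    decoded = text.encode(encoding).decode(encoding)
    elapsed = time.perf_counter() - start
    # The round trip must reproduce the original string exactly.
    assert decoded == text
    return elapsed


# 8 characters repeated 1,250 times gives a 10,000-character string.
sample = "日本語のテキスト" * 1250

timings = {}
for name in ("euc_jp", "shift_jis", "iso2022_jp"):
    timings[name] = round_trip_seconds(sample, name)
    print(name, timings[name])
```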

The runtime memory size is also reduced drastically.  On my
Linux box, the old codecs in Python require 3,364K bytes of
runtime memory, while the new codecs in C occupy only 124K
bytes.  In addition, the start-up time of the Python
interpreter is much shorter if one of the Japanese codecs
is used as the system default encoding.

I adopted a hashing technique in order to achieve high
performance in both speed and storage size.  Thanks, Marc-Andre,
for your advice (given in a couple of private messages a long time
ago ;-).

Part of the program in src/_japanese_codecs.c is based on
ms932codec.c written by Atsuo ISHIMOTO.  Some helper functions
are used as they are.  I appreciate his invaluable work.

For developers of possible derived packages: Character mapping
tables in the form of hash tables are in src/_japanese_codecs.h.
This is an auto-generated file; you may want to look at the hash
table generator src/hgen.py and the hash table look-up functions in
src/_japanese_codecs.c (lookup_jis_map() and lookup_ucs_map()).
If you are familiar with programming Python extension modules,
you should be able to apply the code to other character
encodings such as EUC-KR and BIG-5 without trouble.  The hashing
function f() is (charcode % 523), and I heuristically chose the
divisor (a prime number greater than 256).  I believe that the
value 523 is not bad in many cases.  In general, the larger the
divisor is, the faster the look-up functions run, and the bigger
the hash tables are (and vice versa).  Try other prime numbers
if the resulting look-up performance or hash table sizes are
not satisfactory.
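
The trade-off can be illustrated with a small Python sketch. It is not
the code in src/hgen.py or src/_japanese_codecs.c; it assumes a
separate-chaining layout, which may differ from the real tables. Only
the hashing function (charcode % 523) is taken from the description
above. The names build_table and lookup are hypothetical, and the
sample mapping is built with the standard library's euc_jp codec.

```python
HASH_DIVISOR = 523  # prime divisor named in the text above


def build_table(mapping, divisor=HASH_DIVISOR):
    """Distribute (charcode, value) pairs into divisor buckets."""
    buckets = [[] for _ in range(divisor)]
    for code, value in mapping.items():
        buckets[code % divisor].append((code, value))
    return buckets


def lookup(buckets, code):
    """Return the value stored for code, or None if it is not mapped."""
    # Only one bucket is scanned, so a larger divisor means shorter
    # buckets (faster look-up) at the cost of a bigger table.
    for key, value in buckets[code % len(buckets)]:
        if key == code:
            return value
    return None


# Tiny sample mapping from Unicode code points to EUC-JP bytes.
mapping = {ord(ch): ch.encode("euc_jp") for ch in "日本語"}
table = build_table(mapping)

print(lookup(table, ord("本")))
print(lookup(table, ord("A")))
```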

The new codecs in C are very young, and probably have a number
of bugs.  Any kind of feedback is very much appreciated.

Thank you,

-- 
KAJIYAMA, Tamito <kajiyama@grad.sccs.chukyo-u.ac.jp>