[I18n-sig] Codec Language

Andy Robinson andy@reportlab.com
Thu, 23 Mar 2000 23:16:55 -0000


> Allowing for both algorithmic and mapping codecs within the same
> implementation might confuse matters somewhat... what about separating
> things into mapping codecs (which will handle all the Unicode stuff),
> and a separate machine (or possibly extension to the mapping machine)
> that can do algorithmic transformations?  This would whittle down the
> immediate problem to developing the mapping machine, which as far as I
> can tell should only have to support reading, writing, lookup, and
> comparison, at least for doing Unicode conversions.  How does this
> sound?

I've been thinking hard about what to do next, and I think the highest
priorities are:

(a) build some kind of CGI test harness (maybe on Starship?), on which we
can stash all manner of input files, and a front end which lets you specify
input (a file or a text field), say what encoding it is in, and say what
encoding you want to see it in.  Then, just using web browsers, we can
actually see the results of encoding conversions, and can accumulate test
files with subtle combinations of text.

(b) write some pure Python Asian codecs, no matter how slow, using simple
dictionaries for the mapping tables.  This gives us a benchmark, documents
the algorithms and features we are going to need, and lets people other than
you and me see what features are needed in a faster codec machine.
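
As a rough illustration of (b), here is a minimal sketch of a dictionary-based
double-byte codec, assuming a modern Python 3 interpreter.  The three table
entries follow the Shift-JIS hiragana values; the function names, the
lead-byte ranges as coded here, and the decision to treat every byte below
0x80 as ASCII are simplifying assumptions for the example, not a complete or
authoritative Shift-JIS implementation.

```python
# Toy excerpt of a double-byte mapping table.  A real codec would load
# thousands of entries from a published mapping file instead.
DECODE_TABLE = {
    0x82A0: "\u3042",  # HIRAGANA LETTER A
    0x82A2: "\u3044",  # HIRAGANA LETTER I
    0x82A4: "\u3046",  # HIRAGANA LETTER U
}
# The encoder table is just the inverse of the decoder table.
ENCODE_TABLE = {char: code for code, char in DECODE_TABLE.items()}


def is_lead_byte(byte):
    """Return True if the byte starts a two-byte sequence (assumed ranges)."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def decode(data):
    """Decode bytes to str using only dictionary lookups."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # Simplification: single bytes below 0x80 are passed through as ASCII.
            out.append(chr(byte))
            i += 1
        elif is_lead_byte(byte) and i + 1 < len(data):
            code = (byte << 8) | data[i + 1]
            if code not in DECODE_TABLE:
                raise ValueError("unmapped sequence 0x%04X at offset %d" % (code, i))
            out.append(DECODE_TABLE[code])
            i += 2
        else:
            raise ValueError("unexpected byte 0x%02X at offset %d" % (byte, i))
    return "".join(out)


def encode(text):
    """Encode str to bytes using the inverse table."""
    out = bytearray()
    for char in text:
        if ord(char) < 0x80:
            out.append(ord(char))
        elif char in ENCODE_TABLE:
            code = ENCODE_TABLE[char]
            out.append(code >> 8)
            out.append(code & 0xFF)
        else:
            raise ValueError("unmapped character U+%04X" % ord(char))
    return bytes(out)
```

Slow as it is, something of this shape makes the required features explicit:
lead-byte detection, table lookup, and a policy for unmapped input.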

We should be able to move on that pretty fast.  What do you think?
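
For (a), the conversion step behind such a front end could be very small.
The sketch below assumes a modern Python 3 interpreter and uses only its
built-in codecs; the function name `transcode` and its parameters are
hypothetical, and the web plumbing (form handling, file uploads) is left out
entirely.

```python
def transcode(data, source_encoding, target_encoding, errors="strict"):
    """Convert raw bytes from one encoding to another via Unicode.

    `errors` is passed to both steps; "replace" is handy in a test
    harness because it shows where a conversion went wrong instead of
    stopping at the first bad byte.
    """
    text = data.decode(source_encoding, errors)
    return text.encode(target_encoding, errors)
```

The harness would call this with the bytes from the uploaded file or text
field plus the two encoding names chosen in the form, and send the result
back with a matching charset so the browser renders it.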

BTW, I have often used uniconv.exe, a free utility from BasisTech - it is a
command line program to do encoding conversion and character normalization
transformations.  Another really good test target would be to write a
uniconv.py and a harness to run them both - when they give the same output
for all encodings, we know we've done a good job.
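
A comparison harness along those lines might look like the sketch below,
again assuming Python 3.  Both converters are passed in as plain callables
taking (data, source_encoding, target_encoding); how the reference callable
would invoke the external program is deliberately left open, since its
command-line options are not given here.  All names are hypothetical.

```python
def builtin_convert(data, source_encoding, target_encoding):
    """Candidate converter built on Python's own codecs."""
    return data.decode(source_encoding).encode(target_encoding)


def compare_converters(reference, candidate, samples, target_encodings):
    """Run both converters over every sample and target encoding.

    `samples` is a list of (name, data, source_encoding) tuples.
    Returns a list of (name, source_encoding, target_encoding) tuples
    for which the two outputs differ; an empty list means full agreement.
    """
    mismatches = []
    for name, data, source_encoding in samples:
        for target_encoding in target_encodings:
            expected = reference(data, source_encoding, target_encoding)
            actual = candidate(data, source_encoding, target_encoding)
            if expected != actual:
                mismatches.append((name, source_encoding, target_encoding))
    return mismatches
```

Accumulated test files would go into `samples`, and an empty result over all
target encodings is the "same output" criterion described above.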

- Andy