Writing new codecs

Sat Jun 16 19:10:48 EDT 2001

Brian Quinlan wrote:
> 
> I just finished writing a modified UTF-7 encoder/decoder in Python and
> am planning on contributing it back.
> 
> My question is regarding form: should new codecs be written in C or
> Python?

For frequently used codecs that need good performance, C is
certainly the language of choice. Writing such a C module is easy:
you can use the _codecsmodule.c code and the UTF-8 codec
in unicodeobject.c as templates.
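
For comparison, here is a rough sketch of what a codec written purely
in Python can look like. It is only an illustration, not the modified
UTF-7 codec under discussion: the codec name "xdemohex" and the
functions demo_encode, demo_decode and demo_search are made up for this
example, and it assumes the codecs.CodecInfo interface of current
Python 3 versions.

```python
import binascii
import codecs

# Hypothetical toy encoding: text -> UTF-8 bytes -> ASCII hex digits.
# The name has no hyphens or underscores, so it is not affected by
# the name normalization done before search functions are called.
CODEC_NAME = "xdemohex"


def demo_encode(text, errors="strict"):
    """Stateless encoder: return (encoded bytes, characters consumed)."""
    data = binascii.hexlify(text.encode("utf-8", errors))
    return data, len(text)


def demo_decode(data, errors="strict"):
    """Stateless decoder: return (decoded text, bytes consumed)."""
    # The input may arrive as a memoryview, so copy it into bytes first.
    raw = binascii.unhexlify(bytes(data))
    return raw.decode("utf-8", errors), len(data)


def demo_search(name):
    """Search function: return a CodecInfo for our codec, else None."""
    if name == CODEC_NAME:
        return codecs.CodecInfo(
            name=CODEC_NAME,
            encode=demo_encode,
            decode=demo_decode,
        )
    return None


codecs.register(demo_search)

encoded = "h\u00e9llo".encode(CODEC_NAME)
decoded = encoded.decode(CODEC_NAME)
```

A C implementation would expose the same pair of stateless
encode/decode functions; only the inner loop moves into C.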

> AFAIK, Python does not currently include any codecs written in Python
> (I'm not counting the character mapping ones or the trivial wrappers
> around C code). Is this because the current codecs are all "important"
> and future codecs should be implemented elsewhere to avoid adding too
> much junk to _codecsmodule.c and unicodeobject.c?

No, it's because performance matters ;-)

> Or is the pattern of providing the actual implementation in
> unicodeobject.c going to continue?

Only the most important codecs will be placed into unicodeobject.c.

I think that UTF-7 would be a good candidate, since it is
a native Unicode encoding. UTF-32 would be another candidate.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/