[Python-Dev] Some thoughts on the codecs...

M.-A. Lemburg mal@lemburg.com
Wed, 17 Nov 1999 11:11:05 +0100


Mark Hammond wrote:
> 
> This is leading me to conclude that our "codec registry" should be the
> file system, and Python modules.
> 
> Would it be possible to define a "standard package" called
> "encodings", and when we need an encoding, we simply attempt to load a
> module from that package?  The key benefits I see are:
> 
> * No need to load modules simply to register a codec (which would make
> the number of open calls even higher, and the startup time even
> slower.)  This makes it truly demand-loading of the codecs, rather
> than explicit load-and-register.
> 
> * Making language specific distributions becomes simple - simply
> select a different set of modules from the "encodings" directory.  The
> Python source distribution has them all, but (say) the Windows binary
> installer selects only a few.  The Japanese binary installer for
> Windows installs a few more.
> 
> * Installing new codecs becomes trivial - no need to hack site.py
> etc - simply copy the new "codec module" to the encodings directory
> and you are done.
> 
> * No serious problem for GMcM's installer nor for freeze
> 
> We would probably need to assume that certain codecs exist for _all_
> platforms and languages - but this is no different to assuming that
> "exceptions.py" also exists for all platforms.
> 
> Is this worthy of consideration?

Why not... using the new registry scheme I proposed in the
thread "Codecs and StreamCodecs" you could implement this
via factory_functions and lazy imports (with the encoding
name folded to make up a proper Python identifier, e.g.
hyphens get converted to '' and spaces to '_').
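
A minimal sketch of the name folding, just to make the idea concrete.
The helper name fold_encoding_name is hypothetical, and lowercasing
and stripping surrounding whitespace are my assumptions; only the
hyphen and space rules come from the text above. Folded names that
start with a digit would still need extra handling to be valid
identifiers.

```python
def fold_encoding_name(name):
    """Fold an encoding name into something usable as a module name.

    Hypothetical helper: hyphens are dropped and spaces become
    underscores, as described above. Lowercasing and stripping
    whitespace are assumptions added for this sketch.
    """
    folded = name.strip().lower()
    folded = folded.replace('-', '')   # hyphens get converted to ''
    folded = folded.replace(' ', '_')  # spaces get converted to '_'
    return folded
```

With this folding, "ISO-8859-1" ends up as "iso88591", which matches
the module names used in the grouping below.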

I'd suggest grouping encodings:

[encodings]
	[iso]
		[iso88591]
		[iso88592]
	[jis]
		...
	[cyrillic]
		...
	[misc]

The unicodec registry could then query encodings.get(encoding,action)
and the package would take care of the rest.
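
Roughly, the package-level lookup could look like the sketch below.
This is only an illustration under assumptions: the _GROUPS table,
the fallback to [misc], the fold() helper and the "package" parameter
are all hypothetical, and importlib is used here simply as a
convenient way to spell a lazy import. The point is that nothing is
imported until an encoding is actually requested.

```python
import importlib

# Hypothetical table mapping folded encoding names to their group
# subpackage; a real package might discover this differently.
_GROUPS = {
    "iso88591": "iso",
    "iso88592": "iso",
}


def fold(name):
    # Assumed folding: lowercase, hyphens dropped, spaces -> '_'.
    return name.strip().lower().replace("-", "").replace(" ", "_")


def get(encoding, action, package="encodings"):
    """Lazily import the codec module and return the requested action.

    'action' is looked up as an attribute of the codec module, e.g.
    an encoder or decoder factory. Unknown encodings are assumed to
    live in the 'misc' group (an assumption of this sketch).
    """
    folded = fold(encoding)
    group = _GROUPS.get(folded, "misc")
    # The import happens here, on demand, not at startup.
    module = importlib.import_module(
        "%s.%s.%s" % (package, group, folded))
    return getattr(module, action)
```

A failed lookup surfaces as an ImportError (no such module) or an
AttributeError (module lacks that action), which the registry could
translate into a single lookup error.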

Note that the "walk-me-up-scotty" import patch would probably
be nice in this situation too, e.g. to reach the modules in
[misc] or in higher levels such as the ones in [iso] from
[iso88591].

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    44 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/