[Python-Dev] Codecs and StreamCodecs
Fred L. Drake, Jr.
fdrake@acm.org
Thu, 18 Nov 1999 11:01:47 -0500 (EST)
M.-A. Lemburg writes:
> The problem is that the encoding names are not Python identifiers,
> e.g. iso-8859-1 is not allowed as an identifier. This and
> the fact that applications may want to ship their own codecs (which
> do not get installed under the system wide encodings package)
> make the registry necessary.
This isn't a substantial problem. Try this on for size (probably
not too different from what everyone is already thinking, but let's
make it clear). This could be in encodings/__init__.py; I've tried to
be really clear on the names. (No testing, only partially complete.)
------------------------------------------------------------------------
import string
import sys

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

class EncodingError(Exception):
    def __init__(self, encoding, error):
        self.encoding = encoding
        self.strerror = "%s %s" % (error, `encoding`)
        self.error = error
        Exception.__init__(self, encoding, error)

_registry = {}

def registerEncoding(encoding, encode=None, decode=None,
                     make_stream_encoder=None, make_stream_decoder=None):
    encoding = encoding.lower()
    if _registry.has_key(encoding):
        info = _registry[encoding]
    else:
        info = _registry[encoding] = Codec(encoding)
    info._update(encode, decode,
                 make_stream_encoder, make_stream_decoder)

def getCodec(encoding):
    encoding = encoding.lower()
    if _registry.has_key(encoding):
        return _registry[encoding]

    # load the module
    modname = "encodings." + encoding.replace("-", "_")
    try:
        __import__(modname)
    except ImportError:
        # EncodingError takes (encoding, error); strerror appends the name
        raise EncodingError(encoding, "unknown encoding")

    # if the module registered, use the codec as-is:
    if _registry.has_key(encoding):
        return _registry[encoding]

    # nothing registered, use well-known names
    module = sys.modules[modname]
    codec = _registry[encoding] = Codec(encoding)
    encode = getattr(module, "encode", None)
    decode = getattr(module, "decode", None)
    make_stream_encoder = getattr(module, "make_stream_encoder", None)
    make_stream_decoder = getattr(module, "make_stream_decoder", None)
    codec._update(encode, decode,
                  make_stream_encoder, make_stream_decoder)
    return codec

class Codec:
    __encode = None
    __decode = None
    __stream_encoder_factory = None
    __stream_decoder_factory = None

    def __init__(self, name):
        self.name = name

    def encode(self, u):
        if self.__encode:
            # a plain encode function was supplied; use it directly
            return self.__encode(u)
        elif self.__stream_encoder_factory:
            # otherwise run the stream encoder over an in-memory buffer
            sio = StringIO()
            encoder = self.__stream_encoder_factory(sio)
            encoder.write(u)
            encoder.flush()
            return sio.getvalue()
        else:
            raise EncodingError(self.name, "no encoder available for")

    # similar for decode()...

    def make_stream_encoder(self, target):
        if self.__stream_encoder_factory:
            return self.__stream_encoder_factory(target)
        elif self.__encode:
            return DefaultStreamEncoder(target, self.__encode)
        else:
            raise EncodingError(self.name, "no encoder available for")

    # similar for make_stream_decoder()...

    def _update(self, encode, decode,
                make_stream_encoder, make_stream_decoder):
        # keep any previously registered callable when the new one is None
        self.__encode = encode or self.__encode
        self.__decode = decode or self.__decode
        self.__stream_encoder_factory = (
            make_stream_encoder or self.__stream_encoder_factory)
        self.__stream_decoder_factory = (
            make_stream_decoder or self.__stream_decoder_factory)
------------------------------------------------------------------------
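
The listing refers to a DefaultStreamEncoder that it never defines. Purely as an illustration, and not part of the proposal, here is a minimal sketch of what such a wrapper might look like. It is written for current Python 3 rather than the Python of the listing, and it assumes the encoding is stateless, so each chunk can be encoded on its own; the constructor signature (target, encode) is taken from the call in make_stream_encoder(), everything else is an assumption.

```python
import io


class DefaultStreamEncoder:
    """Adapt a plain encode function to the stream-encoder interface.

    Hypothetical sketch: only write() and flush() are provided, matching
    the two calls Codec.encode() makes in the listing.
    """

    def __init__(self, target, encode):
        self.target = target    # any object with write() and flush()
        self.encode = encode    # callable: text -> encoded bytes

    def write(self, text):
        # Each chunk is encoded independently, which is only valid for
        # stateless encodings (an assumption of this sketch).
        self.target.write(self.encode(text))

    def flush(self):
        self.target.flush()


buf = io.BytesIO()
encoder = DefaultStreamEncoder(buf, lambda text: text.encode("latin-1"))
encoder.write("caf\u00e9")
encoder.flush()
print(buf.getvalue())    # b'caf\xe9'
```

A stateful encoding (one with shift sequences, say) would need a real stream encoder instead of this wrapper, which is presumably why the listing prefers a module's own make_stream_encoder when one exists.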
> I don't see a problem with the registry though -- the encodings
> package can take care of the registration process without any
No problem at all; we just need to make sure the right magic is
there for the "normal" case.
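
To make the "normal" case concrete: a codec module that lives in the package and never registers itself should still be found by name. The following is a rough Python 3 sketch of that lookup path, not the proposed implementation. The names register_encoding, get_codec, and the encodings_demo package are hypothetical, and the codec module is faked in sys.modules so that the example is self-contained.

```python
import importlib
import sys
import types

_registry = {}


def register_encoding(name, encode=None, decode=None):
    """Record codec functions under a case-insensitive encoding name."""
    entry = _registry.setdefault(name.lower(), {})
    if encode is not None:
        entry["encode"] = encode
    if decode is not None:
        entry["decode"] = decode
    return entry


def get_codec(name, package="encodings_demo"):
    """Return the registry entry for name, importing its module on demand."""
    key = name.lower()
    if key in _registry:
        return _registry[key]
    # Hyphens are not legal in module names, so map them to underscores.
    modname = package + "." + key.replace("-", "_")
    try:
        module = importlib.import_module(modname)
    except ImportError:
        raise LookupError("unknown encoding %r" % name) from None
    if key not in _registry:
        # The module did not register itself: fall back to well-known names.
        register_encoding(key,
                          getattr(module, "encode", None),
                          getattr(module, "decode", None))
    return _registry[key]


# Stand-in for a codec module that ships inside the package and never
# calls register_encoding() itself (hypothetical names throughout).
demo_package = types.ModuleType("encodings_demo")
demo_package.__path__ = []    # mark it as a package with no search path
demo_module = types.ModuleType("encodings_demo.x_upper")
demo_module.encode = lambda text: text.upper().encode("ascii")
demo_module.decode = lambda data: data.decode("ascii").lower()
sys.modules["encodings_demo"] = demo_package
sys.modules["encodings_demo.x_upper"] = demo_module

codec = get_codec("X-Upper")
print(codec["encode"]("abc"))    # b'ABC'
```

An application that ships its own codec outside the package would call register_encoding() directly, and the lookup would then never reach the import step.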
> PS: we could probably even take the whole codec idea one step
> further and also allow other input/output formats to be registered,
File formats are different from text encodings, so let's keep them
separate. Yes, a registry can be a good approach whenever the various
things being registered are sufficiently similar semantically, but the
behavior of the registry/lookup can be very different for each type of
thing. Let's not over-generalize.
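
As a small illustration of how the lookup rules can diverge, here is a hypothetical Python 3 sketch with two separate registries. All names in it are made up for the example: text encodings are looked up by a normalized name, while file formats are looked up by filename extension.

```python
import os

text_encodings = {}    # keyed by normalized encoding name
file_formats = {}      # keyed by lower-case file extension, without the dot


def _normalize_encoding(name):
    # Encoding names are case-insensitive; treat "_" and "-" alike.
    return name.lower().replace("_", "-")


def register_text_encoding(name, codec):
    text_encodings[_normalize_encoding(name)] = codec


def lookup_text_encoding(name):
    return text_encodings[_normalize_encoding(name)]


def register_file_format(extension, handler):
    file_formats[extension.lower().lstrip(".")] = handler


def lookup_file_format(filename):
    # File formats are found from the filename, not from a declared name.
    extension = os.path.splitext(filename)[1]
    return file_formats[extension.lower().lstrip(".")]


register_text_encoding("ISO_8859_1", "latin-1 codec")
register_file_format(".PNG", "png handler")
print(lookup_text_encoding("iso-8859-1"))    # latin-1 codec
print(lookup_file_format("photo.png"))       # png handler
```

The two dictionaries share a shape, but the key normalization is specific to each kind of thing, which is the argument for not forcing them into a single registry.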
-Fred
--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives