[Python-Dev] Codecs and StreamCodecs
M.-A. Lemburg
mal@lemburg.com
Tue, 16 Nov 1999 17:00:58 +0100
Here is a new proposal for the codec interface:
class Codec:

    def encode(self,u,slice=None):

        """ Return the Unicode object u encoded as a Python string.

            If slice is given (as slice object), only the sliced part
            of the Unicode object is encoded.

            The method may not store state in the Codec instance. Use
            StreamCodec for codecs which have to keep state in order to
            make encoding/decoding efficient.

        """
        ...

    def decode(self,s,slice=None):

        """ Return an equivalent Unicode object for the encoded Python
            string s.

            If slice is given (as slice object), only the sliced part
            of the Python string is decoded and returned as a Unicode
            object. Note that this can cause the decoding algorithm
            to fail due to truncations in the encoding.

            The method may not store state in the Codec instance. Use
            StreamCodec for codecs which have to keep state in order to
            make encoding/decoding efficient.

        """
        ...
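As a rough illustration of the stateless interface, here is a minimal sketch in present-day Python 3, where the "Unicode object" is a str and the encoded "Python string" is a bytes object. The class name Utf8Codec and the choice of UTF-8 are assumptions made for the example; they are not part of the proposal. The last call shows the truncation problem mentioned in the decode() docstring.

```python
class Utf8Codec:
    """Hypothetical stateless codec following the proposed Codec interface."""

    def encode(self, u, slice=None):
        # Encode only the sliced part of the text if a slice object is given.
        if slice is not None:
            u = u[slice]
        return u.encode("utf-8")

    def decode(self, s, slice=None):
        # Decode only the sliced part of the byte string if a slice is given.
        # A slice that cuts through a multi-byte sequence makes decoding fail.
        if slice is not None:
            s = s[slice]
        return s.decode("utf-8")


codec = Utf8Codec()
data = codec.encode("naïve café")
text = codec.decode(data)
first_word = codec.encode("naïve café", slice(0, 5))

try:
    # The first three bytes end in the middle of the two-byte encoding of "ï".
    codec.decode(data, slice(0, 3))
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
```

Neither method touches any attribute of the instance, so one Utf8Codec object could be shared freely between callers.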
class StreamCodec(Codec):

    def __init__(self,stream=None,errors='strict'):

        """ Creates a StreamCodec instance.

            stream must be a file-like object open for reading and/or
            writing binary data depending on the intended codec
            action or None.

            The StreamCodec may implement different error handling
            schemes by providing the errors argument. These parameters
            are known (they need not all be supported by StreamCodec
            subclasses):

             'strict' - raise a UnicodeError (or a subclass)
             'ignore' - ignore the character and continue with the next
             (a single character)
                      - replace erroneous characters with the given
                        character (may also be a Unicode character)

        """
        self.stream = stream

    def write(self,u,slice=None):

        """ Writes the Unicode object's contents encoded to self.stream.

            stream must be a file-like object open for writing binary
            data.

            If slice is given (as slice object), only the sliced part
            of the Unicode object is written.

        """
        # ... the base class should provide a default implementation
        #     of this method using self.encode ...

    def read(self,length=None):

        """ Reads an encoded string from the stream and returns
            an equivalent Unicode object.

            If length is given, only length Unicode characters are
            returned (the StreamCodec instance reads as many raw bytes
            as needed to fulfill this requirement). Otherwise, all
            available data is read and decoded.

        """
        # ... the base class should provide a default implementation
        #     of this method using self.decode ...
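One possible shape for the default write() and read() implementations is sketched below in present-day Python 3 (str for Unicode objects, bytes for encoded strings). The class name Utf8StreamCodec, the choice of UTF-8, and the byte-at-a-time read loop are assumptions for illustration only. The sketch stores the errors argument but handles only the 'strict' behaviour.

```python
import io


class Utf8StreamCodec:
    """Hypothetical StreamCodec sketch; UTF-8 is an assumption for the example."""

    def __init__(self, stream=None, errors="strict"):
        self.stream = stream
        self.errors = errors  # stored only; this sketch always behaves as 'strict'

    def encode(self, u, slice=None):
        if slice is not None:
            u = u[slice]
        return u.encode("utf-8")

    def decode(self, s, slice=None):
        if slice is not None:
            s = s[slice]
        return s.decode("utf-8")

    def write(self, u, slice=None):
        # Default implementation built on the stateless encode().
        self.stream.write(self.encode(u, slice))

    def read(self, length=None):
        if length is None:
            # No limit given: read and decode everything that is available.
            return self.decode(self.stream.read())
        chars = []
        while len(chars) < length:
            pending = self.stream.read(1)
            if not pending:
                break  # end of stream
            # Pull in further raw bytes until one full character decodes.
            while True:
                try:
                    chars.append(self.decode(pending))
                    break
                except UnicodeDecodeError:
                    more = self.stream.read(1)
                    if not more:
                        raise  # truncated or invalid data at end of stream
                    pending += more
        return "".join(chars)


buffer = io.BytesIO()
writer = Utf8StreamCodec(buffer)
writer.write("naïve café")
raw = buffer.getvalue()

reader = Utf8StreamCodec(io.BytesIO(raw))
head = reader.read(3)  # three characters, taken from four raw bytes
rest = reader.read()   # everything that is left
```

Reading one byte at a time keeps the sketch short; a real codec would more likely buffer larger chunks and keep the undecoded remainder as state, which is exactly the kind of state the stateless Codec methods are not allowed to hold.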
The unicodec.register() API does not require Codecs to be subclasses
of these base classes; only the given methods must be present, which
allows writing Codecs as extension types. All Codecs must provide
the .encode()/.decode() methods. Codecs that also have .read()
and/or .write() methods are considered to be StreamCodecs.
The Unicode implementation will by itself only use the
stateless .encode() and .decode() methods.
All other conversions have to be done by explicitly instantiating
the appropriate [Stream]Codec.
--
Feel free to beat on this one ;-)
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 45 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/