[Python-Dev] Codecs and StreamCodecs
M.-A. Lemburg
mal@lemburg.com
Wed, 17 Nov 1999 10:29:34 +0100
Fredrik Lundh wrote:
>
> --------------------------------------------------------------------
> A PIL-like Unicode Codec Proposal
> --------------------------------------------------------------------
>
> In the PIL model, the codecs are called with a piece of data, and
> return the result to the caller. The codecs maintain internal state
> when needed.
>
> class decoder:
>
>     def decode(self, s, offset=0):
>         # decode as much data as we possibly can from the
>         # given string. if there's not enough data in the
>         # input string to form a full character, return
>         # what we've got this far (this might be an empty
>         # string).
>
>     def flush(self):
>         # flush the decoding buffers. this should usually
>         # return None, unless knowing that the input stream
>         # has ended means that the state can be interpreted
>         # in a meaningful way. however, if the state
>         # indicates that the last character was not
>         # finished, this method should raise a UnicodeError
>         # exception.
Could you explain the reason for having a .flush() method
and what it should return?
Note that the .decode method is not very different
from my Codec.decode method except that it uses a single
offset where my version uses a slice (the offset is probably
the better variant, because it avoids data truncation).
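
To make the decode()/flush() contract concrete, here is a minimal sketch in present-day Python. It is not part of either proposal: the class name Utf8Decoder is hypothetical, and it leans on the standard library's incremental UTF-8 decoder (which postdates this thread) to hold an incomplete trailing character between calls. Under that reading, flush() has nothing to return for UTF-8 and exists mainly to report truncated input.

```python
import codecs


class Utf8Decoder:
    """Hypothetical decoder following the quoted decode()/flush() interface."""

    def __init__(self):
        # The stdlib incremental decoder buffers an undecoded tail between calls.
        self._inner = codecs.getincrementaldecoder("utf-8")()

    def decode(self, s, offset=0):
        # Decode as much of s[offset:] as possible. An incomplete trailing
        # character is kept in the internal state and produces no output yet.
        return self._inner.decode(s[offset:], final=False)

    def flush(self):
        # Signal end of input. A buffered partial character raises
        # UnicodeDecodeError, which is a subclass of UnicodeError.
        self._inner.decode(b"", final=True)
        return None


data = "n\u00e9".encode("utf-8")   # three bytes: b'n\xc3\xa9'
dec = Utf8Decoder()
first = dec.decode(data[:2])       # the lone 0xC3 lead byte is held back
second = dec.decode(data[2:])      # the continuation byte completes the character
dec.flush()                        # nothing pending, so no exception
```

With this split, the first call yields only the ASCII character and the second call yields the accented one; calling flush() right after the first call would raise instead.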
> class encoder:
>
>     def encode(self, u, offset=0, buffersize=0):
>         # encode data from the given offset in the input
>         # unicode string into a buffer of the given size
>         # (or slightly larger, if required to proceed).
>         # if the buffer size is 0, the encoder is free
>         # to pick a suitable size itself (if at all
>         # possible, it should make it large enough to
>         # encode the entire input string). returns a
>         # 2-tuple containing the encoded data, and the
>         # number of characters consumed by this call.
Ditto.
>     def flush(self):
>         # flush the encoding buffers. returns an ordinary
>         # string (which may be empty), or None.
>
> Note that a codec instance can be used for a single string; the codec
> registry should hold codec factories, not codec instances. In
> addition, you may use a single type or class to implement both
> interfaces at once.
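
For illustration only, the encoder half and the "factories, not instances" remark could look roughly like this in present-day Python. The names Latin1Encoder, registry and encode_all are hypothetical, and treating buffersize as a limit on characters per call is an assumption; the proposal only says the buffer may be "slightly larger, if required to proceed".

```python
class Latin1Encoder:
    """Hypothetical encoder following the quoted encode()/flush() interface."""

    def encode(self, u, offset=0, buffersize=0):
        # buffersize == 0 lets the encoder choose; here it takes the rest of u.
        if buffersize <= 0:
            end = len(u)
        else:
            end = min(len(u), offset + buffersize)
        chunk = u[offset:end]
        # Latin-1 maps one character to one byte, so the number of
        # characters consumed equals the number of bytes produced.
        return chunk.encode("latin-1"), len(chunk)

    def flush(self):
        # A stateless encoding has nothing buffered.
        return None


# The registry stores factories (here, the class itself), not instances.
registry = {"latin-1": Latin1Encoder}


def encode_all(name, u, buffersize=4):
    enc = registry[name]()          # fresh instance for this one string
    parts, offset = [], 0
    while offset < len(u):
        data, consumed = enc.encode(u, offset, buffersize)
        parts.append(data)
        offset += consumed          # advance by the characters consumed
    tail = enc.flush()
    if tail:
        parts.append(tail)
    return b"".join(parts)


result = encode_all("latin-1", "caf\u00e9 au lait")
```

The caller drives the loop with the "characters consumed" half of the 2-tuple, which is what lets the encoder stop early when its buffer is full.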
Perhaps I'm missing something, but how would you define
stream codecs using this interface?
> Implementing stream codecs is left as an exercise (see the zlib
> material in the eff-bot guide for a decoder example).
...?
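
One way the exercise might go, as a rough sketch and not something taken from the proposal or from the eff-bot guide: wrap a file-like object, feed it to the decoder chunk by chunk, and call flush() at end of input. StreamDecoder and ChunkDecoder are hypothetical names, and ChunkDecoder again borrows the standard library's incremental UTF-8 decoder, which postdates this thread.

```python
import codecs
import io


class ChunkDecoder:
    """Hypothetical decode()/flush() object; UTF-8 via the stdlib."""

    def __init__(self):
        self._inner = codecs.getincrementaldecoder("utf-8")()

    def decode(self, s, offset=0):
        # Partial trailing characters stay buffered until the next call.
        return self._inner.decode(s[offset:], final=False)

    def flush(self):
        # Raises UnicodeDecodeError if a character was left unfinished.
        self._inner.decode(b"", final=True)
        return None


class StreamDecoder:
    """Hypothetical stream layer built on top of a decode()/flush() object."""

    def __init__(self, stream, decoder, chunksize=4096):
        self.stream = stream
        self.decoder = decoder
        self.chunksize = chunksize

    def read(self):
        parts = []
        while True:
            chunk = self.stream.read(self.chunksize)
            if not chunk:           # empty bytes object means end of stream
                break
            parts.append(self.decoder.decode(chunk))
        tail = self.decoder.flush() # may raise on truncated input
        if tail:
            parts.append(tail)
        return "".join(parts)


raw = io.BytesIO("gr\u00fc\u00dfe".encode("utf-8"))
# A chunk size of 3 deliberately splits a two-byte character across reads.
text = StreamDecoder(raw, ChunkDecoder(), chunksize=3).read()
```

Whether this counts as a "stream codec" in the sense of my proposal is exactly the open question: the stream handling lives in a separate wrapper and is not part of the codec interface itself.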
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 44 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/