[Python-Dev] Codecs and StreamCodecs

M.-A. Lemburg mal@lemburg.com
Wed, 17 Nov 1999 10:29:34 +0100


Fredrik Lundh wrote:
> 
> --------------------------------------------------------------------
> A PIL-like Unicode Codec Proposal
> --------------------------------------------------------------------
> 
> In the PIL model, the codecs are called with a piece of data, and
> return the result to the caller.  The codecs maintain internal state
> when needed.
> 
> class decoder:
> 
>     def decode(self, s, offset=0):
>         # decode as much data as we possibly can from the
>         # given string.  if there's not enough data in the
>         # input string to form a full character, return
>         # what we've got this far (this might be an empty
>         # string).
> 
>     def flush(self):
>         # flush the decoding buffers.  this should usually
>         # return None, unless knowing that the input stream
>         # has ended means that the state can be interpreted
>         # in a meaningful way.  however, if the state
>         # indicates that the last character was not
>         # finished, this method should raise a UnicodeError
>         # exception.

Could you explain the reason for having a .flush() method
and what it should return?

Note that the .decode method is not much different from my
Codec.decode method, except that it takes a single offset
where my version takes a slice (the offset is probably the
better variant, because it avoids data truncation).
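
To make the decode()/flush() contract concrete, here is a minimal
sketch in present-day Python 3, assuming UTF-8 input passed as bytes.
The class name Utf8Decoder and the buffering strategy are hypothetical
and not part of the proposal; the sketch only illustrates holding back
an unfinished character and raising UnicodeError in flush().

```python
class Utf8Decoder:
    """Hypothetical decoder following the proposed decode()/flush() shape."""

    def __init__(self):
        self._pending = b""  # bytes of a character that is not yet complete

    def decode(self, s, offset=0):
        # Prepend whatever was left over from the previous call.
        data = self._pending + s[offset:]
        end = len(data)
        # Look back at most 4 bytes for an unfinished multi-byte sequence.
        for back in range(1, min(4, len(data)) + 1):
            byte = data[-back]
            if byte & 0xC0 == 0x80:
                continue  # continuation byte, keep looking for the lead byte
            if byte >= 0xC0:
                # Lead byte: how many bytes does this sequence need?
                need = 2 if byte < 0xE0 else 3 if byte < 0xF0 else 4
                if back < need:
                    end = len(data) - back  # hold the partial character back
            break
        self._pending = data[end:]
        # May be an empty string if no full character is available yet.
        return data[:end].decode("utf-8")

    def flush(self):
        # Nothing buffered: the usual case, return None.
        if self._pending:
            self._pending = b""
            raise UnicodeError("input ended in the middle of a character")
        return None
```

Feeding the three bytes of a Euro sign in two calls returns an empty
string from the first call and the full character from the second.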
 
> class encoder:
> 
>     def encode(self, u, offset=0, buffersize=0):
>         # encode data from the given offset in the input
>         # unicode string into a buffer of the given size
>         # (or slightly larger, if required to proceed).
>         # if the buffer size is 0, the encoder is free
>         # to pick a suitable size itself (if at all
>         # possible, it should make it large enough to
>         # encode the entire input string).  returns a
>         # 2-tuple containing the encoded data, and the
>         # number of characters consumed by this call.

Ditto.
 
>     def flush(self):
>         # flush the encoding buffers.  returns an ordinary
>         # string (which may be empty), or None.
> 
> Note that a codec instance can be used for a single string; the codec
> registry should hold codec factories, not codec instances.  In
> addition, you may use a single type or class to implement both
> interfaces at once.
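
As an illustration of the encoder description quoted above, a rough
sketch in present-day Python 3, assuming UTF-8 output. The class name
Utf8Encoder is hypothetical; the sketch encodes one character at a
time so that it can stop near the requested buffer size while always
consuming at least one character.

```python
class Utf8Encoder:
    """Hypothetical encoder following the proposed encode()/flush() shape."""

    def encode(self, u, offset=0, buffersize=0):
        pieces = []
        size = 0
        consumed = 0
        for ch in u[offset:]:
            encoded = ch.encode("utf-8")
            # Stop when the buffer would overflow, but only after at least
            # one character has been encoded ("slightly larger, if required
            # to proceed").  buffersize == 0 means "no limit" here.
            if buffersize and pieces and size + len(encoded) > buffersize:
                break
            pieces.append(encoded)
            size += len(encoded)
            consumed += 1
        # 2-tuple: encoded data and number of characters consumed.
        return b"".join(pieces), consumed

    def flush(self):
        # UTF-8 needs no trailing output, so there is nothing to flush.
        return b""
```

The caller advances offset by the second element of the tuple and
calls encode() again until the whole string has been consumed.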

Perhaps I'm missing something, but how would you define
stream codecs using this interface?

> Implementing stream codecs is left as an exercise (see the zlib
> material in the eff-bot guide for a decoder example).

...?
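
One possible reading of how a stream codec could sit on top of the
decode()/flush() interface, sketched in present-day Python 3. The
names StreamReader, Latin1Decoder, decoder_factory and chunksize are
assumptions made for this illustration only, and Latin-1 is used as a
stand-in because it needs no internal state.

```python
import io


class Latin1Decoder:
    """Hypothetical stateless decoder with the proposed interface."""

    def decode(self, s, offset=0):
        return s[offset:].decode("latin-1")

    def flush(self):
        return None


class StreamReader:
    """Hypothetical wrapper that feeds chunks from a stream to a decoder."""

    def __init__(self, stream, decoder_factory, chunksize=4096):
        self.stream = stream
        # The registry would hand out factories, so build a fresh instance.
        self.decoder = decoder_factory()
        self.chunksize = chunksize
        self.eof = False

    def read(self):
        """Return the next piece of decoded text, or "" at end of input."""
        while not self.eof:
            chunk = self.stream.read(self.chunksize)
            if not chunk:
                self.eof = True
                # flush() may raise UnicodeError on a truncated character.
                return self.decoder.flush() or ""
            text = self.decoder.decode(chunk)
            if text:
                return text
            # Otherwise the decoder is waiting for more bytes; keep reading.
        return ""


reader = StreamReader(io.BytesIO(b"caf\xe9"), Latin1Decoder, chunksize=2)
parts = []
while True:
    piece = reader.read()
    if not piece:
        break
    parts.append(piece)
```

The wrapper keeps all character-boundary state in the decoder, so the
stream layer only has to handle chunking and end of input.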

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    44 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/