[Python-Dev] Some thoughts on the codecs...

M.-A. Lemburg mal@lemburg.com
Tue, 16 Nov 1999 11:48:13 +0100


Andy Robinson wrote:
> 
> On Mon, 15 Nov 1999 23:54:38 +0100, you wrote:
> 
> >[I'll get back on this tomorrow, just some quick notes here...]
> >The Codecs provide implementations for encoding and decoding,
> >they are not intended as complete wrappers for e.g. files or
> >sockets.
> >
> >The unicodec module will define a generic stream wrapper
> >(which is yet to be defined) for dealing with files, sockets,
> >etc. It will use the codec registry to do the actual codec
> >work.
> >
> >XXX unicodec.file(<filename>,<mode>,<encname>) could be provided as
> >    short-hand for unicodec.file(open(<filename>,<mode>),<encname>) which
> >    also assures that <mode> contains the 'b' character when needed.
> >
> >The Codec interface defines two pairs of methods
> >on purpose: one which works internally (ie. directly between
> >strings and Unicode objects), and one which works externally
> >(directly between a stream and Unicode objects).
> 
> That's the problem Guido and I are worried about.  Your present API is
> not enough to build stream encoders.  The 'slurp it into a unicode
> string in one go' approach fails for big files or for network
> connections.  And you just cannot build a generic stream reader/writer
> by slicing it into strings.   The solution must be specific to the
> codec - only it knows how much to buffer, when to flip states etc.
> 
> So the codec should provide proper stream reading and writing
> services.
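
To make the concern concrete, here is a minimal sketch of why a stream
cannot simply be sliced into strings and handed to a stateless decode
call, and what a decoder that keeps state between reads might look like.
The class name Utf8StreamDecoder and its feed() method are hypothetical
illustrations, not part of any proposed API; UTF-8 is used only because
its multi-byte sequences make the chunk-boundary problem easy to show.

```python
def incomplete_tail_length(data):
    """Return how many trailing bytes begin a UTF-8 sequence without finishing it."""
    for back in range(1, min(3, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking back for the lead byte
        if byte >= 0xF0:
            needed = 4
        elif byte >= 0xE0:
            needed = 3
        elif byte >= 0xC0:
            needed = 2
        else:
            needed = 1
        return back if back < needed else 0
    return 0


class Utf8StreamDecoder:
    """Hypothetical stateful decoder: buffers bytes that end mid-character."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        data = self._pending + chunk
        cut = len(data) - incomplete_tail_length(data)
        # Hold back the unfinished sequence until the next chunk arrives.
        self._pending = data[cut:]
        return data[:cut].decode("utf-8")


# The euro sign (3 bytes in UTF-8) split across two reads:
chunks = [b"\xe2\x82", b"\xac10"]
decoder = Utf8StreamDecoder()
pieces = [decoder.feed(chunk) for chunk in chunks]
text = "".join(pieces)  # "\u20ac10"
```

Decoding the first chunk on its own with chunks[0].decode("utf-8")
raises UnicodeDecodeError, which is the failure mode of the slicing
approach. How much to hold back is codec-specific knowledge, so the
buffering logic belongs with the codec rather than in a generic wrapper.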

I guess I'll have to rethink the Codec specs. Some leads:

1. introduce a new StreamCodec class which is designed for
   handling stream encoding and decoding (and supports
   state)

2. give more information to the unicodec registry:
   one could register classes instead of instances, which the Unicode
   implementation would then instantiate whenever it needs to
   apply the conversion; since this is only needed for encodings
   that maintain state, the registry would only have to do the
   instantiation for these codecs and could use cached instances for
   stateless codecs.
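
A rough sketch of how lead 2 might work, assuming the registry tells
the two cases apart by whether a class or an instance was registered.
All names here (CodecRegistry, register, lookup and the two example
codecs) are hypothetical and chosen only for illustration; the real
registry interface is still to be specified.

```python
class Latin1Codec:
    """Stateless example codec: a single shared instance is safe to reuse."""

    def decode(self, data):
        return data.decode("latin-1")


class CountingStreamCodec:
    """Stand-in for a stateful codec: it remembers how many bytes it has seen."""

    def __init__(self):
        self.bytes_seen = 0

    def decode(self, data):
        self.bytes_seen += len(data)
        return data.decode("ascii")


class CodecRegistry:
    """Hypothetical registry accepting either codec instances or codec classes."""

    def __init__(self):
        self._entries = {}

    def register(self, name, codec):
        self._entries[name.lower()] = codec

    def lookup(self, name):
        entry = self._entries[name.lower()]
        if isinstance(entry, type):
            # A class was registered: the codec keeps state, so every
            # caller gets its own fresh instance.
            return entry()
        # An instance was registered: stateless, so hand out the cached one.
        return entry


registry = CodecRegistry()
registry.register("latin-1", Latin1Codec())      # instance: shared
registry.register("counting", CountingStreamCodec)  # class: per-use

first = registry.lookup("counting")
second = registry.lookup("counting")
first.decode(b"abc")
```

After this runs, first.bytes_seen is 3 while second.bytes_seen is still
0, so two streams using the same encoding do not disturb each other,
and repeated lookups of "latin-1" return the same cached object.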
 
> Unicodec can then wrap those up in labour-saving ways - I'm not fussy
> which but I like the one-line file-open utility.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    45 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/