[Python-Dev] Some thoughts on the codecs...
M.-A. Lemburg
mal@lemburg.com
Tue, 16 Nov 1999 11:48:13 +0100
Andy Robinson wrote:
>
> On Mon, 15 Nov 1999 23:54:38 +0100, you wrote:
>
> >[I'll get back on this tomorrow, just some quick notes here...]
> >The Codecs provide implementations for encoding and decoding,
> >they are not intended as complete wrappers for e.g. files or
> >sockets.
> >
> >The unicodec module will define a generic stream wrapper
> >(which is yet to be defined) for dealing with files, sockets,
> >etc. It will use the codec registry to do the actual codec
> >work.
> >
> >XXX unicodec.file(<filename>,<mode>,<encname>) could be provided as
> > short-hand for unicodec.file(open(<filename>,<mode>),<encname>) which
> > also assures that <mode> contains the 'b' character when needed.
> >
> >The Codec interface defines two pairs of methods
> >on purpose: one which works internally (i.e. directly between
> >strings and Unicode objects), and one which works externally
> >(directly between a stream and Unicode objects).
>
> That's the problem Guido and I are worried about. Your present API is
> not enough to build stream encoders. The 'slurp it into a unicode
> string in one go' approach fails for big files or for network
> connections. And you just cannot build a generic stream reader/writer
> by slicing it into strings. The solution must be specific to the
> codec - only it knows how much to buffer, when to flip states etc.
>
> So the codec should provide proper stream reading and writing
> services.
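Here is a minimal sketch of the point above, written in present-day Python 3. It uses codecs.getincrementaldecoder, which postdates this thread. Decoding fixed-size slices independently fails as soon as a multibyte UTF-8 sequence straddles a slice boundary. A codec-specific incremental decoder instead holds the partial sequence back until the next chunk arrives. The helper names naive_decode_chunks and incremental_decode_chunks are hypothetical.

```python
import codecs


def naive_decode_chunks(data: bytes, size: int) -> str:
    # Decode each fixed-size slice on its own; a multibyte sequence split
    # across a slice boundary raises UnicodeDecodeError.
    return "".join(data[i:i + size].decode("utf-8") for i in range(0, len(data), size))


def incremental_decode_chunks(data: bytes, size: int, encoding: str = "utf-8") -> str:
    # The codec's own incremental decoder keeps incomplete byte sequences
    # buffered between calls, so chunk boundaries no longer matter.
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(data[i:i + size]) for i in range(0, len(data), size)]
    parts.append(decoder.decode(b"", final=True))  # flush; errors on truncated input
    return "".join(parts)


payload = "caf\u00e9 \u20ac".encode("utf-8")  # 9 bytes: the e-acute takes 2, the euro sign 3
try:
    naive_decode_chunks(payload, 4)
except UnicodeDecodeError as exc:
    print("naive slicing failed:", exc.reason)
print(incremental_decode_chunks(payload, 4))  # prints: café €
```

Only the codec knows that 0xC3 opens a two-byte sequence, which is why the buffering has to live in the codec and not in a generic wrapper.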
I guess I'll have to rethink the Codec specs. Some leads:
1. introduce a new StreamCodec class which is designed for
handling stream encoding and decoding (and supports
state)
2. give more information to the unicodec registry:
one could register classes instead of instances which the Unicode
implementation would then instantiate whenever it needs to
apply the conversion; since this is only needed for encodings
maintaining state, the registry would only have to do the
instantiation for these codecs and could use cached instances for
stateless codecs.
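Lead 2 could look roughly like the following. This is a minimal sketch under stated assumptions. Codec, StreamCodec, Utf8StreamCodec and CodecRegistry are hypothetical names. A class-level `stateful` flag is assumed to tell the registry which codecs need a fresh instance per use. codecs.utf_8_decode from today's standard library is used only to keep the stateful example short.

```python
import codecs


class Codec:
    """Hypothetical codec base class; stateless by default."""

    stateful = False  # assumed flag the registry consults

    def encode(self, text):
        raise NotImplementedError

    def decode(self, data):
        raise NotImplementedError


class Latin1Codec(Codec):
    """Stateless: each byte maps to one character, nothing is carried over."""

    def encode(self, text):
        return text.encode("latin-1")

    def decode(self, data):
        return data.decode("latin-1")


class StreamCodec(Codec):
    """Hypothetical base for codecs that keep state between calls."""

    stateful = True

    def __init__(self):
        self.pending = b""  # incomplete input held back for the next call


class Utf8StreamCodec(StreamCodec):
    def encode(self, text):
        return text.encode("utf-8")

    def decode(self, data):
        buf = self.pending + data
        # final=False: a trailing incomplete sequence is not an error; it is
        # left unconsumed and kept for the next call.
        text, consumed = codecs.utf_8_decode(buf, "strict", False)
        self.pending = buf[consumed:]
        return text


class CodecRegistry:
    """Hypothetical registry that stores codec classes, not instances."""

    def __init__(self):
        self._classes = {}  # normalized name -> codec class
        self._cache = {}    # normalized name -> shared stateless instance

    def register(self, name, codec_class):
        self._classes[name.lower()] = codec_class

    def lookup(self, name):
        key = name.lower()
        codec_class = self._classes[key]
        if codec_class.stateful:
            # Fresh instance per use, so two streams never share state.
            return codec_class()
        # Stateless codecs: instantiate once, then reuse the cached instance.
        if key not in self._cache:
            self._cache[key] = codec_class()
        return self._cache[key]
```

The design choice is that callers always go through lookup() and never need to know whether they got a shared or a private instance.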
> Unicodec can then wrap those up in labour-saving ways - I'm not fussy
> which but I like the one-line file-open utility.
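A one-line file-open utility along the lines of the unicodec.file() short-hand quoted above might look like this. It is a sketch in today's Python. open_encoded is a hypothetical name; the standard library's codecs.open later came to play a similar role. The StreamReader and StreamWriter classes come from codecs.lookup().

```python
import codecs


def open_encoded(filename, mode="r", encoding="utf-8"):
    """Hypothetical stand-in for unicodec.file(<filename>, <mode>, <encname>)."""
    # The codec does the text conversion, so the file itself must be binary:
    # drop any explicit text flag and make sure 'b' is present.
    mode = mode.replace("t", "")
    if "b" not in mode:
        mode += "b"
    raw = open(filename, mode)
    info = codecs.lookup(encoding)
    if "+" in mode:
        # Read/write access: pair the codec's reader and writer on one stream.
        return codecs.StreamReaderWriter(raw, info.streamreader, info.streamwriter)
    if mode.startswith("r"):
        return info.streamreader(raw)
    return info.streamwriter(raw)
```

Usage is then a single call, e.g. `open_encoded("data.txt", "w", "utf-16").write("caf\u00e9")`, with the 'b' added behind the caller's back.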
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 45 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/