[Python-Dev] Some thoughts on the codecs...
Andy Robinson
andy@robanal.demon.co.uk
Mon, 15 Nov 1999 22:30:53 GMT
On Mon, 15 Nov 1999 16:37:28 -0500, you wrote:
># assuming variables input_file, input_encoding, output_file,
># output_encoding, and constant BUFFER_SIZE
>
>f = open(input_file, "rb")
>f1 = unicodec.codecs[input_encoding].stream_reader(f)
>g = open(output_file, "wb")
>g1 = unicodec.codecs[output_encoding].stream_writer(g)
>
>while 1:
>    buffer = f1.read(BUFFER_SIZE)
>    if not buffer:
>        break
>    g1.write(buffer)
>
>g1.close()
>f1.close()
>
>Note that we could possibly make these the only API that a codec needs
>to provide; the string object <--> unicode object conversions can be
>done using this and the cStringIO module. (On the other hand it seems
>a common case that would be quite useful.)
Perfect. I'd keep the string ones - easy to implement but a big
convenience.
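To illustrate the quoted point that the string conversions could be built on the stream API alone, here is a rough sketch. The unicodec module is only a proposal in this thread, so the sketch assumes codecs.getreader and codecs.getwriter from the present-day standard library as stand-ins for stream_reader and stream_writer, and io.BytesIO in place of cStringIO. The helper names decode_via_stream and encode_via_stream are hypothetical.

```python
import codecs
import io


def decode_via_stream(data, encoding):
    """Decode a byte string by reading it through an in-memory stream reader."""
    # codecs.getreader stands in for the proposed stream_reader (assumption).
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    return reader.read()


def encode_via_stream(text, encoding):
    """Encode text by writing it through a stream writer into a byte buffer."""
    # codecs.getwriter stands in for the proposed stream_writer (assumption).
    buffer = io.BytesIO()
    writer = codecs.getwriter(encoding)(buffer)
    writer.write(text)
    return buffer.getvalue()
```

Both helpers are only a couple of lines each, which fits the view that the string versions are cheap to provide as a convenience.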
The proposal also says:
>For explicit handling of Unicode using files, the unicodec module
>could provide stream wrappers which provide transparent
>encoding/decoding for any open stream (file-like object):
>
> import unicodec
> file = open('mytext.txt','rb')
> ufile = unicodec.stream(file,'utf-16')
> u = ufile.read()
> ...
> ufile.close()
It seems to me that if we go for stream_reader, it replaces this bit
of the proposal too - no need for unicodec to provide anything. If
you want to have a convenience function there to save a line or two,
you could have
    unicodec.open(filename, mode, encoding)
which returned a stream_reader (or a stream_writer, depending on mode).
- Andy