[Python-Dev] Some thoughts on the codecs...

Andy Robinson andy@robanal.demon.co.uk
Mon, 15 Nov 1999 22:30:53 GMT


On Mon, 15 Nov 1999 16:37:28 -0500, you wrote:

># assuming variables input_file, input_encoding, output_file,
># output_encoding, and constant BUFFER_SIZE
>
>f = open(input_file, "rb")
>f1 = unicodec.codecs[input_encoding].stream_reader(f)
>g = open(output_file, "wb")
>g1 = unicodec.codecs[output_encoding].stream_writer(g)
>
>while 1:
>    buffer = f1.read(BUFFER_SIZE)
>    if not buffer:
>        break
>    g1.write(buffer)
>
>g1.close()
>f1.close()
>
>Note that we could possibly make these the only API that a codec needs
>to provide; the string object <--> unicode object conversions can be
>done using this and the cStringIO module.  (On the other hand it seems
>a common case that would be quite useful.)

Perfect.  I'd keep the string ones - they are easy to implement and a
big convenience.
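
To show how little the string conversions cost once the stream
interface exists, here is a minimal sketch.  It assumes the standard
library's codecs.getreader / codecs.getwriter as stand-ins for the
proposed stream_reader / stream_writer, and io.BytesIO in place of
cStringIO; the helper names decode_via_stream and encode_via_stream
are made up for illustration and are not part of the proposal.

```python
import codecs
import io


def decode_via_stream(data, encoding):
    """Decode bytes to text using only a stream reader (hypothetical helper)."""
    # Wrap the raw bytes in an in-memory file, then wrap that in a reader.
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    return reader.read()


def encode_via_stream(text, encoding):
    """Encode text to bytes using only a stream writer (hypothetical helper)."""
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf)
    writer.write(text)
    # The writer encodes on write(), so the buffer now holds the bytes.
    return buf.getvalue()
```

Each direction is only a few lines on top of the stream classes, which
is the "easy to implement" part; the convenience is not having to
build the in-memory file yourself every time.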

The proposal also says:
>For explicit handling of Unicode using files, the unicodec module
>could provide stream wrappers which provide transparent
>encoding/decoding for any open stream (file-like object):
>
>  import unicodec
>  file = open('mytext.txt','rb')
>  ufile = unicodec.stream(file,'utf-16')
>  u = ufile.read()
>  ...
>  ufile.close()

It seems to me that if we go for stream_reader, it replaces this bit
of the proposal too - unicodec would not need to provide a separate
stream wrapper.  If you want a convenience function there to save a
line or two, you could have
	unicodec.open(filename, mode, encoding)
which returns a stream_reader.
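
A rough sketch of such a convenience function follows.  Since unicodec
is only a proposal, it again uses codecs.getreader / codecs.getwriter
as stand-ins; the name open_encoded, the default encoding, and the
choice to hand back a writer for the "w" and "a" modes are my own
assumptions for illustration.

```python
import codecs


def open_encoded(filename, mode="r", encoding="utf-8"):
    """Open a file in binary mode and wrap it in a codec stream.

    Hypothetical stand-in for the suggested unicodec.open().
    """
    # Ignore any explicit binary/text flag; the raw file is always binary.
    base = mode.replace("b", "").replace("t", "")
    if base == "r":
        return codecs.getreader(encoding)(open(filename, "rb"))
    if base in ("w", "a"):
        return codecs.getwriter(encoding)(open(filename, base + "b"))
    raise ValueError("unsupported mode: %r" % (mode,))
```

Usage then mirrors the proposal's example in one line:
ufile = open_encoded('mytext.txt', 'r', 'utf-16'), followed by
ufile.read() and ufile.close().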


- Andy