[issue8796] Deprecate codecs.open()

STINNER Victor report at bugs.python.org
Wed May 18 11:44:04 CEST 2011


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

> This ticket is about deprecating codecs.open(), not about
> StreamWriter and StreamReader.

Right. I may open a different issue.

Can we start by modifying codecs.open() to use the builtin open() (to reuse TextIOWrapper)?

> I'm -1 on deprecating StreamWriter and StreamReader as they provide
> different mechanisms than the io layer which has a specific focus
> on files and buffers.

What are the usecases of StreamReader and StreamWriter, not covered by TextIOWrapper?

TextIOWrapper are used in Python for:

 - files (e.g. open)
 - processes (e.g. open.popen)
 - emails (e.g. mailbox.Message)
 - sockets (e.g. socket.makefile)
 - and maybe other things

StreamReader and StreamWriter are used for:

 - read/write files in Sphinx 1.0.7 (written for Python 2)
 - write the output in pygment 1.3.1 (written for Python 2)
 - but not in the Python interpreter or standard library

*Quick* search of other usages of StreamReader and StreamWriter on the WWW:

 - twisted/mail/imap4.py
 - feeds2imap implements a 'mod-utf-7' codec, pyflag implements a 'ms-pst' codec, pygsm implements a 'gsm0338' codec, so they have StreamReader and StreamWriter classes (but I don't know if these classes are used)

> It would certainly be possible to make the implementations of
> the codecs you mentioned smarter to handle writing BOMs correctly,
> e.g. by making use of the incremental encoder/decoders, if there's
> interest.

Yes, it is possible to fix StreamReader and StreamWriter classes of the mentionned codecs, but it's not possible to write a generic fix in codecs.py. This is exactly why I dislike StreamReader and StreamWriter: they are not incremental and so don't have reset() or setstate() methods. When you implement a StreamReader or StreamWriter class, you have to reimpelment a pseudo-incremental encoder. Compare for example IncrementalEncoder and StreamWriter classes of UTF-16: most code is duplicated.

Because StreamReader and StreamWriter are not incremental, they are not efficient, and it's difficult to handle some issues like BOM which require to handle the codec state.

TextIOWrapper "simply" reuses incremental encoders and decoders, and so use reset() and setstate() methods.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8796>
_______________________________________


More information about the Python-bugs-list mailing list