[Python-Dev] Draft PEP: Deprecate codecs.StreamReader and codecs.StreamWriter

Thu Jul 7 14:08:45 CEST 2011

On Thu, Jul 7, 2011 at 8:53 PM, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
> I've no issue with telling people to use open() rather than codecs.open() when
> moving code from 2.x to 3.x. But in 2.x, is there any other API which allows you
> to wrap arbitrary streams? If not, then ISTM that removing the Stream* classes
> would give 2.x->3.x porting projects more trouble than codecs.open() -> open().

No, the io module is available in 2.x as well, and it is a far more
robust way to wrap arbitrary streams than the codecs module.
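
As a rough sketch of what that wrapping looks like (the BytesIO
objects are just stand-ins for any binary stream such as a pipe or
socket file, and the sample data is made up for illustration):

```python
import io

# Stand-in for an arbitrary binary stream (illustrative data only).
raw = io.BytesIO(u"caf\xe9\nna\xefve\n".encode("utf-8"))

# TextIOWrapper layers decoding and newline handling over a binary stream.
reader = io.TextIOWrapper(raw, encoding="utf-8")
lines = [line.rstrip("\n") for line in reader]

# Encoding on write works the same way; newline="\n" disables translation.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="utf-8", newline="\n")
writer.write(u"caf\xe9\n")
writer.flush()  # push the encoded bytes through to the underlying stream
encoded = sink.getvalue()
```

Note that no codec-specific StreamReader or StreamWriter is involved
here; the wrapper only needs the codec's name.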

It's unfortunate that nobody pointed out the redundancy when PEP 3116
was discussed and implemented, as I expect PEP 100 would have been
updated and the Stream* APIs would have been either reused or
officially jettisoned as part of the Py3k migration.

However, we're now in a situation where we have:

1. A robust, Unicode-capable IO implementation (the io module, based
on PEP 3116) that is available in both 2.x and 3.x and is designed to
minimise the amount of work involved in writing new codecs.
2. A legacy IO implementation (the codecs module) that is available in
both 2.x and 3.x, but requires additional work on the part of codec
authors and isn't as robust as the PEP 3116 implementation.

So the options are:

A. Bring the codecs module IO implementation up to the standard of the
io module implementation (less the C acceleration) and maintain the
requirement that codec authors provide StreamReader and StreamWriter
implementations.

B. Retain the full codecs module API, but reimplement relevant parts
in terms of the io module.

C. Deprecate the codecs.Stream* interfaces and make codecs.open() a
simple wrapper around the builtin open() function. Formally drop the
requirement that codec authors provide StreamReader/StreamWriter
implementations (since they are not used by the core IO implementation).

Currently, nobody has stepped forward to do the work of maintaining
the codecs IO implementation independently of the io module, so the
only two options seriously on the table are B and C. That may change
if someone actually goes through and *fixes* all the error cases that
are handled correctly by the io module but not by the codecs module
and credibly promises to continue to do so for at least the life of
3.3.

A 2to3 fixer that simply changes "codecs.open" to "open" is not
viable, as the function signatures are not compatible (the buffering
parameter appears in a different location):
    codecs.open(): open(filename, mode='rb', encoding=None, errors='strict', buffering=1)
    3.x builtin open(): open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)
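
To make the mismatch concrete, here is a minimal sketch. The
open_codecs_style() helper is hypothetical (it is not a stdlib API and
not what the PEP specifies); it assumes text mode is always wanted and
simply forwards codecs.open()-ordered arguments to io.open() by
keyword:

```python
import io
import os
import tempfile

def open_codecs_style(filename, mode="r", encoding="utf-8",
                      errors="strict", buffering=1):
    # Hypothetical shim: accepts arguments in codecs.open() order and
    # passes them to io.open() by keyword so positions cannot be confused.
    # Assumption: the caller wants text, so any 'b' in the mode is dropped.
    text_mode = mode.replace("b", "")
    return io.open(filename, text_mode, buffering=buffering,
                   encoding=encoding, errors=errors)

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Third positional argument is the encoding, as with codecs.open().
with open_codecs_style(path, "w", "utf-8") as f:
    f.write(u"gr\xfc\xdf\n")
with open_codecs_style(path, "r", "utf-8") as f:
    text = f.read()

# A naive rename hands the encoding name to the 'buffering' parameter,
# which io.open() rejects because it expects an integer there.
try:
    io.open(path, "r", "utf-8")
    naive_ok = True
except TypeError:
    naive_ok = False
```

A real replacement would also have to decide how to treat newlines,
since codecs.open() works on a binary file underneath while io.open()
in text mode applies universal newlines by default.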

Now, the backported io module does make it possible to write correct
code as far back as 2.6 that can be forward ported cleanly to 3.x
without requiring code modifications. However, it would be nice to
transparently upgrade code that uses codecs.open to the core IO
implementation in 3.x. For people new to Python, the parallel (and
currently deficient) alternative IO implementation also qualifies at
the very least as an attractive nuisance.
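
For illustration, a small sketch of the kind of code that runs
unchanged on 2.6+ and 3.x by going through io.open() (the file name
and contents are invented for the example):

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# io.open() is the function behind the 3.x builtin open(), and the
# backported io module provides it on 2.6 and 2.7 as well.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\u03c0 \u2248 3.14159\n")  # text in, encoded bytes on disk

with io.open(path, "r", encoding="utf-8") as f:
    content = f.read()  # decoded back to text on the way out
```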

Now, it may be that this PEP runs afoul of Guido's stated preference
not to introduce any more backwards incompatibilities between 2.x and
3.x that aren't absolutely essential. In that case, it may be
reasonable to add an option D to the mix, where we just add
documentation notes telling people not to use the affected codecs
module APIs and officially declare that bug reports on those APIs will
be handled with "don't use these, use the io module instead", as that
would also deal with the maintenance problem. It's pretty ugly from an
end user's point of view, though.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia