[Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

Tue May 24 12:14:10 CEST 2011

Victor Stinner wrote:
> On Tuesday 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:
>> Please read PEP 100 regarding StreamReader and StreamWriter.
>> Those codecs parts were explicitly designed to be stateful,
>> unlike the stateless encoder/decoder methods.
> 
> Yes, it is possible to implement stateful StreamReader and StreamWriter
> classes and we have such codecs (I gave the example of UTF-16), but the
> state is not exposed (getstate / setstate), and so it's not possible to
> write generic code to handle the codec state in the base StreamReader
> and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0)
> for example.

So instead of always proposing to deprecate everything,
how about you come up with a proposal to add meaningful
new methods to those base classes?
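
For reference, the incremental codecs in the stdlib already expose the
kind of state discussed in the quoted text. A minimal sketch, assuming the
UTF-16 incremental encoder behaves as it does in current CPython (the
variable names are illustrative only):

```python
import codecs

make_encoder = codecs.getincrementalencoder("utf-16")

# A fresh encoder has not written a BOM yet.
enc = make_encoder()
state_before = enc.getstate()
first = enc.encode("a")        # BOM followed by the code unit for "a"
state_after = enc.getstate()   # differs from state_before: BOM was emitted

# setstate(0) tells a new encoder that the BOM has already been written,
# e.g. when appending to an existing stream.
resumed = make_encoder()
resumed.setstate(0)
no_bom = resumed.encode("a")   # just the code unit, no BOM
```

The base StreamWriter class has no equivalent pair of methods, which is
the gap the quoted text points at.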

>> Each codec can, however, implement variants which are optimized
>> for the specific encoding or intercept certain stream methods
>> to add functionality or improve the encoding/decoding
>> performance.
> 
> Can you give me some examples?

See the UTF-16 codec in the stdlib for example. Its stream
classes use some of the available possibilities to interpret the
BOM and then switch the encoder/decoder methods accordingly.
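
A minimal sketch of that behaviour as seen from the outside, assuming the
stdlib "utf-16" StreamReader and two hand-built byte strings (the sample
text and variable names are illustrative only):

```python
import codecs
import io

text = "hi"
# The same text in both byte orders, each preceded by its BOM.
little = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The generic "utf-16" reader inspects the BOM on the first read and
# picks the matching byte order for everything that follows.
reader_cls = codecs.getreader("utf-16")
result_le = reader_cls(io.BytesIO(little)).read()
result_be = reader_cls(io.BytesIO(big)).read()
```

Both reads should yield the original text, with the BOM consumed rather
than returned as a character.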

A lot more could be done for other variable-length
encodings, e.g. UTF-8, since these often run into problems near
the end of a read when part of a multi-byte sequence is missing.
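
The problem can be sketched as follows. This uses the stdlib incremental
decoder as one way to carry the incomplete bytes over to the next read; it
is only an illustration of the buffering a codec-specific reader would
need, not a description of how any StreamReader is implemented. The sample
text and the split point are arbitrary:

```python
import codecs

data = "é€".encode("utf-8")          # a 2-byte and a 3-byte sequence
first, second = data[:3], data[3:]   # split inside the 3-byte sequence

# Stateless decoding of the first chunk fails: its last byte starts a
# sequence whose remaining bytes have not arrived yet.
try:
    first.decode("utf-8")
    stateless_ok = True
except UnicodeDecodeError:
    stateless_ok = False

# An incremental decoder keeps the incomplete byte and finishes the
# character once the rest of the data is supplied.
decoder = codecs.getincrementaldecoder("utf-8")()
out1 = decoder.decode(first)
pending, _flags = decoder.getstate()   # bytes still waiting for more input
out2 = decoder.decode(second, final=True)
```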

The base class provides a general-purpose
implementation to cover this case, but it's not efficient,
since it doesn't know anything about the characteristics
of the encoding.

An efficient implementation would have to be done per codec,
and that's why we have per-codec StreamReader/Writer
APIs.

>> TextIOWrapper and StreamReaderWriter are merely wrappers
>> around streams that make use of the codecs. They don't
>> provide any codec logic themselves. That's the conceptual
>> difference.
>> ...
>> StreamReader and StreamWriters ... work efficiently and
>> directly on streams rather than buffers.
> 
> StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
> have a file-like API: tell(), seek(), read(), readline(), write(), etc.
> The implementation may be different, but the API is just the same, and
> so the use cases are just the same.
> 
> I don't see in which case I should use StreamReader or StreamWriter
> instead of TextIOWrapper. I thought that TextIOWrapper was specific to
> files on disk, but TextIOWrapper is already used for other purposes like
> sockets.

I have no idea why TextIOWrapper was added to the stdlib
instead of making StreamReaderWriter more capable,
since StreamReaderWriter has been available
since Python 1.6 (and is what codecs.open() uses).

Perhaps we should deprecate TextIOWrapper instead and
replace it with codecs.StreamReaderWriter ? ;-)

Seriously, I don't see the use of TextIOWrapper as an argument
for removing the StreamReader/Writer parts of the codecs API.

>> Here's my reply from the ticket regarding using incremental
>> encoders/decoders for the StreamReader/Writer parts of the
>> codec set of APIs:
>>
>> """
>> The point about having them use incremental codecs for encoding and
>> decoding is a good one and would need to be investigated. If possible,
>> we could use incremental encoders/decoders for the standard
>> StreamReader/Writer base classes or add new
>> IncrementalStreamReader/Writer classes which then use
>> the IncrementalEncoder/Decoder by default.
> 
> Why do you want to write a duplicate feature? TextIOWrapper is already
> here, it's working and widely used.

See above and please also try to understand why we have per-codec
implementations for streams. I'm tired of repeating myself.

I would much prefer to see the codec-specific functionality
in TextIOWrapper added back to the codecs where it
belongs.

> I am working on codec issues (like CJK encodings, see #12100, #12057,
> #12016) and I would like to remove StreamReader and StreamWriter to have
> *less* code to maintain.
>
> If you want to add more code, will you be available to maintain it? It
> looks like you are busy, some people (not me ;-)) are still
> waiting for .transform()/.untransform()!

I dropped the ball on the idea after the strong wave of
comments against those methods. People will simply have
to use codecs.encode() and codecs.decode().
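
A minimal sketch of that approach, assuming a Python version in which the
bytes-to-bytes and str-to-str codecs such as "hex_codec" and "rot_13" are
registered (the sample values are illustrative only):

```python
import codecs

# bytes -> bytes: what a .transform()/.untransform() pair would have covered
hex_bytes = codecs.encode(b"hello", "hex_codec")
round_trip = codecs.decode(hex_bytes, "hex_codec")

# str -> str works the same way through the generic functions
rot = codecs.encode("hello", "rot_13")
```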

-- 
Marc-Andre Lemburg
eGenix.com
