
Classification
Title: Feed style codec API
Type:
Stage:
Components: Library (Lib)
Versions:

Process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: lemburg
Nosy List: doerwalter, lemburg
Priority: normal
Keywords: patch

Created on 2005-01-12 18:14 by doerwalter, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
diff.txt doerwalter, 2005-01-12 18:14
diff2.txt doerwalter, 2006-01-11 21:48
Messages (8)
msg47521 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2005-01-12 18:14
The attached patch implements a feed style codec API by
adding feed methods to StreamReader and StreamWriter
(see SF patch #998993 for a history of this issue).
msg47522 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-01-11 21:48
The second version of the patch is updated for the current
svn head and includes patches to the documentation.
msg47523 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2006-01-12 14:20
I don't like the name of the methods, since feed style APIs
usually take data and store it in the object's state, whereas
the method you are suggesting is merely an encode method
that takes the current state into account. The idea is to
allow incremental processing.

This is not what your versions implement.

The StreamWriter would have to grow buffering for this.
The .feed() method on the StreamReader would have to be
adjusted to store the input in the .charbuffer only and not
return anything.
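
To make the distinction concrete, a classic feed-style decoder
along these lines could look roughly like this (a sketch only, not
part of the attached patch; FeedDecoder and its methods are
hypothetical names):

import codecs

class FeedDecoder:
   # Classic feed semantics: feed() only stores the data in the
   # object's state and returns nothing; the decoded result is
   # fetched in a separate step once all data has been fed.
   def __init__(self, encoding, errors="strict"):
      self.decode = codecs.getdecoder(encoding)
      self.errors = errors
      self.buffer = ""
   def feed(self, data):
      self.buffer += data        # store only, no return value
   def close(self):
      text, consumed = self.decode(self.buffer, self.errors)
      self.buffer = ""
      return text

d = FeedDecoder("utf-16")
for c in u"foo".encode("utf-16"):
   d.feed(c)
print repr(d.close())            # u'foo'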

If you just want to make the code easier to follow, I'd
suggest you use private methods, e.g. ._stateful_encode()
and ._stateful_decode() - which is what these methods
actually implement.

Please also explain "If only the \method{feed()} method is
used, \var{stream} will be ignored and can be
\constant{None}.". I don't see this being true - .write()
will still require a .stream object.

msg47524 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-01-12 15:41
Basically what I want to have is a decoupling of the
stateful encoding/decoding from the stream API.

An example: Suppose I have a generator:

def foo():
   yield u"Hello"
   yield u"World"

I want to wrap this generator into another generator that
does a stateful encoding of the strings from the first
generator:

def encode(it, encoding, errors):
   writer = codecs.getwriter(encoding)(None, errors)
   for data in it:
      yield writer.feed(data)

for x in encode(foo(), "utf-16", "strict"):
   print repr(x)

'\xff\xfeH\x00e\x00l\x00l\x00o\x00'
'W\x00o\x00r\x00l\x00d\x00'

The writer itself shouldn't write anything to the stream (in
fact, there is no stream), it should just encode what it
gets fed and spit out the result.

The reason why StreamWriter.feed() is implemented the way it
is, is that currently there are no Python encodings where
encode(string)[1] != len(string). If we want to handle that
case the StreamWriter would have to grow a charbuffer.
Should I add that to the patch?
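
A charbuffer-based feed() could look roughly like this (a sketch
only, not part of the attached patch; FeedWriter is a hypothetical
subclass of the existing utf-16 StreamWriter, and only the
charbuffer handling would be new):

import codecs

UTF16Writer = codecs.getwriter("utf-16")

class FeedWriter(UTF16Writer):
   def __init__(self, stream=None, errors="strict"):
      UTF16Writer.__init__(self, stream, errors)
      self.charbuffer = u""
   def feed(self, object):
      # prepend characters left over from the last call and keep
      # whatever the encoder could not consume for the next one
      object = self.charbuffer + object
      data, consumed = self.encode(object, self.errors)
      self.charbuffer = object[consumed:]
      return data

With that in place the utf-16 example above behaves the same, but
an encoding that leaves characters unconsumed would be handled
correctly as well.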

For decoding I want the same functionality:

def blocks(name, size=8192):
   f = open(name, "rb")
   while True:
      data = f.read(size)
      if data:
         yield data
      else:
         break

def decode(it, encoding, errors):
   reader = codecs.getreader(encoding)(None, errors)
   for data in it:
      yield reader.feed(data)

decode(blocks("foo.xml"))

Again, here the StreamReader doesn't read from a stream, it
just decodes what it gets fed and spits it back out.

I'm not attached to the name "feed". Of course the natural
choice for the method names would be "encode" and "decode",
but those are already taken. Would "handle" or "convert" be
better names?

I don't know what the "this" refers to in "This is not what
your versions implement". If "this" refers to "The idea is
to allow incremental processing", this is exactly what the
patch tries to achieve: Incremental processing without tying
this processing to a stream API. If "this" refers to "feed
style APIs usually take data and store it in the object's
state" that's true, but that's not the purpose of the patch,
so maybe the name *is* misleading.
msg47525 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-02-09 15:56
Looking at PEP 342 I think the natural name for this method
would be send(). It does exactly what send() does for
generators: it sends data into the codec, which processes
it, returns a result and keeps state for the next call.
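
For comparison, this is how send() behaves on a PEP 342 generator
(a plain illustration of the analogy, nothing codec-specific):

def running_total():
   # the generator keeps its state between calls, just like the
   # codec keeps its internal state between calls to send()
   total = 0
   while True:
      value = (yield total)
      total += value

gen = running_total()
gen.next()           # prime the generator
print gen.send(1)    # 1
print gen.send(3)    # 4
print gen.send(5)    # 9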
msg47526 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2006-02-09 17:58
I can see your point in wanting a way to use the stateful
encoding/decoding, but I still don't understand why you
have to sidestep the stream API to do this.

Wouldn't using a StringIO buffer as the stream be the more
natural choice for the writer and for the reader? (StringIO
supports Unicode as well.)

You can then use the standard .write() API to "send" in the
data and the .getvalue() method on the StringIO buffer to
fetch the results. For the reader, you'd write to the
StringIO buffer and then fetch the results using the
standard .read() API.

This is how you'd normally use a file or stream IO based API
in a string context and it doesn't require adding methods to
the StreamReader/Writer API. I'm not opposed to adding new
methods, but you see, the whole point of StreamReader/Writer
is to read from and write to streams. If you just want a
stateful encoder/decoder it would be better to create a
separate implementation for that, say
StatefulEncoder/StatefulDecoder (which could then be used by
the StreamReader/Writer).
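
A separate StatefulEncoder could, for example, be built on top of
the existing StreamWriter by hiding the stream behind a small
drain-on-read buffer (a sketch only; the class name follows the
suggestion above, the implementation and the _Queue helper are
hypothetical):

import codecs

class _Queue:
   # minimal stream replacement: write() appends, drain() removes
   # and returns everything written since the last drain()
   def __init__(self):
      self.data = ""
   def write(self, data):
      self.data += data
   def drain(self):
      data, self.data = self.data, ""
      return data

class StatefulEncoder:
   # keeps the codec state itself and has no notion of an
   # external stream
   def __init__(self, encoding, errors="strict"):
      self.queue = _Queue()
      self.writer = codecs.getwriter(encoding)(self.queue, errors)
   def encode(self, input):
      self.writer.write(input)
      return self.queue.drain()

e = StatefulEncoder("utf-16")
print repr(e.encode(u"Hello"))   # '\xff\xfeH\x00e\x00l\x00l\x00o\x00'
print repr(e.encode(u"World"))   # 'W\x00o\x00r\x00l\x00d\x00'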
msg47527 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-02-11 19:50
> I can see your point in wanting a way to use the stateful
> encoding/decoding, but I still don't understand why you
> have to sidestep the stream API to do this.
>
> Wouldn't using a StringIO buffer as the stream be the more
> natural choice for the writer and for the reader? (StringIO
> supports Unicode as well.)
>
> You can then use the standard .write() API to "send" in the
> data and the .getvalue() method on the StringIO buffer to
> fetch the results.

This doesn't work, because getvalue() doesn't remove the
bytes from the buffer:

import codecs, StringIO
stream = StringIO.StringIO()
writer = codecs.getwriter("utf-16")(stream)
for c in u"foo":
   writer.write(c)
   print repr(stream.getvalue())

This prints:

'\xff\xfef\x00'
'\xff\xfef\x00o\x00'
'\xff\xfef\x00o\x00o\x00'

instead of
'\xff\xfef\x00'
'o\x00'
'o\x00'


> For the reader, you'd write to the
> StringIO buffer and then fetch the results using the
> standard .read() API.

This doesn't work either because the StringIO buffer doesn't
keep separate read and write positions:

import codecs, StringIO
stream = StringIO.StringIO()
reader = codecs.getreader("utf-16")(stream)
for c in u"foo".encode("utf-16"):
   stream.write(c)
   print repr(reader.read())

This outputs:
u''
u''
u''
u''
u''
u''
u''
u''

because after the write() call, the read() done through
reader.read() reads from the end of the buffer.
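
What the reader would need instead is a buffer with separate read
and write positions, i.e. one where write() appends at the end and
read() removes from the front. A minimal sketch (the _Queue helper
is hypothetical, not part of the patch):

import codecs

class _Queue:
   # write() appends; read() removes and returns what it fetched
   def __init__(self):
      self.data = ""
   def write(self, data):
      self.data += data
   def read(self, size=-1):
      if size < 0:
         data, self.data = self.data, ""
      else:
         data, self.data = self.data[:size], self.data[size:]
      return data

queue = _Queue()
reader = codecs.getreader("utf-16")(queue)
for c in u"foo".encode("utf-16"):
   queue.write(c)
   print repr(reader.read())

This prints u'f', u'o' and u'o' exactly once each (and u'' for the
calls where the bytes fed so far don't yet form a complete
character), which is the behaviour feed()/send() is meant to
provide without having to fake a stream.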

BTW, we have been through this before, see:
http://mail.python.org/pipermail/python-dev/2004-July/046497.html


> This is how you'd normally use a file or stream IO based API
> in a string context and it doesn't require adding methods to
> the StreamReader/Writer API. I'm not opposed to adding new
> methods, but you see, the whole point of StreamReader/Writer
> is to read from and write to streams. If you just want a
> stateful encoder/decoder it would be better to create a
> separate implementation for that, say
> StatefulEncoder/StatefulDecoder (which could then be used by
> the StreamReader/Writer).

See
http://mail.python.org/pipermail/python-dev/2004-August/047568.html
for a proposal. I *do* have a patch lying around that
implements part of that (i.e. codecs.lookup() returns
stateful encoders/decoders instead of stream
readers/writers), but IMHO this patch is much too
pervasive. We can have the same effect with a small patch to
codecs.py.
msg47528 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2006-02-17 16:16
See
http://mail.python.org/pipermail/python-dev/2006-February/061230.html
for details why I'm rejecting this patch.
History
Date                 User        Action  Args
2022-04-11 14:56:09  admin       set     github: 41432
2005-01-12 18:14:42  doerwalter  create