[Patches] [ python-Patches-1101097 ] Feed style codec API

Thu Feb 9 16:56:23 CET 2006

Patches item #1101097, was opened at 2005-01-12 19:14
Message generated for change (Comment added) made by doerwalter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1101097&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Feed style codec API

Initial Comment:
The attached patch implements a feed style codec API by
adding feed methods to StreamReader and StreamWriter
(see SF patch #998993 for a history of this issue).

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2006-02-09 16:56

Message:
Logged In: YES 
user_id=89016

Looking at PEP 342, I think the natural name for this method
would be send(). It does exactly what send() does for
generators: it sends data into the codec, which processes
it, returns a result and keeps state for the next call.
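To illustrate the analogy, here is a minimal sketch of a PEP 342 style coroutine that encodes whatever is passed to send(). It is written for modern Python 3 and assumes the standard codecs.getincrementalencoder() to hold the codec state; encoder_coroutine is a hypothetical name, not part of the patch.

```python
import codecs

def encoder_coroutine(encoding, errors="strict"):
    # Hypothetical helper: receives text via send(), encodes it and yields
    # the bytes back, keeping the codec state between calls.
    encoder = codecs.getincrementalencoder(encoding)(errors)
    output = b""
    while True:
        data = yield output
        output = encoder.encode(data)

coder = encoder_coroutine("utf-16")
next(coder)                  # advance to the first yield before sending
first = coder.send("Hello")  # includes the UTF-16 byte order mark
second = coder.send("World") # no byte order mark: the state remembers it
```

Concatenating first and second should give the same bytes as encoding "HelloWorld" in a single call, because the encoder only emits the byte order mark once.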

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-01-12 16:41

Message:
Logged In: YES 
user_id=89016

Basically what I want to have is a decoupling of the
stateful encoding/decoding from the stream API.

An example: Suppose I have a generator:

def foo():
   yield u"Hello"
   yield u"World"

I want to wrap this generator into another generator that
does a stateful encoding of the strings from the first
generator:

def encode(it, encoding, errors):
   writer = codecs.getwriter(encoding)(None, errors)
   for data in it:
      yield writer.feed(data)

for x in encode(foo(), "utf-16", "strict"):
   print repr(x)

'\xff\xfeH\x00e\x00l\x00l\x00o\x00'
'W\x00o\x00r\x00l\x00d\x00'

The writer itself shouldn't write anything to the stream (in
fact, there is no stream), it should just encode what it
gets fed and spit out the result.
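For comparison, a sketch of the same encode() generator in modern Python 3, assuming the standard codecs.getincrementalencoder() API in place of the feed() method proposed here. The encoder object carries only codec state, so no stream is involved.

```python
import codecs

def foo():
    yield "Hello"
    yield "World"

def encode(it, encoding, errors="strict"):
    # No stream: the incremental encoder only keeps codec state
    # (for UTF-16, whether the byte order mark has been written yet).
    encoder = codecs.getincrementalencoder(encoding)(errors)
    for data in it:
        yield encoder.encode(data)
    # Flush anything the encoder may still hold at the end of the input.
    tail = encoder.encode("", final=True)
    if tail:
        yield tail

for x in encode(foo(), "utf-16"):
    print(repr(x))
```

On a little-endian machine the loop should print the same two chunks as above, as bytes objects.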

StreamWriter.feed() is implemented the way it is because
currently no Python encoding returns encode(string)[1] !=
len(string), i.e. every encoder consumes all of its input. If
we want to handle that case, the StreamWriter would have to
grow a charbuffer. Should I add that to the patch?
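A minimal sketch of what such a charbuffer might look like, assuming an encode function that follows the codecs convention of returning an (output, length consumed) tuple. BufferedFeedWriter and pairs_encode are hypothetical names; pairs_encode is a toy encoder that deliberately consumes only an even number of characters so that the buffering path is exercised.

```python
class BufferedFeedWriter:
    """Hypothetical sketch: keeps unconsumed input in a charbuffer."""

    def __init__(self, encode):
        # encode(text) -> (bytes, consumed), the same tuple convention as
        # the stateless codec functions such as codecs.utf_8_encode.
        self._encode = encode
        self.charbuffer = ""

    def feed(self, data):
        text = self.charbuffer + data
        output, consumed = self._encode(text)
        # Keep whatever the encoder did not consume for the next call.
        self.charbuffer = text[consumed:]
        return output


def pairs_encode(text):
    # Toy encoder (hypothetical): only encodes an even number of characters.
    usable = len(text) - len(text) % 2
    return text[:usable].encode("ascii"), usable
```

With pairs_encode, feeding "abc" should return b"ab" and keep "c" in the buffer; feeding "de" next should return b"cd" and keep "e".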

For decoding I want the same functionality:

def blocks(name, size=8192):
   f = open(name, "rb")
   while True:
      data = f.read(size)
      if data:
         yield data
      else:
         break

def decode(it, encoding, errors):
   reader = codecs.getreader(encoding)(None, errors)
   for data in it:
      yield reader.feed(data)

for x in decode(blocks("foo.xml"), "utf-8", "strict"):
   print repr(x)

Again, here the StreamReader doesn't read from a stream; it
just decodes what it gets fed and spits it back out.
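For comparison, a sketch of the decoding side in modern Python 3, assuming codecs.getincrementaldecoder() in place of the proposed StreamReader.feed(). The demo feeds an in-memory list of blocks instead of blocks("foo.xml"), with a UTF-8 character deliberately split across a block boundary.

```python
import codecs

def decode(it, encoding, errors="strict"):
    # The incremental decoder buffers an incomplete byte sequence between
    # calls, so a multi-byte character may be split across blocks.
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    for data in it:
        yield decoder.decode(data)
    # final=True flushes the decoder; with errors="strict" it raises if
    # the input ends in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

# "Grüße" in UTF-8, split inside the two-byte sequence for "ü".
chunks = [b"Gr\xc3", b"\xbc\xc3\x9fe"]
print(list(decode(chunks, "utf-8")))
```

The first block ends in the middle of "ü", so the decoder should hold back the lone b"\xc3" byte and return it as part of the second chunk, giving ['Gr', 'üße'].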

I'm not attached to the name "feed". Of course the natural
choice for the method names would be "encode" and "decode",
but those are already taken. Would "handle" or "convert" be
better names?

I don't know what the "this" refers to in "This is not what
your versions implement". If "this" refers to "The idea is
to allow incremental processing", this is exactly what the
patch tries to achieve: Incremental processing without tying
this processing to a stream API. If "this" refers to "feed
style APIs usually take data and store it in the object's
state" that's true, but that's not the purpose of the patch,
so maybe the name *is* misleading.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2006-01-12 15:20

Message:
Logged In: YES 
user_id=38388

I don't like the name of the methods, since feed style APIs
usually take data and store it in the object's state, whereas
the method you are suggesting is merely an encode method
that takes the current state into account. The idea is to
allow incremental processing.

This is not what your versions implement.

The StreamWriter would have to grow buffering for this.
The .feed() method on the StreamReader would have to be
adjusted to store the input in the .charbuffer only and not
return anything.
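A minimal sketch of the feed style API described here, where feed() only stores decoded input and returns nothing, and a separate read() hands it out. It assumes Python 3's codecs.getincrementaldecoder() as a stand-in for the stateful decoding step; FeedDecoder is a hypothetical class, not the StreamReader from the patch.

```python
import codecs

class FeedDecoder:
    """Hypothetical feed style decoder: feed() stores, read() returns."""

    def __init__(self, encoding, errors="strict"):
        self._decoder = codecs.getincrementaldecoder(encoding)(errors)
        self.charbuffer = ""

    def feed(self, data):
        # Decode what can be decoded and store it; return nothing.
        self.charbuffer += self._decoder.decode(data)

    def read(self):
        # Hand out everything decoded so far and clear the buffer.
        result, self.charbuffer = self.charbuffer, ""
        return result
```

Feeding b"Gr\xc3" and then calling read() should return "Gr"; after feeding b"\xbc\xc3\x9fe", read() should return "üße". Compared with a feed() that returns its result directly, this costs an extra call per chunk but lets the caller decide when to collect the output.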

If you just want to make the code easier to follow, I'd
suggest you use private methods, e.g. ._stateful_encode()
and ._stateful_decode(), which is what these methods really
implement.

Please also explain "If only the \method{feed()} method is
used, \var{stream} will be ignored and can be
\constant{None}.". I don't see this being true - .write()
will still require a .stream object.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-01-11 22:48

Message:
Logged In: YES 
user_id=89016

The second version of the patch is updated for the current
svn head and includes patches to the documentation.
