[Python-3000] [Python-3000-checkins] r54742 - in python/branches/p3yk/Lib: io.py test/test_io.py

Walter Dörwald walter at livinglogic.de
Thu Apr 12 10:10:13 CEST 2007


Guido van Rossum wrote:
> On 4/11/07, Walter Dörwald <walter at livinglogic.de> wrote:
>> Would it make sense to make the state of the decoder public, e.g. by
>> adding setstate() and getstate() methods? This would give a cleaner API.
> 
> I've been thinking of the same thing!
> 
> I wonder if it would be possible to return the state as a pair
> (unread, flags) where unread is a (byte) string of unprocessed bytes
> and flags is some other state, with the constraint that in the initial
> state the flags must be zero. Then I can optimize the case where flags
> is returned as zero by subtracting len(unread) from the current
> position and that'd be the correct seek position.

I'd say that bytestream.tell() is the correct position.

Or should seek() return to the last position where the codec was in its
default state with nothing buffered? (That can't work for UTF-16,
because the codec is almost never in its default state.)

> I imagine most
> decoders have only very few flags they care about. (The worst might be
> the utf-16 decoder which must have a flag to remember whether it
> already saw a byte order marker, and another indicating the byte
> order. Maybe we'll have to special-case that one, so don't worry too
> much about it.)
> 
>> Should I work on a patch?
> 
> That would be great!

OK, here's the patch: http://bugs.python.org/1698994

The state returned from getstate() should be treated as an opaque value:
for the buffered incremental codecs it is the buffered string, for the
UTF-16 encoder it is the flag indicating whether a BOM has been written,
and so on. The codecs try to return None when they are in some kind of
default state (e.g. when nothing is buffered).
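
A short sketch of the intended usage, treating the state as opaque. It is
written against the incremental decoders of the codecs module as found in
later Python 3 releases, so the concrete value returned by getstate() there
is an assumption and may differ from what the patch returns (in particular,
the default state need not be None):

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()
initial = decoder.getstate()         # default state, nothing buffered

# Feed the first two bytes of the three-byte euro sign: no output yet,
# the bytes are held inside the decoder.
first = decoder.decode(b"\xe2\x82")
saved = decoder.getstate()           # opaque snapshot, do not inspect

# Complete the character, then rewind to the snapshot and do it again.
once = decoder.decode(b"\xac")
decoder.setstate(saved)
again = decoder.decode(b"\xac")

# With nothing left buffered, the decoder is back in its default state.
back_to_default = decoder.getstate() == initial
```

The caller only stores the value and hands it back to setstate(); comparing
it with the state of a fresh decoder is the only inspection done here.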

I'm going to add tests next.

Servus,
   Walter


