[Python-Dev] Decoding incomplete unicode

Walter Dörwald walter at livinglogic.de
Wed Aug 18 21:46:10 CEST 2004


M.-A. Lemburg wrote:

> Walter Dörwald wrote:
> 
>> M.-A. Lemburg wrote:
>>
>>> Overall, I don't like the idea of adding extra
>>> APIs breaking the existing codec API.
>>
>> Adding a final argument that defaults to False doesn't
>> break the API for the callers, only for the implementors.
>>
>> And if we drop the final argument, the API is completely
>> backwards compatible both for users and implementors.
>> The only thing that gets added is the feed() method,
>> that implementors don't have to override.
>>
>>> I believe that we can
>>> extend stream codecs to also work in a feed mode without
>>> breaking the API.
>>
>> Abandoning the final argument and adding a feed() method
>> would IMHO be the simplest way to do this.
>>
>> But then there's no way to make sure that every byte from
>> the input stream is really consumed.
> 
> I've thought about this some more. Perhaps I'm still missing
> something, but wouldn't it be possible to add a feeding
> mode to the existing stream codecs by creating a new queue
> data type (much like the queue you have in the test cases of
> your patch) and using the stream codecs on these ?

No, because when the decode method encounters an incomplete
chunk (and so returns a size that is smaller than the size of
the input), read() would have to push the remaining bytes back
into the queue. This would be code similar in functionality
to the feed() method from the patch, with the difference that
the buffer lives in the queue, not the StreamReader. So
we wouldn't gain any code simplification by going this route.
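
As a rough illustration of the push-back problem, here is a minimal
sketch. It is not code from the patch: ByteQueue, pushback() and
read_text() are hypothetical names, and the stateless decoder
codecs.utf_8_decode() with its final flag is assumed as it exists in
current Python versions.

```python
import codecs


class ByteQueue:
    """Hypothetical FIFO byte queue (not an existing stdlib class)."""

    def __init__(self):
        self._data = b""

    def write(self, data):
        # Append new bytes at the end of the queue.
        self._data += data

    def read(self):
        # Hand out everything that is currently queued.
        data, self._data = self._data, b""
        return data

    def pushback(self, data):
        # Put unconsumed bytes back at the front of the queue.
        self._data = data + self._data


def read_text(queue):
    """Decode what is available; return undecodable leftovers to the queue."""
    data = queue.read()
    # final=False: a trailing incomplete sequence is not an error,
    # the decoder just reports fewer consumed bytes.
    text, consumed = codecs.utf_8_decode(data, "strict", False)
    queue.pushback(data[consumed:])
    return text


q = ByteQueue()
q.write(b"D\xc3")          # ends in the middle of a two-byte sequence
first = read_text(q)       # the lone \xc3 goes back into the queue
q.write(b"\xb6rwald")
second = read_text(q)
```

The bookkeeping in read_text() is the same work feed() would do; only
the place where the leftover bytes are stored differs.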

> I think such a queue would be generally useful in other
> contexts as well, e.g. for implementing fast character based
> pipes between threads, non-Unicode feeding parsers, etc.
> Using such a type you could potentially add a feeding
> mode to stream or file-object API based algorithms very
> easily.

Yes, so we could put this Queue class into a module with
string utilities. Maybe string.py?

> We could then have a new class, e.g. FeedReader, which
> wraps the above in a nice API, much like StreamReaderWriter
> and StreamRecoder.

But why should we, when decode() does most of what we need,
and the rest has to be implemented in both versions?
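
For comparison, a minimal sketch of the buffer-in-the-reader variant.
Again this is only an assumption about the general shape, not the
patch itself: FeedDecoder is a hypothetical name, and
codecs.utf_8_decode() stands in for the codec's stateless decode
function.

```python
import codecs


class FeedDecoder:
    """Hypothetical feed-style decoder that keeps leftovers itself."""

    def __init__(self, errors="strict"):
        self.errors = errors
        self.buffer = b""   # bytes from earlier calls not yet decoded

    def feed(self, data, final=False):
        # Prepend whatever the previous call could not decode.
        data = self.buffer + data
        # The stateless decoder reports how many bytes it consumed.
        text, consumed = codecs.utf_8_decode(data, self.errors, final)
        # Keep the incomplete tail for the next call.
        self.buffer = data[consumed:]
        return text


dec = FeedDecoder()
parts = [dec.feed(b"D\xc3"), dec.feed(b"\xb6rwald")]
```

Passing final=True on the last call makes a dangling incomplete
sequence raise UnicodeDecodeError instead of staying in the buffer,
which is one way to check that every byte was really consumed.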

Bye,
    Walter Dörwald



