[Python-Dev] Decoding incomplete unicode

M.-A. Lemburg mal at egenix.com
Wed Aug 18 22:07:15 CEST 2004


Walter Dörwald wrote:
>> I've thought about this some more. Perhaps I'm still missing
>> something, but wouldn't it be possible to add a feeding
>> mode to the existing stream codecs by creating a new queue
>> data type (much like the queue you have in the test cases of
>> your patch) and using the stream codecs on these ?
> 
> No, because when the decode method encounters an incomplete
> chunk (and so returns a size that is smaller than the size of
> the input), read() would have to push the remaining bytes back
> into the queue. This would be code similar in functionality
> to the feed() method from the patch, with the difference that
> the buffer lives in the queue, not in the StreamReader. So
> we won't gain any code simplification by going this route.

Maybe not code simplification, but the APIs will be
well separated.

If we require the queue type for feeding mode operation
we are free to define whatever APIs are needed to communicate
between the codec and the queue type, e.g. we could define
a method that pushes a few bytes back onto the queue end
(much like ungetc() in C).
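
A minimal sketch of such a queue type, written against current Python
rather than the 2.x of this thread. The class name StreamQueue and the
method names write(), read() and unread() are assumptions made for
illustration only, and the sketch assumes the push-back goes to the read
end of the queue, as ungetc() does for a C stream. The decoder call uses
codecs.utf_8_decode(), which returns the decoded text together with the
number of bytes it consumed.

```python
import codecs


class StreamQueue:
    """Hypothetical byte queue: a producer appends, a consumer reads."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        # Producer side: append new bytes at the tail of the queue.
        self._buffer += data

    def read(self, size=-1):
        # Consumer side: remove and return up to `size` bytes from the head.
        if size < 0:
            size = len(self._buffer)
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        return chunk

    def unread(self, data):
        # Push bytes back onto the read end (assumed ungetc()-like behaviour),
        # so the next read() returns them first.
        self._buffer[:0] = data


queue = StreamQueue()

# Only the first byte of a two-byte UTF-8 sequence has arrived so far.
queue.write(b"\xc3")
data = queue.read()
text, consumed = codecs.utf_8_decode(data, "strict", False)
queue.unread(data[consumed:])  # the incomplete byte goes back

# The second byte arrives; the pushed-back byte is decoded along with it.
queue.write(b"\xa4")
data = queue.read()
text2, consumed2 = codecs.utf_8_decode(data, "strict", False)
queue.unread(data[consumed2:])
```

With this split, the codec only needs to know about unread(); the
buffering itself lives entirely in the queue.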

>> I think such a queue would be generally useful in other
>> contexts as well, e.g. for implementing fast character based
>> pipes between threads, non-Unicode feeding parsers, etc.
>> Using such a type you could potentially add a feeding
>> mode to stream or file-object API based algorithms very
>> easily.
> 
> Yes, so we could put this Queue class into a module with
> string utilities. Maybe string.py?

Hmm, I think a separate module would be better, since we
could then recode the implementation in C at some point
(once the API has settled).

We'd only need a new name for it, e.g. StreamQueue or
something.

>> We could then have a new class, e.g. FeedReader, which
>> wraps the above in a nice API, much like StreamReaderWriter
>> and StreamRecoder.
> 
> But why should we, when decode() does most of what we need,
> and the rest has to be implemented in both versions?

To hide the details from the user. It should be possible
to instantiate one of these StreamQueueReaders (named
after the queue type) and simply use it in feeding
mode without having to bother about the details behind
the implementation.

StreamReaderWriter and StreamRecoder exist for the same
reason.
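
As a rough illustration of what such a wrapper could look like, here is
a sketch in current Python. Everything named here is hypothetical: the
StreamQueueReader class, its feed() and read() methods, and the small
StreamQueue helper are assumptions, not an existing API. The decode
argument is assumed to be a function taking (data, errors, final) and
returning (text, bytes_consumed), which is the shape of
codecs.utf_8_decode().

```python
import codecs


class StreamQueue:
    """Hypothetical byte queue with an ungetc()-like unread() method."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        # Append new bytes at the tail.
        self._buffer += data

    def read(self):
        # Remove and return everything currently queued.
        chunk = bytes(self._buffer)
        del self._buffer[:]
        return chunk

    def unread(self, data):
        # Put bytes back at the head so they are read again next time.
        self._buffer[:0] = data


class StreamQueueReader:
    """Hypothetical feeding-mode reader that hides the queue from the user."""

    def __init__(self, decode=codecs.utf_8_decode, errors="strict"):
        self._queue = StreamQueue()
        self._decode = decode
        self._errors = errors

    def feed(self, data):
        # The caller hands over whatever bytes have arrived.
        self._queue.write(data)

    def read(self):
        # Decode as much as possible; bytes belonging to an incomplete
        # sequence are pushed back and wait for the next feed().
        data = self._queue.read()
        text, consumed = self._decode(data, self._errors, False)
        self._queue.unread(data[consumed:])
        return text


reader = StreamQueueReader()
reader.feed(b"\xe2\x82")       # first two bytes of a three-byte sequence
first = reader.read()          # nothing can be decoded yet
reader.feed(b"\xacx")          # final byte of the sequence, plus "x"
second = reader.read()         # the completed character followed by "x"
```

The user only ever calls feed() and read(); the push-back protocol
between codec and queue stays an internal detail.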

-- 
Marc-Andre Lemburg
eGenix.com
