[Python-Dev] Decoding incomplete unicode

M.-A. Lemburg mal at egenix.com
Thu Aug 19 14:25:55 CEST 2004


Hye-Shik Chang wrote:
> On Thu, 19 Aug 2004 12:29:12 +0200, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>>Walter Dörwald wrote:
>>
>>>Without the feed() method, we need the following:
>>>
>>>1) A StreamQueue class that
>>>   a) supports writing at one end and reading at the other end
>>>   b) has a method for pushing back unused bytes to be returned
>>>      in the next call to read()
>>
>>Right.
>>
>>It also needs a method giving the number of pending bytes in
>>the queue or just an API .has_pending_data() that returns
>>True/False.
>>
> 
> 
> +1 for adding .has_pending_data() stuff.  But it'll need a way to
> flush pending data out for encodings where an incomplete sequence is
> not always invalid. <wink> This is true for JIS X 0213 encodings.
> 
> 
> >>> u'\u00e6'.encode('euc-jisx0213')
> '\xa9\xdc'
> >>> u'\u3000'.encode('euc-jisx0213')
> '\xa1\xa1'
> >>> u'\u00e6\u0300'.encode('euc-jisx0213')
> '\xab\xc4'

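Hye-Shik's point can be illustrated with a minimal sketch. It assumes Python 3, whose codecs module provides incremental encoders through codecs.getincrementalencoder(); that API postdates this thread. The helper encode_chunks() is hypothetical. Given U+00E6 on its own, the encoder cannot yet tell whether a combining U+0300 will follow, so it holds the character back. Only a final=True call flushes it out.

```python
import codecs


def encode_chunks(chunks, encoding="euc_jisx0213"):
    # Hypothetical helper: feed text chunks to an incremental encoder and
    # collect the bytes returned by each call. The closing final=True call
    # flushes any character the encoder is still holding back.
    # ("euc_jisx0213" is the same codec as "euc-jisx0213" above.)
    encoder = codecs.getincrementalencoder(encoding)()
    out = [encoder.encode(chunk) for chunk in chunks]
    out.append(encoder.encode("", final=True))
    return out


# U+00E6 alone: nothing is emitted until the flush.
print(encode_chunks(["\u00e6"]))            # [b'', b'\xa9\xdc']
# U+00E6 followed by U+0300: both map to a single double-byte code.
print(encode_chunks(["\u00e6", "\u0300"]))  # [b'', b'\xab\xc4', b'']
# U+3000 can never combine, so it is emitted immediately.
print(encode_chunks(["\u3000"]))            # [b'\xa1\xa1', b'']
```

A queue-based design needs the same end-of-stream flush that final=True provides here. Without it, a trailing U+00E6 would remain pending forever.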

I'm not sure I understand. The queue will also have an .unread()
method (or similar) to write data back into the queue at the
reading head position. Are you suggesting that we add a .truncate()
method to truncate the read buffer at the current position?

Since the queue will be in memory, we can also add .writeseek()
and .readseek() if that helps.
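Below is a minimal sketch of such an in-memory queue, assuming Python 3 and bytes input. StreamQueue, .unread() and .has_pending_data() are the names used in this thread. pending(), truncate() and decode_available() are hypothetical illustrations, and the seek methods are omitted. decode_available() shows the intended use for UTF-8 only: decode whatever is complete, then push an incomplete trailing sequence back for the next read().

```python
class StreamQueue:
    """Hypothetical in-memory byte queue: bytes are written at the tail
    and read from the head; unread() pushes bytes back onto the head."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, data):
        # Append bytes at the writing end of the queue.
        self._buf += data

    def read(self, size=-1):
        # Remove and return up to `size` bytes from the reading end;
        # a negative size returns everything currently queued.
        if size < 0 or size > len(self._buf):
            size = len(self._buf)
        data = bytes(self._buf[:size])
        del self._buf[:size]
        return data

    def unread(self, data):
        # Push bytes back so the next read() returns them first.
        self._buf[:0] = data

    def pending(self):
        # Number of bytes waiting to be read.
        return len(self._buf)

    def has_pending_data(self):
        return bool(self._buf)

    def truncate(self):
        # Drop everything after the current read position, i.e. all
        # unread bytes, since read() consumes what it returns.
        del self._buf[:]


def decode_available(queue):
    # Decode all queued bytes as UTF-8. If the data ends partway through a
    # multi-byte sequence, decode the complete prefix and unread() the tail.
    data = queue.read()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # CPython's UTF-8 decoder reports a truncated final sequence with
        # this reason; any other error means the input is invalid.
        if exc.reason != "unexpected end of data":
            raise
        queue.unread(data[exc.start:])
        return data[:exc.start].decode("utf-8")


q = StreamQueue()
q.write(b"caf\xc3")         # ends with the first byte of "\u00e9"
print(decode_available(q))  # caf
print(q.pending())          # 1
q.write(b"\xa9!")
print(decode_available(q))  # é!
```

Relying on the decoder's reason string keeps the sketch short, but it is specific to CPython's UTF-8 codec. A production version would need a more robust per-encoding way to tell "incomplete" apart from "invalid".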

-- 
Marc-Andre Lemburg
eGenix.com


