[Python-Dev] [Python-3000] Betas today - I hope

Fri Jun 13 12:14:49 CEST 2008

On 2008-06-13 11:32, Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> On 2008-06-12 16:59, Walter Dörwald wrote:
>>> M.-A. Lemburg wrote:
>>>> .transform() and .untransform() use the codecs to apply same-type
>>>> conversions. They do apply type checks to make sure that the
>>>> codec does indeed return the same type.
>>>>
>>>> E.g. text.transform('xml-escape') or data.transform('base64').
>>>
>>> So what would a base64 codec do with the errors argument?
>>
>> It could use it to e.g. try to recover as much data as possible
>> from broken input data.
>>
>> Currently (in Py2.x), it raises an exception if you pass in anything
>> but "strict".
>>
>>>>> I think for transformations we don't need the full codec machinery:
>>>>  > ...
>>>>
>>>> No need to invent another wheel :-) The codecs already exist for
>>>> Py2.x and can be used by the .encode()/.decode() methods in Py2.x
>>>> (where no type checks occur).
>>>
>>> By using a new API we could get rid of old warts. For example: Why 
>>> does the stateless encoder/decoder return how many input 
>>> characters/bytes it has consumed? It must consume *all* bytes anyway!
>>
>> No, it doesn't and that's the point in having those return values :-)
>>
>> Even though the encoder/decoders are stateless, that doesn't mean
>> they have to consume all input data. The caller is responsible to
>> make sure that all input data was in fact consumed.
>>
>> You could for example have a decoder that stops decoding after
>> having seen a block end indicator, e.g. a base64 line end or
>> XML closing element.
> 
> So how should the UTF-8 decoder know that it has to stop at a closing 
> XML element?

The UTF-8 decoder doesn't support this, but you could write a codec
that applies this kind of detection, e.g. one that does not try to
decode partial UTF-8 byte sequences at the end of the input, which
would otherwise result in an error.
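
A minimal sketch of such a decoder, assuming a current Python 3: the
stateless utf-8 codec always decodes with final=True, but the lower-level
codecs.utf_8_decode() function takes a "final" flag, and with final=False
it leaves a truncated trailing sequence unconsumed. The helper name
decode_complete_prefix is made up for this illustration.

```python
import codecs

def decode_complete_prefix(data, errors="strict"):
    """Hypothetical helper: decode UTF-8 but stop before a truncated
    multi-byte sequence at the end of the input."""
    # final=False means "more input may follow", so an incomplete
    # trailing sequence is left unconsumed instead of raising an error.
    return codecs.utf_8_decode(data, errors, False)

# "abc" followed by only 2 of the 3 bytes that encode the euro sign
data = b"abc" + "\u20ac".encode("utf-8")[:2]

text, consumed = decode_complete_prefix(data)
remaining = data[consumed:]   # the caller decides what to do with the rest
```

Here consumed is smaller than len(data), which is exactly the case the
return value exists to report.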

>> Just because all codecs that ship with Python always try to decode
>> the complete input doesn't mean that the feature isn't being used.
> 
> I know of no other code that does. Do you have an example of this use?

I already gave you a few examples.

>> The interface was designed to allow for the above situations.
> 
> Then could we at least have a new codec method that does:
> 
> def statelessencode(self, input):
>    (output, consumed) = self.encode(input)
>    assert len(input) == consumed
>    return output

You mean as a method of the Codec class?

Sure, we could do that, but please use a different name,
e.g. .encodeall() and .decodeall() - .encode() and .decode()
are already stateless (and so would the new methods be), so
"stateless" isn't all that meaningful in this context.
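
Roughly what I have in mind, written here as module-level functions
rather than Codec methods to keep the sketch self-contained. The names
encodeall() and decodeall() are only the proposal from above, not an
existing API; codecs.getencoder() and codecs.getdecoder() do exist and
return the stateless functions that report (output, consumed).

```python
import codecs

def encodeall(input, encoding, errors="strict"):
    """Proposed helper (not an existing API): encode and insist that
    the codec consumed the complete input."""
    output, consumed = codecs.getencoder(encoding)(input, errors)
    if consumed != len(input):
        raise ValueError("encoder consumed %d of %d input items"
                         % (consumed, len(input)))
    return output

def decodeall(input, encoding, errors="strict"):
    """Proposed helper (not an existing API): decode and insist that
    the codec consumed the complete input."""
    output, consumed = codecs.getdecoder(encoding)(input, errors)
    if consumed != len(input):
        raise ValueError("decoder consumed %d of %d input items"
                         % (consumed, len(input)))
    return output

encoded = encodeall("h\u00e9llo", "utf-8")
decoded = decodeall(encoded, "utf-8")
```

An explicit exception is used instead of an assert so that the check
is not dropped when Python runs with -O.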

We could also add such a check to the PyCodec_Encode() and _Decode()
functions. They currently do not apply the above check.

In Python, those two functions are exposed as codecs.encode()
and codecs.decode().
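
To illustrate the difference, assuming a current Python 3: the codec's
stateless encoder returns an (output, consumed) tuple, while
codecs.encode() hands back only the output, so the caller never sees
how much input was consumed.

```python
import codecs

text = "h\u00e9llo"

# The codec's stateless encoder reports how much input it consumed.
output, consumed = codecs.getencoder("utf-8")(text)

# codecs.encode() returns just the encoded data; the consumed count
# is not available to the caller.
plain = codecs.encode(text, "utf-8")

# codecs.decode() likewise returns only the decoded object.
roundtrip = codecs.decode(plain, "utf-8")
```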

-- 
Marc-Andre Lemburg
eGenix.com
