[Python-Dev] [Python-3000] Betas today - I hope

Fri Jun 13 15:26:47 CEST 2008

M.-A. Lemburg wrote:
> On 2008-06-13 11:32, Walter Dörwald wrote:
>> M.-A. Lemburg wrote:
>>> On 2008-06-12 16:59, Walter Dörwald wrote:
>>>> M.-A. Lemburg wrote:
>>>>> .transform() and .untransform() use the codecs to apply same-type
>>>>> conversions. They do apply type checks to make sure that the
>>>>> codec does indeed return the same type.
>>>>>
>>>>> E.g. text.transform('xml-escape') or data.transform('base64').
>>>>
>>>> So what would a base64 codec do with the errors argument?
>>>
>>> It could use it to e.g. try to recover as much data as possible
>>> from broken input data.
>>>
>>> Currently (in Py2.x), it raises an exception if you pass in anything
>>> but "strict".
>>>
>>>>>> I think for transformations we don't need the full codec machinery:
>>>>>  > ...
>>>>>
>>>>> No need to invent another wheel :-) The codecs already exist for
>>>>> Py2.x and can be used by the .encode()/.decode() methods in Py2.x
>>>>> (where no type checks occur).
>>>>
>>>> By using a new API we could get rid of old warts. For example: Why 
>>>> does the stateless encoder/decoder return how many input 
>>>> characters/bytes it has consumed? It must consume *all* bytes anyway!
>>>
>>> No, it doesn't and that's the point in having those return values :-)
>>>
>>> Even though the encoder/decoders are stateless, that doesn't mean
>>> they have to consume all input data. The caller is responsible for
>>> making sure that all input data was in fact consumed.
>>>
>>> You could for example have a decoder that stops decoding after
>>> having seen a block end indicator, e.g. a base64 line end or
>>> XML closing element.
>>
>> So how should the UTF-8 decoder know that it has to stop at a closing 
>> XML element?
> 
> The UTF-8 decoder doesn't support this, but you could write a codec
> that applies this kind of detection, e.g. to not try to decode
> partial UTF-8 byte sequences at the end of input, which would then
> result in an error.
> 
>>> Just because all codecs that ship with Python always try to decode
>>> the complete input doesn't mean that the feature isn't being used.
>>
>> I know of no other code that does. Do you have an example of this use?
> 
> I already gave you a few examples.

Maybe I was unclear: I meant real-world examples, not hypothetical ones.
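
For reference, the consumed count under discussion can be observed in current Python 3. This is only a minimal sketch, not code from the thread: it assumes the low-level function codecs.utf_8_decode(), which accepts a third "final" argument, and all variable names are illustrative.

```python
import codecs

data = "D\xf6rwald".encode("utf-8")   # 8 bytes: the o-umlaut takes two

# The stateless decoder from the codec registry returns (output, consumed)
# and, for the UTF-8 codec, consumes the complete input.
text, consumed = codecs.lookup("utf-8").decode(data)

# Cut the input in the middle of the two-byte sequence.
truncated = data[:2]                   # b"D\xc3"

# With final=False the low-level decoder stops before the incomplete
# sequence instead of raising, and reports how many bytes it consumed.
partial_text, partial_consumed = codecs.utf_8_decode(truncated, "strict", False)
```

Here the second call consumes fewer bytes than it was given, so a caller has to compare the count against len(truncated) to notice the leftover byte.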

>>> The interface was designed to allow for the above situations.
>>
>> Then could we at least have a new codec method that does:
>>
>> def statelessencode(self, input):
>>    (output, consumed) = self.encode(input)
>>    assert len(input) == consumed
>>    return output
> 
> You mean as a method of the Codec class?

No, I meant as a method for the CodecInfo class.

> Sure, we could do that, but please use a different name,
> e.g. .encodeall() and .decodeall() - .encode() and .decode()
> are already stateless (and so would the new methods be), so
> "stateless" isn't all that meaningful in this context.

I like the names encodeall/decodeall!
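
A rough sketch of what such helpers might look like, written as plain module-level functions on top of codecs.lookup(). The names encodeall() and decodeall() are only the proposal from this thread and are hypothetical here; this sketch does not add them to CodecInfo, and it raises ValueError instead of using assert so that the check is not stripped under python -O.

```python
import codecs

def encodeall(encoding, input, errors="strict"):
    """Hypothetical helper: encode and insist that all input was consumed."""
    output, consumed = codecs.lookup(encoding).encode(input, errors)
    if consumed != len(input):
        raise ValueError("%s codec consumed only %d of %d input items"
                         % (encoding, consumed, len(input)))
    return output

def decodeall(encoding, input, errors="strict"):
    """Hypothetical helper: decode and insist that all input was consumed."""
    output, consumed = codecs.lookup(encoding).decode(input, errors)
    if consumed != len(input):
        raise ValueError("%s codec consumed only %d of %d input items"
                         % (encoding, consumed, len(input)))
    return output

# Usage: a round trip through UTF-8.
encoded = encodeall("utf-8", "D\xf6rwald")
decoded = decodeall("utf-8", encoded)
```

The caller gets back just the converted object, and a codec that stops early is reported as an error rather than silently dropping the rest of the input.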

> We could also add such a check to the PyCodec_Encode() and _Decode()
> functions. They currently do not apply the above check.
> 
> In Python, those two functions are exposed as codecs.encode()
> and codecs.decode().

This change will probably have to wait for the 2.7 cycle.
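
To illustrate the two entry points mentioned above, here is a small sketch using current Python 3, where the "base64" codec is a bytes-to-bytes codec. It assumes that codec name is available in the registry, and the variable names are illustrative.

```python
import codecs

data = b"hello"

# codecs.encode() hands back only the converted object ...
encoded = codecs.encode(data, "base64")

# ... while the CodecInfo entry point returns the (output, consumed) pair.
output, consumed = codecs.lookup("base64").encode(data)

# Round trip through the matching decoder.
decoded = codecs.decode(encoded, "base64")
```

Since codecs.encode() and codecs.decode() drop the count, a caller using them cannot tell whether a codec left part of the input unconsumed.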

Servus,
    Walter