[Python-Dev] Stateful codecs [Was: str object going in Py3K]

Walter Dörwald walter at livinglogic.de
Fri Feb 17 17:44:56 CET 2006


M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>> M.-A. Lemburg wrote:
>>> Walter Dörwald wrote:
>>>> [...]
>>>> So maybe
>>>> codecs.lookup() should return an instance of a subclass of tuple which
>>>> has the StatefulEncoder/Decoder as attributes. But then codecs.lookup()
>>>> must be able to handle old 4-tuples returned by old search functions and
>>>> update those to the new 6-tuples. (But we could drop this again after
>>>> several releases, once all third party codecs are updated).
>>> This was a design error: I should not have made
>>> codecs.lookup() a documented function.
>>>
>>> I'd suggest we keep codecs.lookup() the way it is and
>>> instead add new functions to the codecs module, e.g.
>>> codecs.getencoderobject() and codecs.getdecoderobject().
>>>
>>> Changing the codec registration is not much of a problem:
>>> we could simply allow 6-tuples to be passed into the
>>> registry.
>> OK, so codecs.lookup() returns 4-tuples, but the registry stores
>> 6-tuples and the search functions must return 6-tuples. And we add
>> codecs.getencoderobject() and codecs.getdecoderobject() as well as new
>> classes codecs.StatefulEncoder and codecs.StatefulDecoder. What about
>> old search functions that return 4-tuples?
> 
> The registry should then simply set the missing entries to None
> and the getencoderobject()/getdecoderobject() would then have
> to raise an error.

Sounds simple enough and we don't lose backwards compatibility.
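
A rough sketch of how the registry side could look. The helper name
normalize_entry() and the position of the two new entries at the end of the
tuple are assumptions made for illustration only; nothing here is existing
codecs API, and getencoderobject() is only the name proposed above:

```python
# Hypothetical sketch: the tuple layout (new entries appended as items 4 and 5)
# is an assumption, not something the thread has settled.

def normalize_entry(entry):
    """Pad an old-style 4-tuple to the proposed 6-tuple layout."""
    if len(entry) == 4:
        # Old search function: it knows nothing about stateful codecs,
        # so the two missing entries are set to None.
        return tuple(entry) + (None, None)
    if len(entry) == 6:
        return tuple(entry)
    raise TypeError("codec search functions must return 4- or 6-tuples")


def getencoderobject(entry, encoding):
    """Return the stateful encoder factory, or raise if the codec has none."""
    factory = normalize_entry(entry)[4]
    if factory is None:
        raise LookupError("no stateful encoder for encoding %r" % encoding)
    return factory
```

With this, an old third-party codec keeps working for everything it
supported before, and only the new accessor functions fail for it.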

> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!

+1, but I'd like to have a replacement for this, i.e. a function that 
returns all info the registry has about an encoding:

1. Name
2. Encoder function
3. Decoder function
4. Stateful encoder factory
5. Stateful decoder factory
6. Stream writer factory
7. Stream reader factory

and if this is an object with attributes, we won't have any problems if 
we extend it in the future.
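
Something along these lines, as a minimal sketch. The class name
CodecRecord and all attribute names are made up for illustration; only the
seven pieces of information come from the list above:

```python
class CodecRecord(object):
    """Hypothetical container for everything the registry knows about a codec."""

    def __init__(self, name, encoder, decoder,
                 statefulencoder=None, statefuldecoder=None,
                 streamwriter=None, streamreader=None):
        self.name = name                        # 1. Name
        self.encoder = encoder                  # 2. Encoder function
        self.decoder = decoder                  # 3. Decoder function
        self.statefulencoder = statefulencoder  # 4. Stateful encoder factory
        self.statefuldecoder = statefuldecoder  # 5. Stateful decoder factory
        self.streamwriter = streamwriter        # 6. Stream writer factory
        self.streamreader = streamreader        # 7. Stream reader factory

    def __repr__(self):
        return "<CodecRecord for encoding %r>" % self.name
```

Because callers access fields by name instead of by position, adding an
eighth attribute later would not break any existing code.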

BTW, if we change the API, can we fix the return value of the stateless 
functions? As the stateless function always encodes/decodes the complete 
string, returning the length of the string doesn't make sense. 
codecs.getencoder() and codecs.getdecoder() would have to continue to 
return the old variant of the functions, but 
codecs.getinfo("latin-1").encoder would be the new encoding function.
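
To illustrate the difference, here is a small sketch that wraps the existing
function returned by codecs.getencoder() so that it returns only the encoded
data. The wrapper name strip_length() is hypothetical, and codecs.getinfo()
above is just a proposed name, so it is not used here:

```python
import codecs


def strip_length(func):
    """Wrap an old-style stateless codec function to drop the length."""
    def wrapper(input, errors="strict"):
        # Old-style functions return (output, length consumed).
        output, _length = func(input, errors)
        return output
    return wrapper


old_encode = codecs.getencoder("latin-1")  # returns (output, length)
new_encode = strip_length(old_encode)      # returns output only
```

The old function stays available unchanged for existing callers, while new
code could use the simpler variant.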

Bye,
    Walter Dörwald

