[Python-Dev] Adding .decode() method to Unicode

M.-A. Lemburg mal@lemburg.com
Tue, 12 Jun 2001 09:09:05 +0200


"Martin v. Loewis" wrote:
> 
> > I would like to add a .decode() method to Unicode objects and also
> > enable the builtin unicode() to accept Unicode objects as input.
> 
> -1. What is this good for?

See below :)
 
> > While this may seem useless for the currently available encodings,
> > it does have some use for codecs which recode Unicode to Unicode,
> > e.g. codecs which do XML escaping or Unicode compression.
> 
> I still can't see the value. If you think the codec API is good for such
> transformation, why not use it? I.e.
> 
> enc,dec,_,_ = codecs.lookup("compress-form-foo")
> s, _ = dec(s)   # decoders return a (result, length consumed) tuple

Sure, and that's the point. I would like to add the .decode()
method to make this just as simple as encoding Unicode to UTF-8.
Note that strings already have this method:

str.encode()
str.decode()
uni.encode()
#uni.decode() # still missing
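
To make the comparison concrete, here is a minimal sketch of the long form
(codec lookup) next to a one-call form. It is written for present-day
Python 3, which postdates this thread and where str has no .decode()
method, so codecs.decode() stands in for the proposed method. The "rot13"
codec is only a convenient text-to-text codec from the standard library,
not one of the codecs discussed here.

```python
import codecs

text = "Hello, World"

# Long form, as in the quoted reply: look the codec up, call its decoder,
# and unpack the (result, length consumed) tuple it returns.
info = codecs.lookup("rot13")
via_lookup, consumed = info.decode(text)

# Short form: a single call that hides the lookup and the tuple unpacking.
# This stands in for the proposed uni.decode("rot13").
via_helper = codecs.decode(text, "rot13")

print(via_lookup, consumed)
print(via_helper)
```

Both forms go through the same codec registry; the short form only removes
the bookkeeping.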
 
> Furthermore, this seems like a form of hypergeneralization. If you
> have this, why not also add
> 
> s = s.decode("capitalize") # instead of s.capitalize()
> i = s.decode("int")        # instead of int(s)

No, that's not the intention.

One very useful application for this method is XML unescaping
which turns numeric XML entities into Unicode chars. Others
are Unicode decompression (using the Unicode compression algorithm)
and certain forms of Unicode normalization.
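
A rough sketch of what such an XML unescaping codec might look like,
assuming present-day Python 3 and the standard codecs registry. The codec
name "xmlcharref" and the helper functions are hypothetical, made up for
this illustration; codecs.decode() again stands in for the proposed
.decode() method.

```python
import codecs
import re

# Matches decimal (&#233;) and hexadecimal (&#xE9;) character references.
_CHARREF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def _unescape(text, errors="strict"):
    """Decoder: replace numeric character references with the characters."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)
    # Codec functions return a (result, length consumed) tuple.
    return _CHARREF.sub(repl, text), len(text)


def _escape(text, errors="strict"):
    """Encoder: replace non-ASCII characters with decimal references."""
    out = "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in text)
    return out, len(text)


def _search(name):
    # Hypothetical codec name; the registry passes the name in lower case.
    if name == "xmlcharref":
        return codecs.CodecInfo(encode=_escape, decode=_unescape,
                                name="xmlcharref")
    return None


codecs.register(_search)

decoded = codecs.decode("caf&#233; &#x263A;", "xmlcharref")
encoded = codecs.encode("caf\u00e9", "xmlcharref")
```

Once registered, the codec is reachable by name like any other, which is
the extensibility being argued for. This sketch ignores the errors
argument and does not touch named entities such as &amp;.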

The key argument for these interfaces is that they provide
an extensible transformation mechanism for string and binary
data. 

> > Any objections ?
> 
> Yes, I think this should not be added.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/