[Python-Dev] Adding .decode() method to Unicode

Paul Prescod paulp@ActiveState.com
Tue, 12 Jun 2001 11:51:25 -0700


"Martin v. Loewis" wrote:
> 
>...
> 
> Why is that? An encoding, by nature, is something that produces a byte
> sequence from some input. So you can only decode byte sequences, not
> character strings.

By this logic, it makes no sense to "encode" a Unicode string into a
base64'd Unicode string or to "decode" a Unicode string from a base64'd
Unicode string. But I have seen circumstances where one XML
document is base64'd into another. In that circumstance, it would be
useful to say node.nodeValue.decode("base64").
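
A minimal sketch of that text-to-text decoding, written for present-day
Python 3 rather than the interpreter under discussion. The helper name
decode_base64_text, its inner_encoding parameter, and the sample XML are
hypothetical and chosen only for illustration; this is not the proposed
.decode() method itself, just one way to get the same effect from a
character string.

```python
import base64


def decode_base64_text(text: str, inner_encoding: str = "utf-8") -> str:
    """Decode base64 carried in a character string back into text.

    Hypothetical helper: the base64 alphabet is pure ASCII, so the text
    is first turned into bytes, then base64-decoded, then interpreted
    using the (assumed) encoding of the embedded document.
    """
    raw = base64.b64decode(text.encode("ascii"))
    return raw.decode(inner_encoding)


# An inner XML document embedded, base64'd, as the text of an outer node.
inner_xml = "<greeting>hello</greeting>"
node_value = base64.b64encode(inner_xml.encode("utf-8")).decode("ascii")

# The round trip starts and ends with character strings.
recovered = decode_base64_text(node_value)
```

The sketch has to assume an encoding for the embedded document (UTF-8
here), which a one-argument decode("base64") call would leave implicit.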

Let me turn the argument around: what would be the *harm* in having 8-bit
strings and Unicode strings behave similarly in this manner?

>...
> Not at all. Byte strings and character strings are as different as are
> byte strings and lists of DOM child nodes (i.e. the only common thing
> is that they are sequences).

8-bit strings are not purely byte strings. They are also "character
strings". That's why they have methods like "capitalize", "isalpha",
"lower", "swapcase", "title" and so forth. DOM nodes and byte strings
have virtually no methods in common.
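
As a rough illustration of the overlap, the following sketch uses the
Python 3 bytes type as a stand-in for the 8-bit strings discussed here
(an assumption: the types in the interpreter under discussion differ in
detail). It only checks that the methods named above exist on both the
byte type and the character type, and calls a few of them on ASCII data.

```python
# Methods named in the paragraph above.
character_methods = ("capitalize", "isalpha", "lower", "swapcase", "title")

# Which of them exist on both the byte type and the character type?
shared = [
    name
    for name in character_methods
    if hasattr(bytes, name) and hasattr(str, name)
]

# The byte type treats ASCII letters as characters when asked to.
data = b"hello world"
titled = data.title()        # capitalizes each ASCII word
swapped = data.swapcase()    # flips the case of ASCII letters
only_letters = b"hello".isalpha()
```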

We could argue about angels on the head of a pin until the cows come home,
but 90% of all Python users think of 8-bit strings as strings of characters.
So arguments based on the idea that they are not "really" character
strings are wishful thinking.

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook