encode() question

Tue Jul 31 13:18:59 EDT 2007

On Tue, 31 Jul 2007 13:53:11 -0300, 7stud <bbxx789_05ss at yahoo.com>
wrote:

> s1 = "hello"
> s2 = s1.encode("utf-8")
>
> s1 = "an accented 'e': \xc3\xa9"
> s2 = s1.encode("utf-8")
>
> The last line produces the error:
>
> ---
> Traceback (most recent call last):
>   File "test1.py", line 6, in ?
>     s2 = s1.encode("utf-8")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 17: ordinal not in range(128)
> ---
>
> The error is a "decode" error, and as far as I can tell, decoding
> happens when you convert a regular string to a unicode string.  So, is
> there an implicit conversion taking place from s1 to a unicode string
> before encode() is called?  By what mechanism?

Converting from unicode characters into a string of bytes is the "encode"
operation: unicode.encode() -> str
Converting from a string of bytes to unicode characters is the "decode"
operation: str.decode() -> unicode
str.encode and unicode.decode should NOT exist, or at least should issue a
warning (IMHO).
When you call str.encode, the encode operation requires a unicode source, so
the string is first implicitly decoded using the default encoding (ASCII, as
your traceback shows), and that implicit decode fails on the byte 0xc3.
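
The two-step mechanism can be sketched roughly as follows. This is only an
illustration, not the interpreter's actual code: it is written for Python 3,
where bytes plays the role of the old str and str the role of the old unicode,
and the helper py2_style_encode is a made-up name that spells out the implicit
decode explicitly.

```python
def py2_style_encode(byte_string, encoding, default_encoding="ascii"):
    """Hypothetical helper: mimic str.encode on a Python 2 byte string."""
    # Step 1 (implicit in Python 2): decode the bytes with the default encoding.
    text = byte_string.decode(default_encoding)
    # Step 2 (what was actually asked for): encode the unicode text.
    return text.encode(encoding)


plain = b"hello"
accented = b"an accented 'e': \xc3\xa9"

# Pure-ASCII bytes survive the implicit decode, so the call succeeds.
print(py2_style_encode(plain, "utf-8"))

# Non-ASCII bytes fail in step 1, which is why a *decode* error is reported.
try:
    py2_style_encode(accented, "utf-8")
except UnicodeDecodeError as exc:
    failure = exc
    print(exc)

# Decoding explicitly with the right codec first avoids the problem.
text = accented.decode("utf-8")
round_trip = text.encode("utf-8")
print(round_trip == accented)
```

In the original Python 2 script the equivalent fix would be to decode first,
e.g. s1.decode("utf-8"), and only then call encode on the resulting unicode
object.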

-- 
Gabriel Genellina