[Tutor] symbol encoding and processing problem

Kent Johnson kent37 at tds.net
Wed Oct 17 18:07:49 CEST 2007


Evert Rol wrote:
>>> raw = unicode("125° 15' 5.55''", 'utf-8')
>> Again, I think this can be simplified to
>>    raw = u"125° 15' 5.55''"
> 
> It does, but it's getting confusing when I compare the following:
> 
>  >>> raw = u"125° 15' 5.55''"
> 125° 15' 5.55''

Where does that output come from?
> 
>  >>> print u"125° 15' 5.55''"
> UnicodeEncodeError: 'ascii' codec can't encode characters in position  
> 3-4: ordinal not in range(128)

print must encode unicode strings. It tries to encode them using the 
default encoding, which doesn't work because the string contains 
non-ascii characters.
> 
>  >>> print u"125° 15' 5.55''".encode('utf-8')
> 125° 15' 5.55''

That is the way to get it to work.
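
To make the difference between the two cases concrete, here is a minimal 
sketch. It names the 'ascii' codec explicitly to stand in for what print 
does implicitly when the default encoding is ascii. The variable names 
are only for illustration, and the degree sign is written as an escape 
so the example does not depend on the terminal or source encoding.

```python
raw = u"125\u00b0 15' 5.55''"  # \u00b0 is the degree sign

# Encoding with the ascii codec fails, like the implicit encode in print.
try:
    raw.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failed_at = exc.start  # index of the first character that cannot be encoded

# Encoding with utf-8 succeeds; the degree sign becomes the two bytes C2 B0.
encoded = raw.encode('utf-8')
```

The explicit encode gives a byte string, which can be written out 
without any further implicit conversion.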

>  >>> print unicode("125° 15' 5.55''")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position  
> 3: ordinal not in range(128)

Here the problem is creating the unicode string with the default 
encoding. Again it doesn't work, because the source byte string 
contains non-ascii bytes.
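
The decode direction can be sketched the same way. This assumes the 
bytes are UTF-8, as they would be when typed into a UTF-8 terminal; the 
byte string is spelled out with escapes and the names are illustrative.

```python
data = b"125\xc2\xb0 15' 5.55''"  # UTF-8 bytes; C2 B0 encodes the degree sign

# Decoding with the ascii codec fails at the first byte above 0x7f.
try:
    data.decode('ascii')
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    bad_byte = data[exc.start:exc.start + 1]  # the offending byte

# Naming the real encoding works and yields one character per degree sign.
text = data.decode('utf-8')
```

Note that the decoded text is one character shorter than the byte 
string, since the two bytes C2 B0 become a single character.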

>  >>> print unicode("125° 15' 5.55''", 'utf-8')
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in  
> position 3: ordinal not in range(128)

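This is the same as the first encode error: the unicode string is 
created correctly, but print then fails to encode it using the default 
encoding.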

> So apart from the errors all being slightly different, is there  
> perhaps some difference between the str() and repr() functions (looks  
> like repr uses escape backslashes)?

Right.

> And checking the default encoding inside the python cmdline, I see  
> that my sys module doesn't actually have a setdefaultencoding()  
> method; was that something that should have been properly configured  
> at compile time? The documentation mentions something about the site  
> module, but I can't find it there either.

The setdefaultencoding() function (it's not a method, it is a 
module-level function) is removed from the sys module as part of startup 
(I think by the site module). That is why you have to call it from 
sitecustomize.py. You can also
   reload(sys)
to restore it, but it's better to write your app so it doesn't require 
the default encoding to be changed.
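
One way to do that, sketched here under the assumption that the data is 
UTF-8, is to decode bytes once on the way in, work with unicode 
internally, and encode once on the way out. The helper names to_text and 
to_bytes are hypothetical, not part of any library, and the BytesIO 
object just stands in for a file or socket.

```python
import io

def to_text(data, encoding='utf-8'):
    # Hypothetical helper: decode incoming bytes with an explicit encoding.
    return data.decode(encoding)

def to_bytes(text, encoding='utf-8'):
    # Hypothetical helper: encode outgoing text with an explicit encoding.
    return text.encode(encoding)

incoming = b"125\xc2\xb0 15' 5.55''"   # bytes as read from a file or terminal
text = to_text(incoming)               # unicode from here on
degrees = text.split(u"\u00b0")[0]     # process as text, e.g. take the degrees part

out = io.BytesIO()                     # stands in for the real output stream
out.write(to_bytes(text))              # encode only at the output boundary
```

With the encoding named at both boundaries, the default encoding never 
comes into play.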

Kent

