[Tutor] symbol encoding and processing problem
Kent Johnson
kent37 at tds.net
Wed Oct 17 18:07:49 CEST 2007
Evert Rol wrote:
>>> raw = unicode("125° 15' 5.55''", 'utf-8')
>> Again, I think this can be simplified to
>> raw = u"125° 15' 5.55''"
>
> It does, but it's getting confusing when I compare the following:
>
> >>> raw = u"125° 15' 5.55''"
> 125° 15' 5.55''
Where does that output come from?
>
> >>> print u"125° 15' 5.55''"
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 3-4: ordinal not in range(128)
print must encode unicode strings before it can write them out. It tries
to encode them using the default encoding (ascii here), which doesn't
work because the string contains non-ascii characters.
>
> >>> print u"125° 15' 5.55''".encode('utf-8')
> 125° 15' 5.55''
That is the way to get it to work.
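
To make the difference concrete, here is a minimal sketch of what print is
doing under the hood, with the encode step written out explicitly. The
variable names are mine, and the degree sign is spelled as \u00b0 so the
snippet does not depend on the source file's encoding.

```python
# The same string as above, with the degree sign written as an escape.
text = u"125\u00b0 15' 5.55''"

# Encoding to ascii fails: the degree sign (U+00B0) is outside range(128).
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failed_at = exc.start  # index of the first character that can't be encoded

# Encoding to utf-8 succeeds; the degree sign becomes the two bytes C2 B0.
utf8_bytes = text.encode('utf-8')
```

The utf-8 result is a plain byte string, which print can write without
having to pick an encoding itself.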
> >>> print unicode("125° 15' 5.55''")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 3: ordinal not in range(128)
Here the problem is trying to create the unicode string using the
default encoding: with no encoding argument, unicode() decodes the byte
string as ascii. Again, it doesn't work because the source contains
non-ascii characters.
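
A small sketch of the decode direction, assuming the byte string really is
utf-8 (which the 0xc2 in the error message suggests). The names are made up
for illustration.

```python
# The utf-8 bytes for the string; C2 B0 is the degree sign.
raw_bytes = b"125\xc2\xb0 15' 5.55''"

# Decoding as ascii fails on the first byte >= 0x80.
try:
    raw_bytes.decode('ascii')
    position = None
except UnicodeDecodeError as exc:
    position = exc.start                           # offset of the offending byte
    bad_byte = raw_bytes[exc.start:exc.start + 1]  # the byte itself

# Naming the real encoding works and gives one character for the two bytes.
text = raw_bytes.decode('utf-8')
```

The position reported here is 3, the same offset as in the traceback above.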
> >>> print unicode("125° 15' 5.55''", 'utf-8')
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in
> position 3: ordinal not in range(128)
This is the same as the first encode error. The decode to unicode
succeeds here; it is print that fails when it tries to encode the result.
> So apart from the errors all being slightly different, is there
> perhaps some difference between the str() and repr() functions (looks
> like repr uses escape backslashes)?
Right.
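
The backslash form is just an ascii-safe spelling of the same characters.
As a rough illustration (this uses the unicode_escape codec to produce a
similar escaped form; it is not meant as a description of how repr() itself
is implemented):

```python
text = u"125\u00b0 15' 5.55''"

# unicode_escape writes non-ascii characters as backslash escapes,
# so the result is pure ascii and safe to display anywhere.
escaped = text.encode('unicode_escape')

# Decoding the escaped form gives back the original string unchanged.
restored = escaped.decode('unicode_escape')
```

Nothing is lost in the escaped form, which is why it is handy for
debugging, but it is not what you want to show to a user.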
> And checking the default encoding inside the python cmdline, I see
> that my sys module doesn't actually have a setdefaultencoding()
> method; was that something that should have been properly configured
> at compile time? The documentation mentions something about the site
> module, but I can't find it there either.
The setdefaultencoding() function (it's a module-level function, not a
method) is removed from the sys module as part of startup (I think by
the site module). That is why you have to call it from
sitecustomize.py. You can also
reload(sys)
to restore it, but it's better to write your app so it doesn't require
the default encoding to be changed.
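
One way to do that, as a sketch only: decode bytes to unicode where data
comes in, work with unicode inside the program, and encode explicitly where
data goes out. The helper names read_text and write_text are hypothetical,
and utf-8 is an assumption about your data; io.BytesIO stands in for a real
file or socket.

```python
import io


def read_text(stream, encoding='utf-8'):
    """Read bytes from stream and decode them with an explicit encoding."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding='utf-8'):
    """Encode text with an explicit encoding and write the bytes to stream."""
    stream.write(text.encode(encoding))


# In-memory stand-ins for an input file and an output file.
source = io.BytesIO(b"125\xc2\xb0 15' 5.55''")
sink = io.BytesIO()

text = read_text(source)   # unicode from here on
write_text(sink, text)     # bytes again at the boundary
```

Because the encoding is named at both ends, the default encoding never
comes into play.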
Kent