Unicode confusion
Mark Tolonen
M8R-yfto6h at mailinator.com
Tue Jul 15 03:03:51 EDT 2008
"Jerry Hill" <malaclypse2 at gmail.com> wrote in message
news:mailman.14.1216054283.922.python-list at python.org...
> On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <timothywayne.cook at gmail.com>
> wrote:
>> if I say units=unicode("°"). I get
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
>> ordinal not in range(128)
>>
>> If I try x=unicode.decode(x,'utf-8'). I get
>> TypeError: descriptor 'decode' requires a 'unicode' object but received
>> a 'str'
>>
>> What is the correct way to interpret these symbols that come to me as a
>> string?
>
> Part of it depends on where you're getting them from. If they are in
> your source code, just define them like this:
>
>>>> units = u"°"
>>>> print units
> °
>>>> print repr(units)
> u'\xb0'
>
> If they're coming from an external source, you have to know the
> encoding they're being sent in. Then you can decode them into
> unicode, like this:
>
>>>> units = "°"
>>>> unicode_units = units.decode('Latin-1')
>>>> print repr(unicode_units)
> u'\xb0'
>>>> print unicode_units
> °
>
> --
> Jerry
>
Even with source code you have to know the encoding. For pre-3.x versions, Python
assumes ASCII encoding for source files:
test.py contains:
units = u"°"
>>> import test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "test.py", line 1
SyntaxError: Non-ASCII character '\xb0' in file test.py on line 1, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for details
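To see why the declaration matters, look at the bytes an editor actually writes for "°". The sketch below assumes Python 2.6+ or 3.3+, where both `b""` and `u""` literals are accepted. It encodes the degree sign, U+00B0, with two codecs. Latin-1 stores it as the single byte 0xb0 reported in the SyntaxError above. UTF-8 stores it as 0xc2 0xb0. The byte 0xc2 is also the one in the original `unicode("°")` error, which suggests that string arrived UTF-8 encoded.

```python
# Runs on Python 2.6+ and 3.3+. This snippet is pure ASCII, so it needs no
# coding declaration of its own.
degree = u"\xb0"  # DEGREE SIGN written as an escape, not as a literal character

latin1_bytes = degree.encode("latin-1")  # bytes a Latin-1 editor would save
utf8_bytes = degree.encode("utf-8")      # bytes a UTF-8 editor would save

print(repr(latin1_bytes))  # one byte: 0xb0
print(repr(utf8_bytes))    # two bytes: 0xc2 0xb0
```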
The encoding of the source file can be declared with a comment on the first or second line (PEP 263):
# coding: latin-1
units = u"°"
>>> import test
>>> test.units
u'\xb0'
>>> print test.units
°
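You can reproduce the same behaviour without writing test.py to disk. This sketch assumes Python 3, whose `compile()` accepts a bytes source and honors a PEP 263 coding comment inside it. `load_units` is a hypothetical helper for illustration. One difference from the session above: Python 3 assumes UTF-8 rather than ASCII when no declaration is present (PEP 3120). The undeclared Latin-1 byte is therefore still rejected, only for a different reason.

```python
# Sketch for Python 3: compile the bytes test.py would contain, with and
# without a coding declaration, instead of importing a real file.
LATIN1_SOURCE = b"units = u'\xb0'\n"  # raw byte 0xb0 = degree sign in Latin-1
DECLARED = b"# coding: latin-1\n" + LATIN1_SOURCE


def load_units(source):
    """Hypothetical helper: compile and run `source`, return its `units`."""
    namespace = {}
    exec(compile(source, "test.py", "exec"), namespace)
    return namespace["units"]


units = load_units(DECLARED)
print(units == u"\xb0")  # True: decoded with the declared Latin-1 codec

try:
    load_units(LATIN1_SOURCE)  # no declaration: read as UTF-8 and rejected
except (SyntaxError, UnicodeDecodeError) as exc:  # CPython reports a SyntaxError
    undeclared_error = exc
    print("rejected:", exc)
else:
    undeclared_error = None
```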
Make sure to use the correct encoding! Here the file was saved in Latin-1
but declared as UTF-8:
# coding: utf8
units = u"°"
>>> import test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0:
unexpected code byte
>>>
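The failure above happens because the UTF-8 codec is handed 0xb0, a byte that cannot start a UTF-8 sequence. The minimal sketch below (assuming Python 2.6+ or 3.3+) decodes the file's bytes with each codec. `try_decode` is a hypothetical helper written for illustration, not a standard-library function. The last line shows the opposite mistake: UTF-8 bytes declared as Latin-1 raise no error at all and silently produce the wrong two characters.

```python
def try_decode(raw, codec):
    """Hypothetical helper: decode raw with codec, or return None if invalid."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


saved_latin1 = b"\xb0"    # test.py's degree sign when saved as Latin-1
saved_utf8 = b"\xc2\xb0"  # the same character when saved as UTF-8

print(try_decode(saved_latin1, "latin-1") == u"\xb0")   # True: declaration matches
print(try_decode(saved_latin1, "utf-8"))                # None: 0xb0 is not a valid start byte
print(try_decode(saved_utf8, "utf-8") == u"\xb0")       # True: re-saved as UTF-8
print(try_decode(saved_utf8, "latin-1") == u"\xc2\xb0") # True: no error, but two wrong characters
```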
--
Mark