Unicode (was: Old Man Yells At Cloud)

Paul Moore p.f.moore at gmail.com
Sun Sep 17 07:51:21 EDT 2017


On 17 September 2017 at 12:38, Leam Hall <leamhall at gmail.com> wrote:
> On 09/17/2017 07:25 AM, Steve D'Aprano wrote:
>>
>> On Sun, 17 Sep 2017 08:03 pm, Leam Hall wrote:
>>
>>> I'm still trying to figure out how to convert a string to unicode in
>>> Python 2.
>>
>>
>>
>> A Python 2 string is a string of bytes, so you need to know what encoding
>> they are in. Let's assume you got them from a source using UTF-8. Then you
>> would do:
>>
>> mystring.decode('utf-8')
>>
>> and it will return a Unicode string of "code points" (think: more or less
>> characters).
>
>
>
> Still trying to keep this Py2 and Py3 compatible.
>
> The Py2 error is:
>         UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6'
>         in position 8: ordinal not in range(128)
>
> even when the string is manually converted:
>         name    = unicode(self.name)
>
> Same sort of issue with:
>         name    = self.name.decode('utf-8')
>
>
> Py3 doesn't like either version.

If your byte string has a \xf6 byte in it, it is likely not UTF-8, since
that byte never appears in valid UTF-8. Maybe it's Latin-1? The key here
is that there's no way for Python (or any program) to know the encoding
of a byte string, so you have to tell it.
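
As a rough sketch of what "telling it" might look like, assuming the
bytes really are Latin-1 (that is only a guess on my part): the sample
value and the to_text() helper below are made up for illustration and
are not from your code. It should behave the same on Python 2.7 and
Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample data: the name "Bjorn" with an o-umlaut, stored as
# Latin-1 bytes (0xf6 is o-umlaut in Latin-1).
raw = b'Bj\xf6rn'


def to_text(data, encoding):
    """Return text for `data`, decoding with the encoding the caller names.

    On Python 2, `bytes` is an alias for `str`, so byte strings are decoded
    and `unicode` objects pass through. On Python 3, `bytes` objects are
    decoded and `str` objects pass through.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Naming the right encoding gives the expected text.
name = to_text(raw, 'latin-1')

# Naming the wrong one fails, because a lone 0xf6 byte is not valid UTF-8.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Here name comes out as u'Bj\xf6rn' and utf8_ok is False. If the data
turns out to be in some other encoding, only the second argument to
to_text() needs to change.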

Paul


