Unicode

Sun Sep 17 08:44:24 EDT 2017

On 09/17/2017 08:30 AM, Chris Angelico wrote:
> On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall <leamhall at gmail.com> wrote:
>> Still trying to keep this Py2 and Py3 compatible.
>>
>> The Py2 error is:
>>          UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6'
>>          in position 8: ordinal not in range(128)
>>
>> even when the string is manually converted:
>>          name    = unicode(self.name)
>>
>> Same sort of issue with:
>>          name    = self.name.decode('utf-8')
>>
>>
>> Py3 doesn't like either version.
> 
> You got a Unicode *EN*code error when you tried to *DE*code. That's a
> quirk of Py2's coercion behaviours, so the error's a bit obscure, but
> it means that you (most likely) actually have a Unicode string
> already. Check what type(self.name) is, and see if the problem is
> actually somewhere else.
> 
> (It's hard to give more specific advice based on this tiny snippet, sorry.)
> 
> ChrisA
> 

Chris, thanks! I see what you mean.
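
A minimal sketch of the coercion Chris describes, under the assumption
that self.name is already a Unicode string. On Py2, calling .decode() on
a unicode object first encodes it with the default 'ascii' codec, and
that hidden step is what raises UnicodeEncodeError. The sample name and
the helper implicit_py2_decode are hypothetical; the helper only spells
out the hidden step so the same failure shows up on Py2 and Py3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical sample value: already text (unicode on Py2, str on Py3).
name = u"Schr\xf6dinger"


def implicit_py2_decode(text):
    """Mimic Py2's unicode.decode('utf-8'): ASCII-encode first, then decode."""
    return text.encode("ascii").decode("utf-8")


try:
    implicit_py2_decode(name)
    error_name = None
except UnicodeEncodeError as exc:
    # The failure comes from the encode step, not from the decode itself.
    error_name = type(exc).__name__
    print(error_name, exc)
```

If type(self.name) is already unicode (Py2) or str (Py3), there is
nothing left to decode, and the encode step belongs wherever the text is
printed or written out.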

The strings come from a SQLite3 database with a bunch of names, some of
which have non-ASCII characters. The column is declared as varchar, which
SQLite seems to store as utf-8, utf-16be or utf-16le. I probably need to
purge the data.

What I find interesting is that utf-8 works in the Ruby script that 
pulls from the same database. That's what makes me think it's utf-8.
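
One way to check both the database encoding and the type that comes back
from a query, sketched against an in-memory database rather than the
real one. The table name people, the sample value and the helper to_text
are assumptions made for illustration; the actual schema lives in the
gist. PRAGMA encoding is a real SQLite pragma that reports the text
encoding of the database.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import sqlite3

# With unicode_literals this is unicode on Py2 and str on Py3.
text_type = type("")


def to_text(value, encoding="utf-8"):
    """Return text; decode only when the value is actually a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name VARCHAR(50))")
conn.execute("INSERT INTO people (name) VALUES (?)", ("J\xf6rg",))

# Ask SQLite which text encoding the database uses.
db_encoding = conn.execute("PRAGMA encoding").fetchone()[0]

row = conn.execute("SELECT name FROM people").fetchone()
name = to_text(row[0])          # text on both Py2 and Py3
encoded = name.encode("utf-8")  # encode only at the output boundary
conn.close()

print(db_encoding, type(name).__name__, repr(encoded))
```

The idea is to keep names as text everywhere inside the program and to
encode explicitly only when writing to a file or terminal, which avoids
relying on Py2's implicit ASCII conversion.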

I've tried different things in lines 45 and 61 of this gist:

https://gist.github.com/LeamHall/054f9915af17dc1b1a33597b9b45d2da

Leam