Convert a list with wrong encoding to utf8

Gregory Ewing greg.ewing at canterbury.ac.nz
Fri Feb 15 02:27:34 EST 2019


vergos.nikolas at gmail.com wrote:
> I just tried:
> 
> names = tuple( [s.encode('latin1').decode('utf8') for s in names] )
> 
> but i get
> UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')

This suggests that the strings you're getting from the database have
*already* been correctly decoded: the error message shows readable
Greek text, and Greek characters can't be encoded as latin1. There is
no need to go through the latin1 re-coding step.
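
A minimal sketch of why the error points that way. The sample string is
the one from the error message; the variable names are made up for
illustration. It simulates mojibake by decoding UTF-8 bytes as latin1,
then shows that the re-coding step repairs the mojibake version but
fails on the already-correct one:

```python
good = 'Άκης Τσιάμης'   # already-correct text, as shown in the error message

# Simulate mojibake: UTF-8 bytes wrongly decoded as latin1.
# latin1 maps every byte 0-255 to a character, so this never fails.
bad = good.encode('utf8').decode('latin1')

# The re-coding step repairs the mojibake version...
repaired = bad.encode('latin1').decode('utf8')
print(repaired == good)

# ...but raises on the correct string, because Greek letters
# have code points above 255 and latin1 cannot represent them.
try:
    good.encode('latin1')
    failed = False
except UnicodeEncodeError as e:
    failed = True
    print('already decoded:', e.reason)
```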

What do you get if you do

    print(names)

immediately *before* trying to re-code them?
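
Since a plain print() depends on how the terminal renders the text, it
may also help to print the names with ascii(), which shows the actual
code points. The sketch below uses made-up sample data (one correct
string and one simulated mojibake string), not your real list.
Correctly decoded Greek shows up as \u03xx escapes, while UTF-8 that
was mis-decoded as latin1 shows up as pairs of \xce and \xcf escapes:

```python
sample = 'Άκης'

# Hypothetical data: one correct name, one simulated mojibake name.
names = (sample, sample.encode('utf8').decode('latin1'))

# ascii() escapes every non-ASCII character, so the output does not
# depend on the terminal's encoding.
escaped = [ascii(s) for s in names]
for line in escaped:
    print(line)
```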

What *may* be happening is that most of your data is stored in the
database encoded as utf-8, but some of it is actually using a different
encoding, and you're getting confused by the resulting inconsistencies.

I suggest you look carefully at *all* the names in the list, straight
after getting them from the database. If some of them look okay and
some of them look like mojibake, then you have bad data in the database
in the form of inconsistent encodings.
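
If the list does turn out to be mixed, one possible stopgap is to
re-code only the names that look like mojibake. This is a rough
heuristic sketch, not a substitute for fixing the data: the function
name fix_if_mojibake and the sample names are invented for this
example, and it assumes the bad entries are UTF-8 that was mis-decoded
as latin1 (not some other encoding such as cp1252).

```python
def fix_if_mojibake(s):
    """Return s re-coded if it round-trips as latin1 -> utf8, else s unchanged."""
    try:
        return s.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # UnicodeEncodeError: s has characters outside latin1, so it is
        # already properly decoded text.
        # UnicodeDecodeError: the latin1 bytes are not valid UTF-8, so s
        # is probably genuine latin1 text.
        return s

good = 'Άκης Τσιάμης'
# Hypothetical mixed data: correct, mojibake, plain ASCII, real latin1.
names = (good, good.encode('utf8').decode('latin1'), 'John Smith', 'café')

names = tuple(fix_if_mojibake(s) for s in names)
for s in names:
    print(ascii(s))
```

Pure ASCII names pass through unchanged. A genuine latin1 string that
happens to form a valid UTF-8 byte sequence would be altered wrongly,
so it is worth checking the results by eye.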

-- 
Greg


