Unicode

Vlastimil Brom vlastimil.brom at gmail.com
Mon Dec 17 05:55:10 EST 2012


2012/12/17 Anatoli Hristov <tolidtm at gmail.com>:
>> if you only see encoding problems on printing results to your
>> terminal, its settings or unicode capability might be the cause,
>> however, if you also get badly encoded items in the database, you are
>> likely using an inappropriate encoding in some step.
>
> I get badly encoded text in my DB
>
>> you seem to be doing something like the following (explicitly or
>> partly implicitly, based on your system defaults):
>>
>>>>> print u"étroits, en utilisant un portable extrêmement puissant".encode("utf-8").decode("windows-1252")
>> Ã©troits, en utilisant un portable extrÃªmement puissant
>>>>>
>>
>> i.e. encode a text using utf-8 and handle it like windows-1252
>> afterwards (or take an already encoded text and decode it with the
>> inappropriate ANSI encoding).
>
> Thank you Vlastimil,
>
> I tried to print it as you showed me, but I receive an error:
>>>> print u"étroits, en utilisant un portable extrêmement puissant".encode("utf-8").decode("windows-1252")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0192'
> in position 1: ordinal not in range(256)
>>>>

Hi,
this seems to be an encoding error raised by your terminal on printing.
You may need to describe (or better, post the relevant parts of the
source) where the text is coming from (external text file, database
entry, hardcoded in the Python source ...), how it is stored, retrieved
and possibly manipulated before you insert it into the database.
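
One scenario that would produce exactly the traceback above can be
sketched as follows. It is only an assumption about your setup: the
terminal sends UTF-8 bytes while Python treats the typed literal as
latin-1, so the text is already mangled once before the expression
runs. The sketch is written for Python 3 (under Python 2, use repr()
in place of ascii()) and uses escapes so that it does not depend on any
terminal or source-file encoding.

```python
# The intended text, written with escapes (\xe9 is e-acute, \xea is e-circumflex).
text = u"\xe9troits, en utilisant un portable extr\xeamement puissant"

# Assumption: the terminal sends UTF-8 bytes, but the literal is decoded
# as latin-1, so it is already mangled once when Python sees it.
typed = text.encode("utf-8").decode("latin-1")

# The expression from the example then mangles it a second time.
result = typed.encode("utf-8").decode("windows-1252")

# Printing to a latin-1 terminal means encoding to latin-1; simulate that
# step explicitly instead of relying on the real terminal.
try:
    result.encode("latin-1")
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(ascii(result[:4]))  # '\xc3\u0192\xc2\xa9'
print(error)
```

The encode/decode calls themselves succeed; only the final encoding for
output fails, at position 1, on u'\u0192'. That is why the error alone
does not tell you whether the data going to the database is correct.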

You may try to print a repr(...) of the string to be inserted into the
database to see whether it has already been mangled in some previous
part of the processing.
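
A minimal sketch of such a check, for Python 3 (where ascii() gives the
escaped form that repr() gives for unicode strings in Python 2). The
helper looks_mangled is hypothetical and only a heuristic: it tests
whether the string can be turned back into windows-1252 bytes that
happen to be valid UTF-8, which is typical for text that was encoded as
utf-8 and then decoded as windows-1252.

```python
def looks_mangled(value):
    """Heuristic, hypothetical helper: True if value looks like UTF-8
    bytes that were wrongly decoded as windows-1252."""
    try:
        repaired = value.encode("windows-1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip is impossible, so this mangling did not happen.
        return False
    # Pure ASCII survives the round trip unchanged and is not mangled.
    return repaired != value


good = u"extr\xeamement"                              # intended text
bad = good.encode("utf-8").decode("windows-1252")     # mangled once

# The escaped form shows the difference regardless of the terminal.
print(ascii(good))  # 'extr\xeamement'
print(ascii(bad))   # 'extr\xc3\xaamement'
print(looks_mangled(good), looks_mangled(bad))
```

A single character such as \xea where you expect an accented letter is
fine; a pair such as \xc3\xaa in its place suggests the text was
already mangled before it reached this point.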

hth,

    vbr


