unicode to human readable format

Sat Dec 28 02:48:37 EST 2013

On Friday, 27 December 2013 12:37:17 UTC+1, Steven D'Aprano wrote:
> tomasz.kaczorek at gmail.com wrote:
>
> > hello,
> > can I ask you for help? when I try to print s[0] I have the message:
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1:
> > ordinal not in range(128). how to solve my problem, please?
>
> What version of Python?
>
> What operating system?
>
> What environment are you running in? IDLE? The shell or cmd.exe? Powershell?
> xterm? Something else?
>
> Please copy and paste the complete traceback, starting from the line
>
>     Traceback (most recent call last):
>
> to the end.
>
> Please print repr(s[0]) and show us the output.

What do you expect?
The representation is - and should be -

>>> print repr(s[0])
u'\u0105\u017c\u0119\u0142\u0144'

independent of the tool one uses to process
such code.
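
To illustrate why the escaped form does not depend on the tool, here is
a small sketch (my own, not taken from the original question). It assumes
the 'unicode_escape' codec as a stand-in for what repr() displays above;
the names `escaped` and `restored` are only for illustration.

```python
s = [u'\u0105\u017c\u0119\u0142\u0144']

# 'unicode_escape' produces the \uXXXX form shown in the repr above, as bytes.
escaped = s[0].encode('unicode_escape')

# The escaped form contains only ASCII characters, so decoding it as
# ascii succeeds and any ASCII-compatible code page can display it.
escaped_text = escaped.decode('ascii')

# Nothing is lost: the escaped form converts back to the original string.
restored = escaped.decode('unicode_escape')
print(restored == s[0])
```

Because the escaped text is pure ASCII, a cp850, cp1252 or cp1250
console can all show it without an encoding error.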

Now, if one prints s[0], the result may - and should -
depend on the tool.

win console, cp850

>>> print s[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-4: character maps to <undefined>
>>>

win console, cp1252

>>> print s[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-4: character maps to <undefined>
>>>

win console, cp1250

>>> s = [u'\u0105\u017c\u0119\u0142\u0144']
>>> print s[0]
ążęłń
>>>
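
The three console sessions above can be checked without switching the
console code page, by encoding explicitly. This is only a sketch: the
helper `can_encode` and the dict `RESULTS` are names I made up, and
'ascii' and 'utf-8' are added for comparison.

```python
S = u'\u0105\u017c\u0119\u0142\u0144'  # the string from the sessions above


def can_encode(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Code pages from the console sessions, plus ascii and utf-8 for comparison.
RESULTS = {name: can_encode(S, name)
           for name in ('ascii', 'cp850', 'cp1252', 'cp1250', 'utf-8')}

for name in sorted(RESULTS):
    print('%-7s %s' % (name, RESULTS[name]))
```

Of the three code pages, only cp1250 (Central European) contains these
Polish letters, which matches the sessions: cp850 and cp1252 raise,
cp1250 prints.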

SciTE editor, output pane "locale", cp1252 for me.

Traceback (most recent call last):
  File "utrick.py", line 18, in <module>
    print u'\u0105\u017c\u0119\u0142\u0144'
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
>Exit code: 1

SciTE editor, output pane 65001

Traceback (most recent call last):
  File "utrick.py", line 18, in <module>
    print u'\u0105\u017c\u0119\u0142\u0144'
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
>Exit code: 1

Now in IDLE, on a Western European version of Windows,
one gets this

>>> print s[0]
ążęłń

Note that it prints something only by chance. It may
just as well fail to print, interpret, or render the
chars at all. *This is wrong*.

An interactive interpreter I wrote for Py2.*
(full of dirty tricks).

>>> print repr(s[0])
u'\u0105\u017c\u0119\u0142\u0144'
>>> print s[0]
?????

*This is correct*: it is the expected result, and it
works for all chars.

A (the) correct way to print s[0] with a console (all
platforms).

>>> import sys
>>> print s[0].encode(sys.stdout.encoding, 'replace')
?????
>>>
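
The same idea as a reusable function, as a rough sketch. `console_safe`
and `FakeStream` are hypothetical names of mine. I am also assuming that
the stream's encoding can be None (as far as I know this happens in Py2
when the output is piped, which would explain the 'ascii' errors in the
SciTE panes), and in that case the sketch falls back to ascii.

```python
import sys


def console_safe(text, stream=None):
    """Return text with characters the stream cannot encode replaced by '?'.

    Hypothetical helper: falls back to ascii when the stream reports
    no encoding (assumed to be the case for piped output).
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    # Round-trip through bytes so the result is text again.
    return text.encode(encoding, 'replace').decode(encoding)


class FakeStream(object):
    """Stand-in for sys.stdout with a fixed encoding, for demonstration."""

    def __init__(self, encoding):
        self.encoding = encoding


s = [u'\u0105\u017c\u0119\u0142\u0144']
print(console_safe(s[0], FakeStream('cp1252')))  # ?????
print(console_safe(s[0], FakeStream(None)))      # ?????
```

With a cp1250 stream the string comes back unchanged, so the function
only degrades the output where the console cannot show the chars.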

See the other thread about printing repr().

jmf