Print encoding problems in console

Dan Stromberg drsalists at gmail.com
Fri Jul 15 20:46:30 EDT 2011


I've used the code below successfully to deal with this kind of problem when
outputting filenames.  python2x3 is at
http://stromberg.dnsalias.org/svn/python2x3/ , but here it's only being used
to convert Python 3.x's byte strings to strings (to eliminate the b''
stuff); on 2.x it's an identity function.  If you're targeting 3.x
alone, there's no need to take a dependency on python2x3.

If you really do need to output such characters, rather than replacing them
with ?'s, you could use os.write() on file descriptor 1 (stdout), which
bypasses sys.stdout's encoding - that works in both 2.x and 3.x.
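
As a rough sketch of the os.write() approach (the helper name write_raw and
the sample filename are made up for illustration, and the bytes are assumed
to be UTF-8 already):

```python
import os
import sys


def write_raw(data, fd=1):
    """Write bytes straight to a file descriptor, bypassing sys.stdout."""
    # os.write() may write fewer bytes than requested, so loop until
    # everything has been written.
    while data:
        written = os.write(fd, data)
        data = data[written:]


# Hypothetical filename, already encoded as UTF-8 bytes.
filename = b'caf\xc3\xa9.txt'

# Flush anything buffered in sys.stdout first so the output stays in order.
sys.stdout.flush()
write_raw(filename + b'\n')
```

Because no text encoding step is involved, this does not depend on what
sys.stdout.encoding reports; whether the bytes display correctly is then up
to the terminal or the program reading the pipe.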

import codecs
import sys

import python2x3


def ascii_ize(binary):
   '''Replace non-ASCII bytes with question marks; otherwise writing to
   sys.stdout raises a traceback'''
   list_ = []
   question_mark_ordinal = ord('?')
   for ordinal in python2x3.binary_to_intlist(binary):
      if 0 <= ordinal <= 127:
         list_.append(ordinal)
      else:
         list_.append(question_mark_ordinal)
   return python2x3.intlist_to_binary(list_)


def output_filename(filename, add_eol=True):
   '''Output a filename to the tty (stdout), taking into account that some
   ttys do not allow non-ASCII characters'''

   # The ASCII codec is reported under several aliases ('US-ASCII',
   # 'ANSI_X3.4-1968', ...), so normalize the name before comparing.
   # The encoding can also be None, e.g. on 2.x when stdout is not a tty.
   encoding = sys.stdout.encoding
   if encoding is not None and codecs.lookup(encoding).name == 'ascii':
      converted = python2x3.binary_to_string(ascii_ize(filename))
   else:
      converted = python2x3.binary_to_string(filename)

   # Newlines, carriage returns and tabs would garble a filename listing
   replaced = converted.replace('\n', '?').replace('\r', '?').replace('\t', '?')

   sys.stdout.write(replaced)

   if add_eol:
      sys.stdout.write('\n')
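
For the 3.x-only case mentioned above, the same replacement can probably be
done without python2x3.  This is only a sketch: the name ascii_ize3 is
hypothetical, and it relies on iterating over a bytes object yielding
integers, which is true on 3.x but not on 2.x.

```python
def ascii_ize3(binary):
    """Python 3 only: replace every non-ASCII byte with a question mark."""
    question_mark_ordinal = ord('?')
    # On 3.x, iterating over bytes yields ints in range(256).
    return bytes(
        ordinal if ordinal <= 127 else question_mark_ordinal
        for ordinal in binary
    )


# The result is pure ASCII, so decoding it cannot fail.
printable = ascii_ize3(b'caf\xc3\xa9.txt').decode('ascii')
print(printable)
```

Each non-ASCII byte becomes its own question mark, so a two-byte UTF-8
character shows up as two ?'s, the same as with ascii_ize().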


On Fri, Jul 15, 2011 at 5:02 PM, Pedro Abranches <pedrof.abranches at gmail.com> wrote:

> Hello everyone.
>
> I'm having a problem when outputting UTF-8 strings to a console.
> Let me show a simple example that explains it:
>
> $ python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'
> UTF-8
> é
>
> Everything is OK.
> Now, if you're using your Python script in some shell script, you might have
> to store the output in some variable, like this:
>
> $ var=`python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'`
>
> And what you get is:
>
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
> position 0: ordinal not in range(128)
>
> So Python is not able to detect the encoding of the output in a
> situation like this, where the script is not called directly but
> inside backticks (``).
>
> Why does this happen? Is there a way to solve it, either in Python or in
> shell code?
>
> Thanks,
> Pedro Abranches