Print encoding problems in console
Dan Stromberg
drsalists at gmail.com
Fri Jul 15 20:46:30 EDT 2011
I've used the code below successfully to deal with this kind of problem when
outputting filenames. Python2x3 is at
http://stromberg.dnsalias.org/svn/python2x3/ , but here it's just being used
to convert Python 3.x's byte strings to strings (to eliminate the b''
stuff); on 2.x it's an identity function. If you're targeting 3.x alone,
there's no need to take a dependency on python2x3.
If you really do need to output such characters, rather than replacing them
with question marks, you could use os.write() on file descriptor 1 (stdout);
that works in both 2.x and 3.x.

import sys

import python2x3


def ascii_ize(binary):
    '''Replace non-ASCII characters with question marks; otherwise writing to
    sys.stdout tracebacks'''
    list_ = []
    question_mark_ordinal = ord('?')
    for ordinal in python2x3.binary_to_intlist(binary):
        if 0 <= ordinal <= 127:
            list_.append(ordinal)
        else:
            list_.append(question_mark_ordinal)
    return python2x3.intlist_to_binary(list_)


def output_filename(filename, add_eol=True):
    '''Output a filename to the tty (stdout), taking into account that some
    tty's do not allow non-ASCII characters'''
    # Note: the name reported for ASCII varies by platform (for example
    # 'ANSI_X3.4-1968'), and on 2.x it is None when stdout is not a tty.
    if sys.stdout.encoding == 'US-ASCII':
        converted = python2x3.binary_to_string(ascii_ize(filename))
    else:
        converted = python2x3.binary_to_string(filename)
    # Keep each filename on one line by masking newlines, returns and tabs.
    replaced = converted.replace('\n', '?').replace('\r', '?').replace('\t', '?')
    sys.stdout.write(replaced)
    if add_eol:
        sys.stdout.write('\n')

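
For anyone targeting 3.x alone, here is a rough sketch of the same two ideas
without python2x3: replacing non-ASCII bytes with question marks, and writing
the raw bytes with os.write(). It is only an illustration, not the code I
actually use. The name write_raw_filename and its fd parameter are made up
for this example, and it assumes the filename is already a bytes object.

```python
import os
import sys


def ascii_ize(binary):
    """Return a str with every non-ASCII byte of `binary` replaced by '?'."""
    # Iterating over bytes yields ints on 3.x, so no helper library is needed.
    return ''.join(chr(byte) if byte < 128 else '?' for byte in binary)


def write_raw_filename(filename, add_eol=True, fd=1):
    """Write the bytes of `filename` unchanged to a file descriptor.

    `fd` is a hypothetical parameter added for this sketch; 1 is stdout.
    """
    # Flush buffered text first so output stays in order with sys.stdout.
    sys.stdout.flush()
    data = filename + (b'\n' if add_eol else b'')
    # os.write() may write fewer bytes than requested, so loop until done.
    while data:
        written = os.write(fd, data)
        data = data[written:]
```

If the filename is a str rather than bytes, something like
write_raw_filename(os.fsencode(name)) should work on 3.2 and later. Because
os.write() bypasses sys.stdout entirely, its encoding never comes into play.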
On Fri, Jul 15, 2011 at 5:02 PM, Pedro Abranches <pedrof.abranches at gmail.com> wrote:
> Hello everyone.
>
> I'm having a problem when outputting UTF-8 strings to a console.
> Let me show a simple example that explains it:
>
> $ python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'
> UTF-8
> é
>
> Everything is OK.
> Now, if you're using your python script in some shell script you might have
> to store the output in some variable, like this:
>
> $ var=`python -c 'import sys; print sys.stdout.encoding; print u"\xe9"'`
>
> And what you get is:
>
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
>
> So Python is not able to detect the encoding of the output in a
> situation like that, in which the Python script is not called directly but
> inside ``.
>
> Why does this happen? Is there a way to solve it, either in Python or in
> shell code?
>
> Thanks,
> Pedro Abranches
>