[Baypiggies] Unicode woes using print and redirects...

Shannon -jj Behrens jjinux at gmail.com
Sat Aug 22 02:24:20 CEST 2009


On Fri, Aug 21, 2009 at 4:56 PM, Mitch Patenaude<patenaude at gmail.com> wrote:
> I have a problem when outputting some strings using the print statement.
> If a string contains non-ASCII unicode characters, it does the right
> thing as long as stdout hasn't been redirected.
>
> I have tried to use codecs.EncodedFile to wrap sys.stdout, either to
> get it to recode the output or to fool it into thinking that stdout
> can handle UTF-8 in either case, but that only causes *both* cases to
> fail, even when I pass in errors='ignore' or errors='replace'.  I'm
> stumped.  Can anyone enlighten me?
>
> Details:
> When I run it from the command line without redirection it works fine:
> mitch@phobos:~/src/pylib/twarkov$ ./enc_test.py
> isatty
> Encoding: UTF-8
> foo⑵
>
> Yay!
>
> but when I redirect the output at all, it fails:
> mitch@phobos:~/src/pylib/twarkov$ ./enc_test.py | cat
> notatty
> Encoding: None
> Damnit!  'ascii' codec can't encode character u'\u2475' in position 3: ordinal not in range(128)
>
> enc_test.py:
> #!/usr/bin/python
>
> import sys
> import codecs
>
> ttyout = sys.stderr
>
> if sys.stdout.isatty():
>     ttyout.write('isatty\n')
> else:
>     ttyout.write('notatty\n')
>
> ttyout.write('Encoding: %s\n' % sys.stdout.encoding)
>
> fooout = codecs.EncodedFile(sys.stdout, 'utf-8',
>                             file_encoding='utf-8', errors='ignore')
>
> trouble = u'foo\u2475\n'
>
> try:
>     # fooout.write(trouble)
>     print trouble
>     ttyout.write('Yay!\n')
> except UnicodeEncodeError, e:
>     ttyout.write('Damnit!  %s\n' % (e,))
>
> mitch@phobos:~/src/pylib/twarkov$ uname -a
> Linux phobos 2.6.24-24-generic #1 SMP Tue Aug 18 17:04:53 UTC 2009 i686 GNU/Linux
> mitch@phobos:~/src/pylib/twarkov$ cat /etc/lsb-release
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=8.04
> DISTRIB_CODENAME=hardy
> DISTRIB_DESCRIPTION="Ubuntu 8.04.3 LTS"
> mitch@phobos:~/src/pylib/twarkov$ locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
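The "Encoding: None" line is the key symptom. When stdout is a pipe, Python 2 leaves sys.stdout.encoding as None, and print then encodes unicode with the default ASCII codec, which produces the 'ascii' codec error. A minimal sketch of one way to pick a better encoding follows, assuming you would rather fall back to the locale (en_US.UTF-8 here) than to ASCII. The helper names choose_encoding and output_encoding are hypothetical, and the code is written to run under both Python 2 and Python 3:

```python
import codecs
import locale


def choose_encoding(reported, fallback="utf-8"):
    """Return a canonical codec name to use for output.

    `reported` is whatever the stream claims (sys.stdout.encoding). Under
    Python 2 that is typically 'UTF-8' on a terminal but None on a pipe.
    When it is missing, fall back to the locale's preferred encoding,
    then to `fallback`.
    """
    enc = reported or locale.getpreferredencoding() or fallback
    # codecs.lookup() canonicalizes aliases: 'UTF-8' and 'utf8' -> 'utf-8'.
    return codecs.lookup(enc).name


def output_encoding(stream, fallback="utf-8"):
    """Encoding to use for `stream`; a stream with no usable .encoding
    attribute is treated like a pipe."""
    return choose_encoding(getattr(stream, "encoding", None), fallback)
```

With the locale shown above, output_encoding(sys.stdout) should return 'utf-8' both on the terminal and when piped through cat. It only chooses a codec, though. Something still has to do the encoding before the text reaches the file.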

I don't know the answer, but does it work if you switch from print to calling write() directly?
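Following the write() idea, here is a minimal sketch that encodes explicitly instead of relying on the stream's encoding. codecs.getwriter(encoding) returns a StreamWriter class whose write() encodes unicode with the given codec and errors handler, then passes the bytes on. This means the result no longer depends on whether stdout is a terminal. As I understand the codecs docs, codecs.EncodedFile instead decodes what you write using its data_encoding, so it expects already-encoded byte strings rather than unicode objects. The helper names below (unicode_writer, stdout_bytes, encode_demo) are hypothetical, and the code assumes Python 2.6+ or Python 3:

```python
import codecs
import io
import sys


def unicode_writer(byte_stream, encoding="utf-8", errors="strict"):
    """Wrap a byte stream so that .write() accepts unicode text.

    The returned StreamWriter encodes each string with `encoding`, using the
    `errors` handler ('strict', 'replace', 'ignore', ...), and passes the
    resulting bytes to `byte_stream`.
    """
    return codecs.getwriter(encoding)(byte_stream, errors)


def stdout_bytes():
    """The byte-level stdout: sys.stdout itself under Python 2,
    sys.stdout.buffer under Python 3."""
    return getattr(sys.stdout, "buffer", sys.stdout)


def encode_demo(text, encoding="utf-8", errors="strict"):
    """Write `text` through the wrapper into an in-memory buffer and return
    the bytes a pipe or file would receive."""
    buf = io.BytesIO()
    unicode_writer(buf, encoding, errors).write(text)
    return buf.getvalue()
```

In the script above, this would mean fooout = unicode_writer(stdout_bytes(), 'utf-8', 'replace') followed by fooout.write(trouble). The long-standing Python 2 recipe sys.stdout = codecs.getwriter('utf-8')(sys.stdout) applies the same idea, so that print also goes through the encoder.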

-jj

-- 
In this life we cannot do great things. We can only do small things
with great love. -- Mother Teresa
http://jjinux.blogspot.com/

