Unicode failure

Mon Dec 7 05:48:35 EST 2015

On Sun, 6 Dec 2015 at 23:11 Quivis <quivis at domain.invalid> wrote:

> On Fri, 04 Dec 2015 13:07:38 -0500, D'Arcy J.M. Cain wrote:
>
> > I thought that going to Python 3.4 would solve my Unicode issues but it
> > seems I still don't understand this stuff.  Here is my script.
> >
> > #! /usr/bin/python3 # -*- coding: UTF-8 -*-
> > import sys print(sys.getdefaultencoding())
> > print(u"\N{TRADE MARK SIGN}")
> >
> > And here is my output.
> >
> > utf-8 Traceback (most recent call last):
> >   File "./g", line 5, in <module>
> >     print(u"\N{TRADE MARK SIGN}")
> > UnicodeEncodeError: 'ascii' codec can't encode character '\u2122' in
> > position 0: ordinal not in range(128)
>
> Hmmmm, interesting:
>
> Python 2.7.3 (default, Jun 22 2015, 19:43:34)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> print sys.getdefaultencoding()
> ascii
> >>> print u'\N{TRADE MARK SIGN}'
> ™
>
>
sys.getdefaultencoding() returns the default encoding used when opening a
file if an encoding is not explicitly given in the open call. What matters
here is the encoding associated with stdout which is sys.stdout.encoding.

$ python2.7 -c 'import sys; print(sys.stdout.encoding); print(u"\u2122")'
UTF-8
™

$ LANG=C python2.7 -c 'import sys; print(sys.stdout.encoding);
print(u"\u2122")'
ANSI_X3.4-1968
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in
position 0: ordinal not in range(128)

--
Oscar