(A Possible Solution) Re: preferred way to set encoding for print

~flow wolfgang.lipp at gmail.com
Wed Sep 16 15:39:55 EDT 2009


On Sep 16, 7:16 am, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
> Setting PYTHONIOENCODING overrides the encoding used for stdin/stdout/stderr
> (See the Python help for details), but if your terminal doesn't support the
> encoding that won't help.

thx for these two tips. of course, it was a bit misleading of me to
complain that a cp850 terminal can't display chinese characters from
python: it cannot do that at all, of course.
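
as a small sketch of the PYTHONIOENCODING tip quoted above: the variable also accepts an optional error handler after a colon (`encodingname:errorhandler`), so a child interpreter can be told to print NCRs instead of raising. the helper name `run_with_io_encoding` is made up for illustration, and `subprocess.run` assumes a newer python 3 than the one discussed in this thread.

```python
import os
import subprocess
import sys

def run_with_io_encoding( code, io_encoding ):
  """Run `code` in a child interpreter with PYTHONIOENCODING set
  and return the raw bytes the child wrote to stdout."""
  env = dict( os.environ, PYTHONIOENCODING = io_encoding )
  result = subprocess.run( [ sys.executable, '-c', code ], env = env,
    stdout = subprocess.PIPE, check = True )
  return result.stdout

# the child prints one chinese character to a stdout that is declared to be cp850;
# the 'xmlcharrefreplace' handler turns it into an NCR instead of raising
raw = run_with_io_encoding( r'print( "\u4e2d" )', 'cp850:xmlcharrefreplace' )
print( raw )
```

this only helps when you control how the interpreter is started; it does not change an already running process.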

i've gone on to experiment. what i do not want is for python to stop
execution when an encoding error occurs on printing and perhaps
logging. so far, i used to do this by convincing python to use utf-8
in any and all cases, and then live with the amount of garbage that
appears on screen when using cp850 and cp1252 terminals.

what has changed in python is that they now somehow find out about the
terminal's encoding, and then put that encoding into place and defend
it with teeth and claws. it is simply not easy to take control of that
setting.

this is in itself unfortunate; i believe that users should have a
right to determine what to do in case of stdout encoding problems.
these are a little different from i-wrote-to-that-file-and-boom
experiences. *there* the encoding exception is fully warranted, and
could be easily fixed by allowing a less-than-strict encoding mode.
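
for the file case, a minimal sketch of what such a less-than-strict mode can look like: `open()` takes an `errors` argument, and 'xmlcharrefreplace' is one of the standard handlers for writing. the function name `write_lenient` and the temp file are just for illustration.

```python
import os
import tempfile

def write_lenient( path, text, encoding ):
  """Write `text` to `path`; characters the encoding cannot represent
  are written as NCRs instead of raising UnicodeEncodeError."""
  with open( path, 'w', encoding = encoding, errors = 'xmlcharrefreplace' ) as f:
    f.write( text )

path = os.path.join( tempfile.mkdtemp(), 'demo.txt' )
write_lenient( path, 'ok \u4e2d', 'cp1252' )

# read the file back as raw bytes to see what actually landed on disk
with open( path, 'rb' ) as f:
  print( f.read() )
```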

but print is different, and of all situations where encoding errors
can occur, this is the hardest to take hold of. and much more so in
python3 it seems than in python2.

printing to the screen is often purely meta-informative in nature, a
side-effect e.g. of a webserver really doing web pages. i don't want
to bring my entire system down just because some output into some
terminal in the back orifice produced some amount of garbage. maybe
only a single chinese character amongst thousands of lines of
done-this, done-that red tape.

i think web browsers are a good example here. i don't know whether it
was a good idea to let clients reassemble broken web pages in an order
as they see fit, but the policy of just outputting broken characters
instead of terminating the browser process with a lengthy stacktrace
was probably somehow good for the popularity of the web as we know it.

my current patch looks like this:

  import sys

  class Stdout_writer_with_ncrs( object ):

    def write( self, p ):
      """See to it that all write encodings are done using numerical
      character references (NCRs), which circumvents Python’s default
      behavior of raising an exception whenever it encounters an
      unrepresentable character while printing."""
      enc   = sys.__stdout__.encoding
      # make sure we are dealing with text
      p     = p if isinstance( p, str ) else str( p )
      # unencodable characters become &#NNNN;, everything else survives the round trip
      p     = p.encode( enc, 'xmlcharrefreplace' ).decode( enc )
      sys.__stdout__.write( p )

  sys.stdout = Stdout_writer_with_ncrs()

this method picks up anything to be printed, makes sure it is text,
and then encodes it to the terminal encoding using numerical character
references (NCRs), then decodes it again since the underlying wrapper
class wants to do the encoding itself and refuses to accept bytes in
place of strings (again, this is not nice: an array of byte values
sent to the print method is a clear request to send exactly those
bytes, verbatim, one by one, to the terminal. no mucking around with
my bytes, pls! maybe i can implement that in the code above, too.)
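
to make the encode/decode round trip concrete, here is a small standalone sketch; the helper name `to_ncr_text` is made up for illustration and cp850 stands in for whatever the terminal reports.

```python
def to_ncr_text( text, encoding ):
  """Return `text` with every character that `encoding` cannot
  represent replaced by a numerical character reference (NCR)."""
  return text.encode( encoding, 'xmlcharrefreplace' ).decode( encoding )

# two chinese characters that cp850 cannot represent come back as pure ascii NCRs
print( to_ncr_text( '\u4e2d\u6587', 'cp850' ) )
```

characters that the target encoding does know (say, u-umlaut in cp850) pass through the round trip unchanged, so the result is always safe to hand to a text stream using that encoding.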

of course, this simplistic scaffold will break if anyone uses
sys.stdout for anything but calling sys.stdout.write(), but so far it
has worked fine despite being a defective, tiny shim. maybe
inheriting from sys.stdout.__class__ would help.
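
another route i have not tested in anger, so take it as a sketch: instead of a hand-written shim, wrap the underlying binary stream in a fresh `io.TextIOWrapper` whose `errors` is set to 'xmlcharrefreplace'. that keeps all the usual stream methods (flush, isatty and friends). the function name `make_lenient_stream` is made up, and it assumes the stream you want to replace exposes a `.buffer` attribute, which may not hold in IDEs or when stdout has already been swapped out.

```python
import io

def make_lenient_stream( binary_stream, encoding ):
  """Wrap a binary stream in a text layer that writes NCRs for
  unrepresentable characters instead of raising UnicodeEncodeError."""
  return io.TextIOWrapper( binary_stream, encoding = encoding,
    errors = 'xmlcharrefreplace', line_buffering = True )

# demonstration against an in-memory buffer rather than the real terminal
demo_buffer = io.BytesIO()
demo_stream = make_lenient_stream( demo_buffer, 'cp850' )
demo_stream.write( 'a\u4e2d' )
demo_stream.flush()
print( demo_buffer.getvalue() )
```

for real use one would presumably assign `sys.stdout = make_lenient_stream( sys.stdout.buffer, sys.stdout.encoding )` once at startup, keeping a reference to the old stream around.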









> "_wolf" <wolfgang.l... at gmail.com> wrote in message
>
> news:22991c72-d00f-45cd-9bf7-0b80fc4319bd at k26g2000vbp.googlegroups.com...
>
>
>
> > hi folks,
>
> > i am doing my first steps in the wonderful world of python 3.
>
> > some things are good.
> > some things have to be relearned.
> > some things drive me crazy.
>
> > sadly, i'm working on a windows box. which, in germany, entails that
> > python thinks it to be a good idea to take cp1252 as the default
> > encoding.
>
> > so just coz i got my box in germany means i can never print out a
> > chinese character? say what?
>
> > i have no troubles with people configuring their python installation
> > to use any encoding in the world, but wouldn't it have been less of a
> > surprise to just assume utf-8 for any file in/output? after all, it is
> > already the default for python source files as far as i understand.
> > someone might think they're clever to sniff into the system and make
> > the somehwat educated guess that this dude's using cp1252 for his
> > files. but they would be wrong.
>
> > so: how can i tell python, in a configuration or using a setting in
> > sitecustomize.py, or similar, to use utf-8 as a default encoding?
> > there used to be a trick to say `reload(sys);sys.setdefaultencoding
> > ('utf-8')`, but that has no effect in py3.0.1. also, i cannot set
> > `sys.stdout.encoding`; is there a way to re-open that stream with a
> > different encoding?
>
> > in all, i believe it is quite unsettling to me to see that, on my py3
> > installation,
>
> > sys.getdefaultencoding() == 'utf-8'
> > sys.stdout.encoding == 'cp1252'
> > locale.getlocale() == (None, None)
> > locale.getdefaultlocale() == ('de_DE', 'cp1252')
>
> > which to me makes as much sense as a blackcurrant tart thrown into
> > space. worse,
>
> > locale.setlocale( locale.LC_ALL, locale.getdefaultlocale() )
>
> > results in
>
> > locale.Error: unsupported locale setting
>
> > this bloody thing doesn't accept its *own* output. attempts to feed
> > that locale beast with anything but the empty string or 'C' were all
> > doomed. it would take a very patient and eloquent person to explain
> > that in a credible fashion to me. my word for this is, 'broken'.
>
> > i would very much like to rid myself of these considerations. just say
> > it's all utf-8, wash'n'go.
>
> > my attempts of changing python's mind using the locale module have
> > failed so far. otherwise, i for one don't want to touch that locale
> > thing with a very long pole. as far as i can see, it does not work as
> > documented. the platform dependencies are also a clear OFF LIMITS sign
> > to me.
>
> > any suggestions?
>
> What specifically do you want to do?  I work with Chinese all the time on a
> U.S. Windows system.  Do you want to print Chinese characters in a console
> window?  In a Python IDE?  FYI, I don't use the locale module for much at
> all.
>
> I can't type or print Chinese to a console window unless I change Control
> Panel, Regional and Language Options, Advanced Tab, Language for non-Unicode
> Programs to a Chinese selection (and reboot).  Then the default
> sys.stdout.encoding is something like cp936.
>
> The Pythonwin IDE in the latest version of pywin32, however, supports UTF-8
> in its interactive window and displays Chinese fine.
>
>
> Let me know what you're trying to do.
>
> -Mark



