Python cannot print - Re: Unicode style in win32/PythonWin

Sun Jan 15 04:37:02 EST 2006

Neil Hodgson wrote:
> Thomas Heller:
>
> > Hm, I don't know.  I try to avoid converting questionable characters at
> > all, if possible.  Then, it seems the error-mode doesn't seem to change
> > anything with "mbcs" encoding.  WinXP, Python 2.4.2 on the console:
> >
> >>>> u"abc\u034adef".encode("mbcs", "ignore")
> > 'abc?def'
> >>>> u"abc\u034adef".encode("mbcs", "strict")
> > 'abc?def'
> >>>> u"abc\u034adef".encode("mbcs", "error")
> > 'abc?def'

Yes, I know; that's why 'mbcs' can also be set in site.py (or
sitecustomize.py) to solve some of the problems discussed. (The site.py
mechanism doesn't allow setting the error mode, unlike ctypes.)

> > With "latin-1", it is different:
>
>     Yes, there are no 'ignore' or 'strict' modes for mbcs. It is a
> simple call to WideCharToMultiByte with no options set. 'ignore' may
> need two calls with different values of the default character to allow
> identification and removal of default characters as any given default
> character may also appear naturally in the output. 'strict' and 'error'
> would be easier to implement by checking both the return status and
> lpUsedDefaultChar which is set when any default character insertion is done.
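Neil's two-call trick can be emulated in pure Python (3.x) to see why it works. This is not the Win32 API: `lossy_encode` is a hypothetical stand-in for WideCharToMultiByte that substitutes a default byte for unencodable characters and reports whether it did so (the role of lpUsedDefaultChar). The sketch assumes a stateless codec such as latin-1 or cp850 and single-byte default characters, so the two outputs line up byte for byte.

```python
# Pure-Python emulation of the idea above; not the real Win32 API.
# Assumes a stateless codec (e.g. latin-1, cp850) and one-byte defaults.

def lossy_encode(text, codec, default=b"?"):
    """Hypothetical stand-in for WideCharToMultiByte.

    Returns (encoded bytes, used_default); used_default plays the role
    of lpUsedDefaultChar.
    """
    out = bytearray()
    used_default = False
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out += default          # insert the default character
            used_default = True
    return bytes(out), used_default


def encode_strict(text, codec):
    """'strict': fail if any default character was inserted."""
    data, used_default = lossy_encode(text, codec)
    if used_default:
        raise UnicodeEncodeError(codec, text, 0, len(text),
                                 "default character inserted")
    return data


def encode_ignore(text, codec):
    """'ignore': encode twice with different defaults, drop differing bytes."""
    first, _ = lossy_encode(text, codec, b"?")
    second, _ = lossy_encode(text, codec, b"*")
    # Both outputs have the same length, so they align byte for byte.
    # A '?' that was already in the input appears in both and is kept;
    # an inserted default is '?' in one and '*' in the other and is dropped.
    return bytes(x for x, y in zip(first, second) if x == y)


print(encode_ignore("abc\u034adef", "latin-1"))  # b'abcdef'
```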

But, as discussed, I would not take this as encouragement to dig
for a real 'strict' or 'ignore' mode for mbcs.
('replace' also creates no invalid characters. Both 'ignore' and
'replace' alter the stream, so equality cannot be preserved in principle.)
It's a political question whether the default mode should let
everything through or be picky (detailed in
<1137059888.538391.119110 at o13g2000cwo.googlegroups.com>).
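The point about equality can be checked directly. A short Python 3 sketch, using latin-1 instead of mbcs so that it runs on any platform (an assumption; the helper name `lossy_round_trip` is hypothetical): neither lossy handler survives an encode/decode round trip, and under 'replace' two different strings collapse to the same bytes.

```python
def lossy_round_trip(text, encoding="latin-1"):
    """Encode with each lossy handler, decode again, report if text survived."""
    results = {}
    for errors in ("ignore", "replace"):
        data = text.encode(encoding, errors)
        results[errors] = (data, data.decode(encoding) == text)
    return results


print(lossy_round_trip("abc\u034adef"))
# {'ignore': (b'abcdef', False), 'replace': (b'abc?def', False)}

# Two different strings collapse to the same bytes under 'replace':
print("abc?def".encode("latin-1", "replace") ==
      "abc\u034adef".encode("latin-1", "replace"))  # True
```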

Better to switch consciously to ('mbcs', 'replace').

Python's default behaviour is a horror for any new programmer and a
reason to quickly move to platforms with mature Unicode support, like
Java. It takes many hours to figure out how everything fits together in
Python and how to keep simple print statements from breaking the
application (especially when PythonWin is involved). This causes a lot
of anger among users and programmers. When strict conversion really is
required for some technical strings (which is rare), programmers are
naturally well aware of it.

Put yourself in a new programmer's shoes and try:

>>> import glob
>>> print '\n'.join( glob.glob(u'test/*') )
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\PYTHON24\lib\encodings\cp850.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 146-150: character maps to <undefined>
>>>

=> "What is this cp850.py, and what does it have to do with my
(undefined?) files? A very nice language, which cannot print by
default... off to Java... Bye"
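The traceback has nothing to do with glob or the files themselves: print has to encode the unicode result for the console, which here uses code page cp850, and the default 'strict' error handler rejects every character that code page lacks. A minimal Python 3 reproduction (the file name and the helper `first_unencodable` are made up for illustration):

```python
# Hypothetical file name with characters that cp850 cannot represent.
NAME = "test/\u4e2d\u6587.txt"


def first_unencodable(text, encoding="cp850"):
    """Return (start, reason) for the first span the codec rejects, or None."""
    try:
        text.encode(encoding)  # errors="strict" is the default
    except UnicodeEncodeError as exc:
        return exc.start, exc.reason
    return None


print(first_unencodable(NAME))              # (5, 'character maps to <undefined>')
print(first_unencodable("test/plain.txt"))  # None
```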

My recommendation is to use 'backslashreplace' as the default mode.
Nobody gets angry when alien characters are printed in this style.
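A sketch of that policy in Python 3 (the helper name `printable` is hypothetical): characters the code page cannot show come out as `\uXXXX` escapes instead of raising, both for a single string and for a whole text stream.

```python
import io


def printable(text, encoding="cp850"):
    """Escape characters the target code page lacks instead of failing."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


print(printable("test/\u4e2d\u6587.txt"))  # test/\u4e2d\u6587.txt

# The same policy applied to a whole text stream. newline="\n" disables
# newline translation so the bytes are identical on every platform.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="cp850", errors="backslashreplace",
                       newline="\n")
out.write("caf\u00e9 \u4e2d\n")
out.flush()
print(buf.getvalue())  # b'caf\x82 \\u4e2d\n'
```

On a real console, Python 3.7+ can apply the same handler to the existing stream with `sys.stdout.reconfigure(errors="backslashreplace")`, or via the PYTHONIOENCODING environment variable (for example `cp850:backslashreplace`).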

Robert