(Fucking) Unicode: console print statement and PythonWin: replacement for off-table chars HOWTO?

Robert kxroberto at googlemail.com
Tue Jan 10 15:09:17 EST 2006


gregarican wrote:
> Robert wrote:
>
> > (windows or linux console)
> >
> > >>> print u'\u034a'
> >
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "C:\PYTHON23\lib\encodings\cp850.py", line 18, in encode
> >     return codecs.charmap_encode(input,errors,encoding_map)
> > UnicodeEncodeError: 'charmap' codec can't encode character u'\u034a' in position 0: character maps to <undefined>
>
> Are you certain that this is a valid unicode character? Checking other
> values (like \u0020 which is a blank space) seems to work okay. What
> does \u034A represent?

Yes, it is delivered by the filesystem:
>>> glob.glob(u'test/*')[3]
u'sytest3\\\u041f\u043e\u0448\u0443\u043a.txt'

u'\u043a' is Cyrillic: к

No matter. I guess no (small) system can know all Unicode ranges in use
worldwide. The real problem is getting a smooth, smart and tolerant setup
by default - not a mix-up of 4 codecs and (most bothersome) intolerant
exceptions that break simple tty/win output.

What is the best way to do this, and the one most tolerant of
platform and (Python) installation?
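
To illustrate the kind of tolerance meant here, a minimal sketch only: it
assumes the console encoding can be read from sys.stdout.encoding and falls
back to ASCII otherwise. The helper name tolerant_encode is hypothetical,
not an existing Python facility. It uses the standard 'replace' error
handler of unicode.encode so that unmappable characters degrade to '?'
instead of raising UnicodeEncodeError.

```python
import sys

def tolerant_encode(text, encoding=None):
    """Encode text for console output, replacing unmappable characters."""
    # Hypothetical helper: use the given encoding, else whatever the
    # output stream reports, else plain ASCII as a last resort.
    enc = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # 'replace' substitutes '?' for every character the codec cannot map.
    return text.encode(enc, 'replace')

# cp850 has no Cyrillic letters, so they are replaced rather than raising.
print(tolerant_encode(u'\u041f\u043e\u0448\u0443\u043a.txt', 'cp850'))
```

Passing 'backslashreplace' instead of 'replace' would keep the information
as \uXXXX escapes, which may be preferable for file names.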

Robert



