right adjusted strings containing umlauts

Thu Aug 8 13:47:11 EDT 2013

Kurt Mueller wrote:

> Now I have this small example:
> ----------------------------------------------------------
> #!/usr/bin/env python
> # vim: set fileencoding=utf-8 :
>
> from __future__ import print_function
> import sys, shlex
>
> print( repr( sys.stdin.encoding ) )
>
> strg_form = u'{0:>3} {1:>3} {2:>3} {3:>3} {4:>3}'
> for inpt_line in sys.stdin:
>     proc_line = shlex.split( inpt_line, False, True, )
>     encoding = "utf-8"
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>     print( strg_form.format( *proc_line ) )
> ----------------------------------------------------------
>
> $ echo -e "a b c d e\na ö u 1 2" | file -
> /dev/stdin: UTF-8 Unicode text
> $ echo -e "a b c d e\na ö u 1 2" | ./align_compact.py
> None
>   a   b   c   d   e
>   a   ö   u   1   2
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | file -
> /dev/stdin: ISO-8859 text
> $ echo -e "a b c d e\na ö u 1 2" | recode utf8..latin9 | ./align_compact.py
> None
>   a   b   c   d   e
> Traceback (most recent call last):
>   File "./align_compact.py", line 13, in <module>
>     proc_line = [ strg.decode( encoding ) for strg in proc_line ]
>   File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 0: invalid start byte
> muk@mcp20:/sw/prog/scripts/text_manip>
>
> How do I handle this two inputs?
>

Once you're using pipes, stdin is no longer attached to the terminal, so
you've given up any hope that it will report a useful encoding. I'm not
surprised you're getting None for sys.stdin.encoding (it's an attribute,
not a method).

So you can either do as others have suggested, and guess, or you can get
the information explicitly, say from argv.  Either way, you'll need
something other than the hard-coded   encoding = "utf-8"
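
Both options could look roughly like the sketch below. This is only an
illustration, not a drop-in fix: the helper names pick_encoding and
decode_guess are made up for this example, and the candidate list
(utf-8 first, then iso-8859-15, i.e. latin9) is an assumption based on
the two inputs you showed. It works on byte strings, which is what
sys.stdin gives you in Python 2.7; on Python 3 you would have to read
from sys.stdin.buffer instead.

```python
from __future__ import print_function
import codecs


def pick_encoding(argv, default="utf-8"):
    """Explicit route: take the encoding name from argv[1] if present.

    Hypothetical helper; falls back to `default` when no argument is given.
    """
    name = argv[1] if len(argv) > 1 else default
    codecs.lookup(name)  # raises LookupError early for an unknown codec name
    return name


def decode_guess(raw, candidates=("utf-8", "iso-8859-15")):
    """Guessing route: return raw decoded with the first candidate that fits.

    Hypothetical helper. Order matters: iso-8859-15 accepts every byte
    value, so it must come after the stricter utf-8.
    """
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")
```

In your script, encoding = pick_encoding(sys.argv) would replace the
hard-coded assignment, or decode_guess(strg) would replace
strg.decode(encoding) in the list comprehension. Keep in mind that the
guess can be wrong: a latin9 byte sequence that also happens to be
valid UTF-8 will be decoded as UTF-8.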

-- 
DaveA