Python 3 encoding question: Read a filename from stdin, subsequently open that filename

Marc Christiansen usenet at solar-empire.de
Tue Nov 30 05:06:45 EST 2010


Dan Stromberg <drsalists at gmail.com> wrote:
> I've got a couple of programs that read filenames from stdin, and then
> open those files and do things with them.  These programs sort of do
> the *ix xargs thing, without requiring xargs.
> 
> In Python 2, these work well.  Irrespective of how filenames are
> encoded, things are opened OK, because it's all just a stream of
> single byte characters.
> 
> In Python 3, I'm finding that I have encoding issues with characters
> with their high bit set.  Things are fine with strictly ASCII
> filenames.  With high-bit-set characters, even if I change stdin's
> encoding with:
> 
>       import io
>       STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')
> 
> ...even with that, when I read a filename from stdin with a
> single-character Spanish n~, the program cannot open that filename
> because the n~ is apparently internally converted to two bytes, but
> remains one byte in the filesystem.  I decided to try ISO-8859-1 with
> Python 3, because I have a Java program that encountered a similar
> problem until I used en_US.ISO-8859-1 in an environment variable to
> set the JVM's encoding for stdin.
> 
> Python 2 shows the n~ as 0xf1 in an os.listdir('.').  Python 3 with an
> encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1.
> 
> Does anyone know what I need to do to read filenames from stdin with
> Python 3.1 and subsequently open them, when some of those filenames
> include characters with their high bit set?
> 
> TIA!

Try using sys.stdin.buffer instead of sys.stdin. It gives you bytes
instead of strings, so the filenames are never decoded. Also use bytes
literals instead of string literals for paths, e.g. os.listdir(b'.'),
which then returns the names as bytes as well.
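
Something along these lines might do it. This is only a rough sketch, not
tested against your programs: the names read_filenames, file_sizes and main
are made up for illustration, and I'm assuming one filename per line,
terminated by a newline. The names stay bytes from stdin all the way to
open(), so no encoding is ever guessed.

```python
import sys


def read_filenames(stream):
    """Yield one filename per line from a binary stream, as bytes."""
    for line in stream:
        name = line.rstrip(b"\n")  # strip only the line terminator
        if name:  # skip empty lines
            yield name


def file_sizes(names):
    """Open each bytes path in binary mode; return (name, size) pairs."""
    results = []
    for name in names:
        # open() accepts a bytes path and passes it to the OS unchanged.
        with open(name, "rb") as handle:
            results.append((name, len(handle.read())))
    return results


def main():
    # sys.stdin.buffer is the underlying binary stream, so iterating
    # over it yields bytes lines rather than decoded str lines.
    for name, size in file_sizes(read_filenames(sys.stdin.buffer)):
        # Write the raw bytes back out so no encoding step is involved.
        sys.stdout.buffer.write(name + b": " + str(size).encode("ascii") + b"\n")
```

Call main() at the bottom of the script and feed it names on stdin, for
example from find or ls. Note that a filename containing a newline would
still be split in two; NUL-separated input would be needed to handle that.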

Marc


