Python 3 encoding question: Read a filename from stdin, subsequently open that filename

Dan Stromberg drsalists at gmail.com
Tue Nov 30 00:26:23 EST 2010


I've got a couple of programs that read filenames from stdin, and then
open those files and do things with them.  These programs sort of do
the *ix xargs thing, without requiring xargs.

In Python 2, these work well.  Irrespective of how filenames are
encoded, things are opened OK, because it's all just a stream of
single byte characters.

In Python 3, I'm finding that I have encoding issues with characters
with their high bit set.  Things are fine with strictly ASCII
filenames.  With high-bit-set characters, even if I change stdin's
encoding with:

      import io
      import sys
      STDIN = io.open(sys.stdin.fileno(), 'r', encoding='ISO-8859-1')

...even with that, when I read a filename from stdin with a
single-character Spanish n~, the program cannot open that filename
because the n~ is apparently internally converted to two bytes, but
remains one byte in the filesystem.  I decided to try ISO-8859-1 with
Python 3, because I have a Java program that encountered a similar
problem until I used en_US.ISO-8859-1 in an environment variable to
set the JVM's encoding for stdin.

Python 2 shows the n~ as 0xf1 in an os.listdir('.').  Python 3 with an
encoding of ISO-8859-1 wants it to be 0xc3 followed by 0xb1, which is
the UTF-8 encoding of the same character.
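
The byte values above can be reproduced with a short sketch.  It
assumes the filesystem encoding Python 3 uses when opening the file is
UTF-8; the locale is not stated here, so that is a guess, but it would
explain the two-byte form.

```python
# The single character n-tilde (U+00F1) under two encodings.
name = "\u00f1"

latin1_bytes = name.encode("iso-8859-1")  # one byte, as stored on disk
utf8_bytes = name.encode("utf-8")         # two bytes

print(latin1_bytes)  # b'\xf1'
print(utf8_bytes)    # b'\xc3\xb1'

# Decoding the on-disk byte as ISO-8859-1 yields the right character,
# but re-encoding that character as UTF-8 (the assumed filesystem
# encoding) no longer matches the single byte in the directory entry.
round_trip = b"\xf1".decode("iso-8859-1").encode("utf-8")
print(round_trip == b"\xf1")  # False
```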

Does anyone know what I need to do to read filenames from stdin with
Python 3.1 and subsequently open them, when some of those filenames
include characters with their high bit set?
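
One direction that might be worth trying, sketched here under the
assumption of a POSIX system where filenames are arbitrary byte
strings: skip decoding entirely, read the names as bytes from the
binary stream underneath stdin, and pass those bytes straight to
open().  The helper names read_filenames and file_sizes are
hypothetical, made up for this illustration.

```python
def read_filenames(stream):
    """Yield one filename per line as raw bytes, with no decoding."""
    for line in stream:
        name = line.rstrip(b"\n")
        if name:  # skip empty lines
            yield name


def file_sizes(stream):
    """Open each named file in binary mode; return (name, size) pairs."""
    results = []
    for name in read_filenames(stream):
        # open() accepts a bytes path, so no codec touches the name.
        with open(name, "rb") as handle:
            results.append((name, len(handle.read())))
    return results
```

In a real program the stream would presumably be sys.stdin.buffer,
the binary layer under sys.stdin, as in file_sizes(sys.stdin.buffer).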

TIA!


