Python 3 encoding question: Read a filename from stdin, subsequently open that filename

Antoine Pitrou solipsis at pitrou.net
Wed Dec 1 04:12:51 EST 2010


On Tue, 30 Nov 2010 16:57:57 -0800
Dan Stromberg <drsalists at gmail.com> wrote:
> >> --- On Tue, 11/30/10, Dan Stromberg <drsalists at gmail.com> wrote:
> >> > In Python 3, I'm finding that I have encoding issues with characters
> >> > with their high bit set.  Things are fine with strictly ASCII
> >> > filenames.  With high-bit-set characters, even if I change stdin's
> >> > encoding with:
> >>
[...]
> 
> I have the same problem using 3.2alpha4: the word man~ana (6
> characters long) in a filename causes problems (I'm catching the
> exception and skipping the file for now) despite using what I believe
> is an 8-bit, all 256-bytes-are-characters encoding: iso-8859-1.  I'm
> not sure if you wanted both of us to try this, or Yingjie alone, though.

What do sys.stdin.encoding and sys.getfilesystemencoding() return? If
they differ, that is the cause of the problem: open() encodes str
filenames with sys.getfilesystemencoding(), so a name decoded from stdin
with one encoding is re-encoded with another, and either no longer
matches the bytes on disk or cannot be encoded at all.
In that case, the solution is either to encode the filenames yourself
using sys.stdin.encoding and pass the resulting bytes to open(), or to
read them as bytes directly from sys.stdin.buffer (the binary,
non-unicode counterpart of sys.stdin).
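
A minimal sketch of both options, assuming one filename per line on
stdin and a POSIX system where open() hands a bytes path to the OS
as-is. The helper names (filenames_from, encode_filename, sizes) are
made up for illustration; only open(), str.encode() and
sys.stdin.buffer come from Python itself.

```python
import sys


def filenames_from(binary_stream):
    """Yield each non-empty line of a binary stream as a bytes filename."""
    for raw in binary_stream:
        name = raw.rstrip(b"\n")
        if name:
            yield name


def encode_filename(name, encoding):
    """Option 1: turn a str filename back into the bytes it was decoded from."""
    return name.encode(encoding)


def sizes(binary_stream):
    """Option 2: open every file named on the stream; return {name: size}."""
    result = {}
    for name in filenames_from(binary_stream):
        # A bytes path is not re-encoded, so sys.getfilesystemencoding()
        # never gets a chance to disagree with the encoding used on stdin.
        with open(name, "rb") as f:
            result[name] = len(f.read())
    return result
```

Something like sizes(sys.stdin.buffer) would then process whatever is
piped in, and open(encode_filename(line, sys.stdin.encoding)) covers the
case where the names have already been read as str.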

If they are the same, then I guess you can open an issue, provided you
give enough information for people to reproduce the problem :)
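
For such a report, a small script along these lines could collect the
values asked about above. This is only a sketch: encoding_report is a
hypothetical helper, and which fields are worth including is an
assumption.

```python
import sys


def encoding_report():
    """Collect the encodings relevant to reading and opening filenames."""
    return {
        "python": sys.version.split()[0],
        "platform": sys.platform,
        # stdin may have been replaced by an object without .encoding
        "stdin": getattr(sys.stdin, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


def format_report(report):
    """Render the report as one 'key: value' line per entry."""
    return "\n".join("{}: {}".format(k, v) for k, v in report.items())
```

Printing format_report(encoding_report()) gives four lines that can be
pasted into the issue alongside the failing filename.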

Regards

Antoine.


