Python 3 encoding question: Read a filename from stdin, subsequently open that filename

Peter Otten __peter__ at web.de
Wed Dec 1 04:34:24 EST 2010


Nobody wrote:

> Python 3.x's decision to treat filenames (and environment variables) as
> text even on Unix is, in short, a bug. One which, IMNSHO, will mean that
> Python 2.x is still around when Python 4 is released.

For filenames in Python 3 the user has the choice between text (str) and 
bytes. If the user chooses text, it is converted to bytes using a 
default encoding that hopefully matches the one used by the other tools on 
the machine that manipulate filenames. 
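
To make the choice concrete, here is a minimal sketch that passes the same 
path to the standard library once as str and once as bytes. It assumes 
Python 3.2 or later, where os.fsencode() is available; the file name 
"example.txt" and the temporary directory are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Text API: the str path is encoded with the filesystem encoding
    # before it reaches the operating system.
    text_path = os.path.join(tmp, "example.txt")
    with open(text_path, "w") as f:
        f.write("hello")

    # Bytes API: the same path as bytes, passed through without decoding.
    bytes_path = os.fsencode(text_path)
    with open(bytes_path, "rb") as f:
        data = f.read()

    # os.listdir() returns names of the same type as its argument.
    names_as_text = os.listdir(tmp)
    names_as_bytes = os.listdir(os.fsencode(tmp))
```

The encoding used for the conversion can be inspected with 
sys.getfilesystemencoding().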

I see that you may run into problems with the text approach when you 
encounter byte sequences that are illegal in the chosen encoding.
I therefore expect that low-level tools will use bytes to manipulate 
filenames while end-user scripts will choose text.
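
A small sketch of the problem case, assuming a UTF-8 locale and a filename 
that contains a Latin-1 byte. It shows strict decoding failing, the 
"surrogateescape" error handler (PEP 383, Python 3.1 and later) round-tripping 
the bytes, and a low-level tool simply staying in bytes. The helper 
read_filename() and the sample name are hypothetical, and io.BytesIO stands 
in for sys.stdin.buffer.

```python
import io

# 0xE9 is "é" in Latin-1 but not a valid UTF-8 sequence here.
raw = b"caf\xe9.txt"

# Strict decoding rejects the name outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the undecodable byte through as a lone
# surrogate, so encoding with the same handler restores the original bytes.
name = raw.decode("utf-8", "surrogateescape")
round_tripped = name.encode("utf-8", "surrogateescape")


def read_filename(stream):
    """Hypothetical helper: read one newline-terminated filename as bytes."""
    return stream.readline().rstrip(b"\n")


# In a real script the stream would be sys.stdin.buffer.
from_stdin = read_filename(io.BytesIO(b"caf\xe9.txt\n"))
```

The bytes returned by read_filename() can be handed straight to open() or 
os.stat() without any decoding step.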

I don't see how a dogmatic bytes-only restriction can improve the situation.

Also, you can already provide Unicode filenames in Python 2.x (and a script 
containing constant filenames becomes more portable if you do), so IMHO the 
situation in Python 2 and 3 is similar enough not to hinder adoption of 
3.x.

Peter



