Changing filenames from Greeklish => Greek (subprocess complain)

Cameron Simpson cs at zip.com.au
Thu Jun 6 06:24:16 EDT 2013


On 05Jun2013 11:43, Νίκος Γκρ33κ <nikos.gr33k at gmail.com> wrote:
| On Wednesday, 5 June 2013 9:32:15 p.m. UTC+3, user MRAB wrote:
| > Using Python, I think you could get the filenames using os.listdir, 
| > passing the directory name as a bytestring so that it'll return the
| > names as bytestrings.
| 
| > Then, for each name, you could decode from its current encoding and 
| > encode to UTF-8 and rename the file, passing the old and new paths to
| > os.rename as bytestrings.
| 
| I'm not sure I follow:
| 
| Change this:
| 
| # Compute a set of current fullpaths
| fullpaths = set()
| path = "/home/nikos/public_html/data/apps/"
| 
| for root, dirs, files in os.walk(path):
[...]

Have a read of this:

  http://docs.python.org/3/library/os.html#os.listdir

The UNIX API takes filenames and paths as bytes; it attaches no encoding to them.

Python 3 strs are sequences of Unicode code points. If you try to
open a file or directory on a UNIX system using a Python str, that
string must be converted to a sequence of bytes before being handed
to the OS.

If you just use a str, this is done implicitly, using the filesystem encoding derived from your locale settings.
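A quick way to see that implicit conversion is to do it by hand. os.fsencode() and os.fsdecode() apply the same filesystem encoding (reported by sys.getfilesystemencoding()) that open() and os.listdir() use when handed a str. This is a minimal sketch: the Greek filename is a made-up example, and the encoding printed depends on your system (typically 'utf-8' on a modern Linux box).

```python
import os
import sys

# The codec Python uses to turn a str path into bytes for the OS.
print(sys.getfilesystemencoding())

name = "τραγούδι.mp3"  # hypothetical Greek filename

# os.fsencode() performs the same str -> bytes step that open() and
# os.listdir() do behind your back when given a str.
raw = os.fsencode(name)
print(raw)

# os.fsdecode() goes the other way; the round trip restores the str.
print(os.fsdecode(raw) == name)
```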

However, if you pass a bytes object to open() or os.listdir(), this
conversion does not take place. You put bytes in and, in the case of
listdir, you get bytes out.
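The bytes-in, bytes-out behaviour is easy to check. This sketch creates a file in a temporary directory (so no real files are touched), names it with UTF-8 bytes, and lists the directory both ways; the filename is a hypothetical example.

```python
import os
import tempfile

# A throwaway directory so the demonstration touches no real files.
with tempfile.TemporaryDirectory() as tmp:
    dir_bytes = os.fsencode(tmp)               # the directory path as bytes
    raw_name = "τραγούδι.mp3".encode("utf-8")  # hypothetical name, as UTF-8 bytes

    # Create an empty file, naming it with bytes end to end.
    with open(os.path.join(dir_bytes, raw_name), "wb"):
        pass

    # bytes in, bytes out: the returned names are not decoded at all.
    names_b = os.listdir(dir_bytes)
    # str in, str out: the names are decoded with the filesystem encoding.
    names_s = os.listdir(tmp)

print(names_b)  # a list of bytes objects
print(names_s)  # a list of str objects
```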

You can work on pathnames in bytes and never concern yourself with
encode/decode at all.

In this way you can write code that does not care about the translation
between Unicode and some arbitrary byte encoding.
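As an illustration, here is the "Compute a set of current fullpaths" loop quoted above, done purely in bytes. The quoted loop body was cut off, so collecting every file's full path is an assumption about what it does, and collect_fullpaths is a hypothetical name. Because top is bytes, os.walk() yields bytes throughout and os.path.join() combines them without any encoding step.

```python
from __future__ import annotations

import os


def collect_fullpaths(top: bytes) -> set[bytes]:
    """Return the full path of every file under `top`, as bytes.

    Since `top` is bytes, os.walk() yields bytes for root, dirs and
    files, and os.path.join() accepts bytes, so nothing is decoded.
    """
    fullpaths = set()
    for root, dirs, files in os.walk(top):
        for name in files:
            fullpaths.add(os.path.join(root, name))
    return fullpaths
```

For example, collect_fullpaths(b"/home/nikos/public_html/data/apps/") would return a set of bytes paths, whatever encoding the individual names happen to be in.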

Of course, the issue will still arise when accepting user input:
when you renamed your MP3 file, your shell did exactly this kind of
conversion, turning the name you typed into bytes according to its
locale. But it is possible to write pure utility code that doesn't
care about filenames as Unicode or str if you work purely in bytes.
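To see why the encoding matters at the point of input, consider one name encoded under two different locales. Which encoding the original files were in isn't stated here; ISO-8859-7 (the legacy Greek charset) is just an assumed example, as is the filename.

```python
name = "τραγούδι.mp3"  # hypothetical filename typed by a user

utf8_bytes = name.encode("utf-8")        # what a UTF-8 locale hands the OS
greek_bytes = name.encode("iso-8859-7")  # what a legacy Greek locale hands the OS

print(utf8_bytes)   # two bytes per Greek letter
print(greek_bytes)  # one byte per Greek letter

# The bytes alone don't record which encoding produced them, so reading
# the legacy bytes as UTF-8 fails.
try:
    greek_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc)
```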

Regarding user filenames, the common policy these days is to use
UTF-8 throughout. Of course, you need to get everything into that
regime to start with.
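Getting everything into the UTF-8 regime is what MRAB's suggestion quoted above does: list the names as bytes, decode each from its current encoding, re-encode as UTF-8 and rename with bytes paths. A minimal sketch, under assumptions: the old names are ISO-8859-7 (pass a different old_encoding if not), only one directory level is handled, any name that already decodes as UTF-8 is treated as converted (a heuristic, not a guarantee), and the filesystem stores names as raw bytes, as Linux filesystems do. reencode_names is a hypothetical helper.

```python
from __future__ import annotations

import os


def reencode_names(top: bytes, old_encoding: str = "iso-8859-7") -> list[bytes]:
    """Rename entries directly under `top` from `old_encoding` to UTF-8.

    `top` is bytes, so os.listdir() returns bytes names and no implicit
    locale-based conversion happens anywhere. Returns the new names.
    """
    renamed = []
    for name in os.listdir(top):
        try:
            name.decode("utf-8")
            continue  # already valid UTF-8 (plain ASCII included): skip
        except UnicodeDecodeError:
            pass
        # Decode from the old encoding, then re-encode as UTF-8.
        new_name = name.decode(old_encoding).encode("utf-8")
        new_path = os.path.join(top, new_name)
        if os.path.exists(new_path):
            # On POSIX os.rename() would silently replace it; refuse instead.
            raise FileExistsError(new_path)
        os.rename(os.path.join(top, name), new_path)
        renamed.append(new_name)
    return renamed
```

Try it on a copy of the directory first: a name containing a byte that ISO-8859-7 leaves undefined raises UnicodeDecodeError rather than being guessed at.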
-- 
Cameron Simpson <cs at zip.com.au>

...but C++ gloggles the cheesewad, thus causing a type conflict.
        - David Jevans, jevans at apple.com