Changing filenames from Greeklish => Greek (subprocess complain)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Jun 5 02:03:41 EDT 2013


On Tue, 04 Jun 2013 21:15:23 -0700, Νικόλαος Κούρας wrote:

> One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek filename
> with spaces.
> Is there a problem when a filename contains both English and Greek
> letters? Isn't it still a Unicode string?

No problem, and Unicode includes both English and Greek letters.
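To illustrate, here is a minimal sketch that assumes Python 3, where str is Unicode. A single string holds the Latin letters of ".mp3" and the Greek letters side by side. Encoding it to UTF-8 is just one way of turning it into bytes: one byte per ASCII character, two per Greek letter.

```python
# One Python 3 str can mix Latin (ASCII) and Greek letters freely.
name = "Ευχή του Ιησού.mp3"

# UTF-8 is one possible byte representation of that str.
encoded = name.encode("utf-8")

ascii_count = sum(1 for ch in name if ord(ch) < 128)  # spaces and ".mp3"
non_ascii_count = len(name) - ascii_count             # the (accented) Greek letters

# ASCII characters take 1 byte in UTF-8; Greek letters (U+0370..U+03FF)
# take 2, so the byte count follows directly from the character counts.
print(len(name), "characters,", len(encoded), "UTF-8 bytes")
print("round-trips:", encoded.decode("utf-8") == name)
```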


> All I did on CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του Ιησού.mp3"'

That's not what you wrote earlier. You said you used FileZilla to 
transfer the files from Windows 8.


> and the displayed filename after 'ls -l' returned was:
> 
> -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3
> 
> Is there no way at all to check the charset used to store it on the
> disk? It should be UTF-8, but it doesn't look like it. Is there some
> Linux command or some Python command that will print out the actual
> encoding of '\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3'?

You have misunderstood.

The Linux file system does not track encodings. It just stores bytes.
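You can see this from Python 3. This is a sketch: the bytes below are transcribed by hand from the octal escapes in the `ls -l` output quoted above, and the variable names are illustrative. The kernel only forbids '/' and NUL in a filename component; everything else is an opaque byte string. Python 3 on POSIX decodes such names with the filesystem encoding plus the "surrogateescape" error handler (PEP 383), which is what os.fsdecode() and os.fsencode() use. So even bytes that are not valid in that encoding survive a round trip unchanged:

```python
# Raw filename bytes, transcribed from the octal escapes in the `ls -l`
# output ("\ " there is an escaped space).
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"

# A strict UTF-8 decode fails: these bytes are not valid UTF-8.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# With "surrogateescape", each undecodable byte becomes a lone surrogate
# code point (U+DC80..U+DCFF) instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores exactly the original bytes.
round_trip = name.encode("utf-8", "surrogateescape")

print("valid UTF-8:", is_valid_utf8)   # valid UTF-8: False
print("str:", ascii(name))
print("round-trips:", round_trip == raw)  # round-trips: True
```

Which codec os.fsdecode() actually applies depends on sys.getfilesystemencoding(), i.e. on the locale. That is information the process has, not something the file system stored alongside the name.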

There is no *reliable* way to guess the encoding that a bunch of bytes 
came from. If your bytes look like 

0x48 0x65 0x6c 0x6c 0x6f 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21

(ASCII "Hello world!") then you might *guess* that the encoding is ASCII, 
or UTF-8, or Latin-1. But in general, you can't go from the bytes to the 
encoding. Encodings are out-of-band information.
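To make that concrete, here is a small sketch. The helper `decode_candidates` is hypothetical and written only for this example. The "Hello world!" bytes decode to the same text under ASCII, UTF-8 and Latin-1. The filename bytes from the `ls -l` output are rejected by UTF-8 but decode without error under ISO-8859-7, cp1253 *and* Latin-1. Nothing in the bytes says which result is right. Only a human who reads Greek can see that the ISO-8859-7/cp1253 result is plausible, which makes a Greek 8-bit codepage a reasonable guess here, but still only a guess.

```python
from typing import Dict, List, Optional


def decode_candidates(data: bytes, encodings: List[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: decode `data` with each codec in turn.

    None means the codec rejected the bytes; any other value is merely
    *a* valid decoding, not proof that it is the intended one.
    """
    results: Dict[str, Optional[str]] = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# The example bytes from the text: every candidate codec agrees on them.
hello = bytes.fromhex("48 65 6c 6c 6f 20 77 6f 72 6c 64 21")
hello_results = decode_candidates(hello, ["ascii", "utf-8", "latin-1"])

# The filename bytes from the `ls -l` output (octal escapes, "\ " = space).
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"
raw_results = decode_candidates(raw, ["utf-8", "iso-8859-7", "cp1253", "latin-1"])

for label, results in (("hello", hello_results), ("filename", raw_results)):
    for enc, text in results.items():
        shown = text if text is not None else "<rejected>"
        print(f"{label:>8} {enc:>10}: {shown}")
```

Latin-1 never fails, because every byte maps to some character. Here it happily produces "Åõ÷Þ ôïõ Éçóïý.mp3", so "it decoded without an error" tells you nothing about the encoding.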


-- 
Steven


