Changing filenames from Greeklish => Greek (subprocess complain)

Michael Torrie torriem at gmail.com
Wed Jun 5 01:40:39 EDT 2013


On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
> One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
> filename with spaces. Is there a problem when a filename contains both
> english and greek letters? Isn't it still a unicode string?
> 
> All i did in my CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
> Ιησού.mp3"
> 
> and the displayed filename after 'ls -l' returned was:
> 
> is -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
> \364\357\365\ \311\347\363\357\375.mp3
> 
> There is no way at all to check the charset used to store it in hdd?
> It should be UTF-8, but it doesn't look like it. Is there some Linux
> command or some python command that will print out the actual
> encoding of '\305\365\367\336\ \364\357\365\
> \311\347\363\357\375.mp3' ?

I can see that you are starting to understand things. I can't answer
your question (I don't know the answer), but you're correct about one
thing: a filename is just a sequence of bytes.  We'd hope it would be
UTF-8, but it could be anything.  Even worse, there's no reliable way to
tell from a byte stream alone which encoding it uses; all we can do is
try one and see what happens.  Text editors, for example, have to either
make a guess (UTF-8 is a good one these days), or ask, or read the first
line of the file as ASCII and look for a source-code encoding
declaration to give them an idea.
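
To make "try one and see what happens" concrete, here is a minimal
sketch in Python 3.  The byte string is the octal sequence from the ls
output quoted above (with the escaped spaces written as plain spaces);
the function name trial_decode and the shortlist of candidate encodings
are my own assumptions for illustration, not anything the OS or Python
prescribes.

```python
# Bytes as shown by `ls` in the quoted post, written with octal escapes.
# The "\ " (escaped space) in the ls output is a plain space here.
RAW_NAME = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"

# Hypothetical shortlist of encodings to try; not an exhaustive list.
CANDIDATES = ["utf-8", "iso8859_7", "cp1253"]


def trial_decode(raw, encodings):
    """Return {encoding: text} for every encoding that decodes raw cleanly."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding, so rule it out.
            pass
    return results


if __name__ == "__main__":
    for name, text in trial_decode(RAW_NAME, CANDIDATES).items():
        print(name, "->", text)
```

With these particular bytes, UTF-8 is rejected while both Greek
single-byte codecs decode without error, so a clean decode is evidence
rather than proof: a human still has to look at the result and decide
whether it is the intended text.  To get the raw bytes of real
filenames, passing a bytes path such as os.listdir(b".") makes Python
return the names undecoded.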
