Changing filenames from Greeklish => Greek (subprocess complain)

MRAB python at mrabarnett.plus.com
Wed Jun 5 12:44:14 EDT 2013


On 05/06/2013 06:40, Michael Torrie wrote:
> On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
>> One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
>> filename with spaces. Is there a problem when a filename contains both
>> English and Greek letters? Isn't it still a Unicode string?
>>
>> All I did on my CentOS box was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
>> Ιησού.mp3"'
>>
>> and the displayed filename after 'ls -l' returned was:
>>
>> is -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
>> \364\357\365\ \311\347\363\357\375.mp3
>>
>> Is there no way at all to check the charset used to store it on disk?
>> It should be UTF-8, but it doesn't look like it. Is there some Linux
>> command or some Python command that will print out the actual
>> encoding of '\305\365\367\336\ \364\357\365\
>> \311\347\363\357\375.mp3' ?
>
> I can see that you are starting to understand things. I can't answer
> your question (don't know the answer), but you're correct about one
> thing.  A filename is just a sequence of bytes.  We'd hope it would be
> utf-8, but it could be anything.  Even worse, it's not possible to tell
> from a byte stream what encoding it is unless we just try one and see
> what happens.  Text editors, for example, have to either make a guess
> (utf-8 is a good one these days), or ask, or try to read from the first
> line of the file using ascii and see if there's a source code character
> set command to give it an idea.
>
From the previous posts I guessed that the filename might be encoded
using ISO-8859-7:

 >>> s = b"\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3"
 >>> s.decode("iso-8859-7")
'Ευχή\\ του\\ Ιησού.mp3'

Yes, that looks the same.
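
As Michael says, the bytes themselves don't record their encoding, so
the best you can do is try some likely candidates and see which ones
decode cleanly. Here is a rough sketch of that idea. The helper name
try_decodings and the list of candidate encodings are my own choices
for illustration, and I'm assuming the backslash before each space in
the 'ls' output is only ls escaping the space, so the real name
contains plain spaces:

```python
# Raw filename bytes as shown by 'ls' (octal escapes), with plain spaces.
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"


def try_decodings(data, encodings=("utf-8", "iso-8859-7", "cp1253")):
    """Return {encoding: text} for every candidate that decodes without error.

    A clean decode does not prove the guess is right; it only rules out
    the encodings that fail. A human still has to look at the result.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return results


candidates = try_decodings(raw)
for enc, text in candidates.items():
    print(enc, repr(text))

# If ISO-8859-7 is the right guess, these are the bytes the name would
# need to have on disk for it to be UTF-8.
utf8_name = candidates["iso-8859-7"].encode("utf-8")
```

Note that more than one candidate may decode without error, which is
why this can only narrow things down. On a POSIX system os.rename()
accepts bytes for both arguments, so something like
os.rename(raw, utf8_name) could then fix the name, but check the
decoded text by eye first.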


