Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Wed Jun 5 02:05:33 EDT 2013


On Wednesday, 5 June 2013 8:40:39 AM UTC+3, Michael Torrie wrote:
> On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
> 
> > One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
> > filename with spaces. Is there a problem when a filename contain both
> > english and greek letters? Isn't it still a unicode string?
> > 
> > All i did in my CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
> > Ιησού.mp3"
> > 
> > and the displayed filename after 'ls -l' returned was:
> > 
> > is -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
> > \364\357\365\ \311\347\363\357\375.mp3
> > 
> > There is no way at all to check the charset used to store it in hdd? 
> > It should be UTF-8, but it doesn't look like it. Is there some linxu
> > command or some python command that will print out the actual
> > encoding of '\305\365\367\336\ \364\357\365\
> > \311\347\363\357\375.mp3' ?
> 
> I can see that you are starting to understand things. I can't answer
> your question (don't know the answer), but you're correct about one
> thing.  A filename is just a sequence of bytes.  We'd hope it would be
> utf-8, but it could be anything.  Even worse, it's not possible to tell
> from a byte stream what encoding it is unless we just try one and see
> what happens.  Text editors, for example, have to either make a guess
> (utf-8 is a good one these days), or ask, or try to read from the first
> line of the file using ascii and see if there's a source code character 
> set command to give it an idea.


Um, even if we don't actually know which encoding CentOS used to store the filename on the hdd, is there a way to tell Python to just open the bytestream as it is?

I don't know if it's possible, but I am looking for a way to skip the encoding, since we have no way of knowing what it is.
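
For what it's worth, here is a rough sketch of what "skipping the encoding" might look like in Python 3, assuming a Linux filesystem. As far as I understand it, os.listdir() returns bytes names when it is given a bytes path, and open() accepts a bytes path too. The helper names list_raw_names and read_raw are made up for this example.

```python
import os

def list_raw_names(directory):
    """Return the directory entries as raw bytes, with no decoding attempted."""
    # Passing a bytes path makes os.listdir() return bytes names.
    return os.listdir(os.fsencode(directory))

def read_raw(directory, raw_name):
    """Open a file by its raw bytes name and return its contents as bytes."""
    # Both parts of the path are bytes, so the name is never decoded.
    path = os.path.join(os.fsencode(directory), raw_name)
    with open(path, "rb") as handle:
        return handle.read()
```

Something like list_raw_names("/home/nikos") would then hand back the names exactly as they are stored, whatever encoding they were written in (the directory here is only an illustration).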

This is very weird because:


nikos@superhost.gr [~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
nikos@superhost.gr [~]#

All I did was a simple rename from English to Greek. Since the locale is set to use UTF-8, shouldn't the result on the hdd be a UTF-8 bytestream?
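
Following the "just try one and see what happens" suggestion, this is a minimal sketch that feeds the bytes from the 'ls -l' output to a few candidate encodings. The list of candidates is only my guess at what is plausible for Greek text, and try_decodings is a name made up for this example. If I have copied the octal escapes correctly, the bytes are not valid UTF-8 but do decode as ISO-8859-7. One possible explanation (an assumption on my part, nothing I have verified) is that the terminal I typed the 'mv' into was sending ISO-8859-7 rather than UTF-8, regardless of what the server locale says.

```python
# The bytes shown by 'ls -l', written with Python octal escapes.
# The "\ " sequences in the ls output are just escaped spaces.
RAW_NAME = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"

# Candidate encodings to try; a guess, not an exhaustive list.
CANDIDATES = ["utf-8", "iso-8859-7", "cp1253"]

def try_decodings(raw, candidates=CANDIDATES):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for encoding in candidates:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results

if __name__ == "__main__":
    for encoding, text in try_decodings(RAW_NAME).items():
        # ascii() keeps the output printable on any terminal.
        print(encoding, "->", ascii(text))
```

A successful decode does not prove which encoding was really used, since several single-byte Greek encodings share the same byte values for these letters. It only shows which guesses are consistent with the bytes.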



