Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Wed Jun 5 02:40:19 EDT 2013


On Wednesday, 5 June 2013 9:03:41 a.m. UTC+3, Steven D'Aprano wrote:
> Nikos wrote:
> > and the displayed filename after 'ls -l' returned was:
> > -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3

> > Is there no way at all to check which charset was used to store it on the HDD? It
> > should be UTF-8, but it doesn't look like it. Is there some Linux
> > command or some Python command that will print out the actual encoding
> > of '\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3'?

> The Linux file system does not track encodings. It just stores bytes.
> There is no *reliable* way to guess the encoding that a bunch of bytes  
> came from. If your bytes look like 

> 0x48 0x65 0x6c 0x6c 0x6f 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21

> (ASCII "Hello World!") then you might *guess* that the encoding is ASCII, 
> or UTF-8, or Latin-1. But in general, you can't go from the bytes to the 
> encoding. Encodings are out-of-band information.
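A minimal Python 3 sketch of Steven's point (not code from the thread): the bytes from `ls -b` can at least be tested for UTF-8 validity, and they fail that test. Every single-byte charset, however, decodes them without complaint, so a clean decode never proves which charset was meant. The helper `is_valid_utf8` is a hypothetical name.

```python
# Raw bytes of the filename, copied from the `ls -b` output (octal escapes).
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(raw))  # False: these bytes were not stored as UTF-8

# Each single-byte charset maps every byte to *some* character, so none of
# these raises; they just produce different text. Ruling UTF-8 out is
# possible, ruling a particular single-byte charset in is not.
for charset in ("latin-1", "cp1251", "iso-8859-7"):
    print(f"{charset:>10}: {raw.decode(charset)}")
```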


Your explanation of encoding/decoding is excellent, and I'm saving it, Steven!
So what I understand now is:

encoding = string -> (some charset used) -> charset bytes
decoding = bytes -> (have to know what charset has been used) -> original string
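That summary matches Python 3's model: `str.encode(charset)` produces bytes, `bytes.decode(charset)` turns them back into text, and only the same charset round-trips. A short sketch, assuming ISO-8859-7 as the charset behind this filename (the octal bytes quoted earlier decode to the expected Greek with it):

```python
text = "Ευχή του Ιησού.mp3"

# encoding = string -> (charset) -> bytes
greek_bytes = text.encode("iso-8859-7")  # one byte per Greek letter
utf8_bytes = text.encode("utf-8")        # two bytes per Greek letter

# The ISO-8859-7 bytes are exactly the ones `ls -b` printed in octal.
assert greek_bytes == b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"
print(len(greek_bytes), len(utf8_bytes))  # 18 30

# decoding = bytes -> (the SAME charset) -> original string
assert greek_bytes.decode("iso-8859-7") == text

# Decoding with the wrong charset silently gives mojibake, not an error:
print(greek_bytes.decode("latin-1"))  # Åõ÷Þ ôïõ Éçóïý.mp3
```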

Have I understood correctly that the *key* to the whole encode/decode process is the charset that was used (or has to be used)?

string = 'Ευχή του Ιησού.mp3'
above string as bytes in an unknown charset = '\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3'

We don't know the key (charset) that was used, but we do know the original form of the string, so it occurred to me that if we write a Python script to decode the mojibake byte stream with every available charset, then at some point the original string will appear!


Wouldn't you agree, Steven? Even if that is likely to work, though, I don't know how to write it.
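The idea does work here, but only because the correct text is already known (a known-plaintext search); it cannot identify the charset of arbitrary bytes. A minimal sketch, assuming Python 3 and treating the codec names in the standard library's `encodings.aliases` table as "all available charsets". `find_matching_charsets` is a hypothetical helper:

```python
import encodings.aliases

# Bytes as reported by `ls -b`, and the name they are known to spell.
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"
expected = "Ευχή του Ιησού.mp3"


def find_matching_charsets(data: bytes, wanted: str) -> list:
    """Try every codec Python knows by alias; keep those that yield `wanted`."""
    # encodings.aliases.aliases maps alias -> codec module name; its values
    # give a reasonably complete list of the codecs shipped with Python.
    candidates = sorted(set(encodings.aliases.aliases.values()))
    matches = []
    for name in candidates:
        try:
            if data.decode(name) == wanted:
                matches.append(name)
        except (LookupError, ValueError):
            # LookupError: codec unavailable on this platform (e.g. 'mbcs'
            # outside Windows) or not a bytes->str codec (e.g. 'base64_codec').
            # ValueError covers UnicodeDecodeError: bytes invalid for the codec.
            continue
    return matches


print(find_matching_charsets(raw, expected))
```

On a typical CPython install the result should include both `cp1253` and `iso8859_7`: windows-1253 and ISO-8859-7 map every byte in this particular name to the same Greek letter, so more than one charset can "win".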


Here are the commands you asked for.
-----------------------------------------
nikos at superhost.gr [~/www/data/apps]# printf %q\n\n *
100\ Mythoi\ tou\ Aiswpou.pdfnnAnekdotologio.exennBattleship.exenn$'\323\352\335\370\357\365 \335\355\341\355 \341\361\351\350\354\374.exe'nnKosmas\ o\ Aitwlos\ -\ Profiteies.pdfnnLuxor\ Evolved.exennMonopoly.exenn$'\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3'nnOnline\ Movie\ Player.zipnnO\ Nomos\ tou\ Merfy\ v1-2-3.zipnnOrthodoxo\ Imerologio.exennPac-Man.exennScrabble.exennTo\ 1o\ mou\ vivlio\ gia\ to\ skaki.pdfnnVivlos\ gia\ Atheofovous.pdfnnV-Radio\ v2.4.msinn
nikos at superhost.gr [~/www/data/apps]# ls -b *
100\ Mythoi\ tou\ Aiswpou.pdf*                                            Online\ Movie\ Player.zip*
Anekdotologio.exe*                                                        O\ Nomos\ tou\ Merfy\ v1-2-3.zip
Battleship.exe                                                            Orthodoxo\ Imerologio.exe*
\323\352\335\370\357\365\ \335\355\341\355\ \341\361\351\350\354\374.exe  Pac-Man.exe
Kosmas\ o\ Aitwlos\ -\ Profiteies.pdf*                                    Scrabble.exe
Luxor\ Evolved.exe                                                        To\ 1o\ mou\ vivlio\ gia\ to\ skaki.pdf*
Monopoly.exe                                                              Vivlos\ gia\ Atheofovous.pdf*
\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3                  V-Radio\ v2.4.msi
nikos at superhost.gr [~/www/data/apps]#
-------------------------------

I uploaded the files via FileZilla with English-character names and then renamed them from CentOS; I did that to avoid renaming them from within my Win8 machine. I thought it was better to rename them from within Linux itself.
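For the renaming itself, a minimal sketch of doing it with a script on the Linux side rather than by hand. It assumes the broken names are ISO-8859-7 bytes (the octal escapes in the `ls -b` listing decode to the expected Greek under that charset) and that UTF-8 is the target; `recode_filenames` is a hypothetical helper. Passing a bytes path makes `os.listdir` return raw bytes names, so nothing is mangled by the filesystem encoding. Try it on a copy of the directory first.

```python
import os


def recode_filenames(directory, old="iso-8859-7", new="utf-8"):
    """Rename entries whose names are not valid `new` bytes (hypothetical helper).

    ASSUMPTION: any name that is not valid in `new` was written in `old`.
    Returns the decoded text of every name that was renamed.
    """
    dir_bytes = os.fsencode(directory)
    renamed = []
    # A bytes argument makes os.listdir return raw bytes names.
    for name in os.listdir(dir_bytes):
        try:
            name.decode(new)
            continue  # already valid in the target charset: leave it alone
        except UnicodeDecodeError:
            pass
        # May raise UnicodeDecodeError if the bytes are not valid `old` either.
        text = name.decode(old)
        new_name = text.encode(new)
        src = os.path.join(dir_bytes, name)
        dst = os.path.join(dir_bytes, new_name)
        if os.path.exists(dst):
            raise FileExistsError(dst)  # never overwrite an existing file
        os.rename(src, dst)
        renamed.append(text)
    return renamed
```

Plain-ASCII names such as `Monopoly.exe` are already valid UTF-8, so they are skipped.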


