Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Thu Jun 6 14:46:20 EDT 2013


On Thursday, 6 June 2013 3:44:52 PM UTC+3, Steven D'Aprano wrote:

> py> s = '999-Eυχή-του-Ιησού'
> py> bytes_as_utf8 = s.encode('utf-8')
> py> t = bytes_as_utf8.decode('iso-8859-7', errors='replace')
> py> print(t) 
> 999-EΟΟΞ�-ΟΞΏΟ-ΞΞ·ΟΞΏΟ

Does errors='replace' mean "don't break in case of an error"?
You took the Unicode string 's' and encoded it into a UTF-8 bytestring.
Then how is it possible to ask for that UTF-8 bytestring to be decoded back to a Unicode string using a different charset than the one used for encoding, and for this to actually print the filename the way it appears in greek-iso?
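
A minimal sketch of what the quoted session does, assuming Python 3; the variable names are only illustrative. A bytes object carries no record of the codec that produced it, so decode() interprets the bytes with whatever codec it is given. errors='replace' only changes what happens to bytes the codec has no mapping for: a U+FFFD placeholder is substituted instead of a UnicodeDecodeError being raised.

```python
s = 'Ευχή'
raw = s.encode('utf-8')          # text -> bytes, using UTF-8

# Strict decoding with the wrong codec fails on the first byte that
# ISO-8859-7 does not define (one of the bytes of the accented letter).
try:
    raw.decode('iso-8859-7')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' keeps going and substitutes U+FFFD for such bytes.
replaced = raw.decode('iso-8859-7', errors='replace')

# Decoding with the codec that was used for encoding restores the text.
round_trip = raw.decode('utf-8')
```

The mis-decoded string is still a valid Unicode string; it is just the wrong reading of the same bytes.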


> So that demonstrates part of your problem: even though your Linux system  
> is using UTF-8, your terminal is probably set to ISO-8859-7. The  
> interaction between these will lead to strange and disturbing Unicode 
> errors.

Yes, I feel this is the problem too.
It's a wonder to me why PuTTY used greek-iso by default instead of utf-8!

Please explain this to me, because now that I am beginning to understand these encode/decode things, I am beginning to like them!

a) WHAT does it mean when a Linux system is set to use utf-8?
b) WHAT does it mean when a terminal client is set to use utf-8?
c) WHAT happens when the two of them try to work together?
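
As a rough way to inspect the system side of these questions from Python 3, the sketch below prints the encodings Python derives from the locale. It is only an illustration. The terminal client's own setting (for example PuTTY's character set option) lives on the client and is not visible to Python, which is how the two can disagree.

```python
import locale
import sys

# Encoding Python uses to convert filenames between str and bytes.
fs_encoding = sys.getfilesystemencoding()

# Encoding implied by the current locale settings (LANG, LC_ALL, ...).
locale_encoding = locale.getpreferredencoding(False)

# Encoding Python uses when writing text to standard output; this can be
# None when stdout has been replaced by an object without an encoding.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print('filenames:', fs_encoding)
print('locale   :', locale_encoding)
print('stdout   :', stdout_encoding)
```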


> So I believe I understand how your file name has become garbage. To fix 
> it, make sure that your terminal is set to use UTF-8, and then rename it. 
> Do the same with every file in the directory until the problem goes away.

nikos at superhost.gr [~/www/cgi-bin]# echo $LS_OPTIONS
--color=tty -F -a -b -T 0

Is this okay? Is the '-b' option for displaying a filename in binary mode?

Indeed, I have changed PuTTY to use 'utf-8' and 'ls -l' now displays the file in correct Greek letters. When I switch PuTTY's encoding back to 'greek-iso', the *displayed* filenames show up as mojibake.

WHAT is being displayed and what is actually stored as bytes are two different things, right?

Ευχη του Ιησου.mp3
EΟΟΞ�-ΟΞΏΟ-ΞΞ·ΟΞΏΟ

So the way the filename is displayed in the terminal depends on the encoding the terminal uses, correct? But no matter *how* it is displayed, those two are the same file?
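
A small sketch of the "same bytes, two displays" idea, assuming Python 3 and assuming the name is stored on disk as UTF-8 bytes; the variable names are illustrative. The stored bytes never change, only the codec used to read them does.

```python
name = 'Ευχη του Ιησου.mp3'

# What would be stored: one fixed sequence of bytes.
stored = name.encode('utf-8')

# Two readings of those same bytes.
as_utf8 = stored.decode('utf-8')            # a UTF-8 terminal's view
as_greek_iso = stored.decode('iso-8859-7')  # an ISO-8859-7 terminal's view

# The mojibake view still maps back to exactly the same bytes, so it can
# be undone as long as no byte was replaced or dropped along the way.
same_bytes = as_greek_iso.encode('iso-8859-7') == stored
recovered = as_greek_iso.encode('iso-8859-7').decode('utf-8')
```

This particular name contains no byte that ISO-8859-7 leaves undefined, which is why the strict decode succeeds and the round trip is lossless. A name with such a byte would need errors='replace' and could not be recovered this way.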


