Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Fri Jun 7 02:56:42 EDT 2013


On 7/6/2013 4:01 am, Cameron Simpson wrote:
> On 06Jun2013 11:46, Νίκος Γκρ33κ <nikos.gr33k at gmail.com> wrote:
> | On Thursday, 6 June 2013 3:44:52 p.m. UTC+3, user Steven D'Aprano wrote:
> | > py> s = '999-Eυχή-του-Ιησού'
> | > py> bytes_as_utf8 = s.encode('utf-8')
> | > py> t = bytes_as_utf8.decode('iso-8859-7', errors='replace')
> | > py> print(t)
> | > 999-EΟΟΞ�-ΟΞΏΟ-ΞΞ·ΟΞΏΟ
> |
> | errors='replace' means don't break in case of error?
>
> Yes. The result will be correct for correct iso-8859-7 and slightly mangled
> for something that would not decode smoothly.
How can it be correct? We have encoded our string in utf-8 and then
tried to decode it as greek-iso. How can this possibly be correct?
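A minimal Python 3 sketch of the difference Cameron describes: with the default errors='strict', bytes that are not valid ISO-8859-7 raise UnicodeDecodeError, while errors='replace' always succeeds and turns each undecodable byte into U+FFFD. The helper names strict_decode and lenient_decode are hypothetical, chosen only for illustration.

```python
def strict_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)  # errors='strict' is the default
    except UnicodeDecodeError:
        return None


def lenient_decode(raw: bytes, encoding: str) -> str:
    """Never raises: every undecodable byte becomes U+FFFD."""
    return raw.decode(encoding, errors="replace")


name = "Ευχή"
greek_bytes = name.encode("iso-8859-7")  # 4 bytes: one per Greek letter
utf8_bytes = name.encode("utf-8")        # 8 bytes: two per Greek letter

clean = strict_decode(greek_bytes, "iso-8859-7")    # 'Ευχή': valid iso-8859-7 decodes cleanly
failed = strict_decode(utf8_bytes, "iso-8859-7")    # None: byte 0xAE is undefined in iso-8859-7
mangled = lenient_decode(utf8_bytes, "iso-8859-7")  # no exception, but the text contains U+FFFD
```

So "correct" here means: bytes that really are ISO-8859-7 come back exactly, and anything else comes back mangled instead of aborting.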
>
> | You took the unicode string 's' and utf-8 encoded it into a bytestring.
> | Then how is it possible to decode that utf8 bytestring back to a
> | unicode string using a different charset than the one used for
> | encoding, and have it actually print the filename in greek-iso?
>
> It is easily possible, as shown above. Does it make sense? Normally
> not, but Steven is demonstrating how your "mv" exercises have
> behaved: a rename using utf-8, then a _display_ using iso-8859-7.
Same as above: I don't understand it at all, since different
charsets (encodings) are used in the encode and decode steps.
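Why the mismatched decode still "works": ISO-8859-7 is a single-byte charset that assigns a character to almost every byte value, so decoding UTF-8 bytes with it simply reads each byte on its own. The sketch below (the function name reinterpret is hypothetical) pairs every stored byte of a UTF-8 string with what ISO-8859-7 makes of it.

```python
import unicodedata


def reinterpret(text: str, write_enc: str = "utf-8", read_enc: str = "iso-8859-7"):
    """Encode text with write_enc, then decode each resulting byte alone with read_enc.

    Returns (hex byte, character, Unicode name) tuples; C1 control characters
    have no Unicode name, so they are labelled '<control>'.
    """
    rows = []
    for b in text.encode(write_enc):
        ch = bytes([b]).decode(read_enc, errors="replace")
        rows.append((f"0x{b:02X}", ch, unicodedata.name(ch, "<control>")))
    return rows


rows = reinterpret("ή")
# [('0xCE', 'Ξ', 'GREEK CAPITAL LETTER XI'),
#  ('0xAE', '\ufffd', 'REPLACEMENT CHARACTER')]
```

Reading '999-Eυχή-του-Ιησού' byte by byte this way reproduces the mojibake in Steven's example; the C1 control characters are invisible when printed.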
> |
> | a) WHAT does it mean when a linux system is set to use utf-8?
>
> It means the locale settings _for the current process_ are set for
> UTF-8. The "locale" command will show you the current state.
That means that when a Linux application needs to save a filename to
the Linux filesystem, the app checks the filesystem's 'locale' so that it
encodes the filename using the utf-8 charset?
And likewise, when a Linux application wants to decode a filename, does it
also check the filesystem's 'locale' setting to know which charset to
use to decode the filename correctly back to the original string?

So the locale is used by the filesystem itself and by Linux apps to know how
to read (decode) and write (encode) filenames from/to the system's hdd?
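Cameron's point that the locale belongs to the current process, not to the filesystem, can be checked from Python. The Linux kernel treats a filename as a sequence of bytes (anything except '/' and NUL); it is each program that chooses how to turn text into those bytes. This sketch only queries settings, and the helper name process_encodings is hypothetical.

```python
import locale
import sys


def process_encodings() -> dict:
    """Report the encodings *this process* uses; nothing here is read from the disk."""
    return {
        # Current LC_CTYPE locale; with no second argument setlocale only queries it.
        "LC_CTYPE": locale.setlocale(locale.LC_CTYPE),
        # Default text encoding this process derives from its locale.
        "preferred": locale.getpreferredencoding(False),
        # Encoding Python uses to convert str filenames into the bytes the kernel stores.
        "filesystem": sys.getfilesystemencoding(),
    }


info = process_encodings()
```

Running the same script under two different settings (for example with LC_ALL=C.UTF-8 and then with a Greek ISO-8859-7 locale, if one is installed) may report different values for the very same files on the same disk.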
>
>
> | c) WHAT happens when the two of them try to work together?
>
> If everything matches, it is all good. If the locales do not match,
> the mismatch will result in an undesired bytes<->characters
> encode/decode step somewhere, and something will display incorrectly
> or be entered as input incorrectly.

I can't quite grasp the idea:

local end: Win8, locale = greek-iso
remote end: CentOS 6.4, locale = utf-8

FileZilla by default uses some charset (I don't know which) to upload filenames
PuTTY by default uses greek-iso to display filenames


WHAT can one expect to happen when all of the above work together?
A mess, of course, but I want to hear, in detail, each step of the mess
as it emerges.
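A minimal sketch of that chain, step by step, under stated assumptions: the filename is Greek; the upload client sends it in some encoding (the unknown, so both UTF-8 and ISO-8859-7 are tried); CentOS stores the bytes unchanged; a program on the server running in the UTF-8 locale reads the name as UTF-8; and PuTTY shows the same bytes as ISO-8859-7. The function name walk_through is hypothetical, and any escaping `ls` may apply to unprintable names is ignored.

```python
def walk_through(filename: str, upload_encoding: str,
                 server_locale: str = "utf-8",
                 terminal_encoding: str = "iso-8859-7") -> dict:
    """Follow one filename from a Windows client to a PuTTY screen (simplified)."""
    # Step 1: the upload client turns the Greek text into bytes.
    stored = filename.encode(upload_encoding)
    # Step 2: the remote Linux filesystem keeps those bytes unchanged.
    # Step 3: a program on CentOS running in a UTF-8 locale decodes them as UTF-8.
    seen_by_server = stored.decode(server_locale, errors="replace")
    # Step 4: the same bytes reach PuTTY, which decodes them with its own setting.
    seen_in_putty = stored.decode(terminal_encoding, errors="replace")
    return {"stored": stored, "server": seen_by_server, "putty": seen_in_putty}


utf8_upload = walk_through("Ευχή.txt", "utf-8")
# server: 'Ευχή.txt' (correct)   putty: mojibake, as in Steven's example
greek_upload = walk_through("Ευχή.txt", "iso-8859-7")
# server: U+FFFD replacement characters + '.txt'   putty: 'Ευχή.txt' (looks correct)
```

Either way one side is wrong: as Cameron says, only when every step agrees on the same encoding does the name survive the whole chain.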

-- 
Webhost <http://superhost.gr>&& Weblog <http://psariastonafro.wordpress.com>


More information about the Python-list mailing list