Changing filenames from Greeklish => Greek (subprocess complain)

Benjamin Kaplan benjamin.kaplan at case.edu
Sun Jun 9 16:05:49 EDT 2013


On Sun, Jun 9, 2013 at 2:38 AM, Νικόλαος Κούρας <nikos.gr33k at gmail.com> wrote:
> On Sunday, 9 June 2013 at 12:20:58 PM UTC+3, user Lele Gaifax wrote:
>
>> > How about a string i wonder?
>> > s = "νίκος"
>> > what_are these_bytes = s.encode('iso-8869-7').encode(utf-8')
>
>> Ignoring the usual syntax error, this is just a variant of the code I
>> posted: "s.encode('iso-8869-7')" produces a bytes instance which
>> *cannot* be "re-encoded" again in whatever encoding.
>
> s = 'a'
> s = s.encode('iso-8859-7').decode('utf-8')
> print( s )
>
> a (we got the original character back)
> ================================
> s = 'α'
> s = s.encode('iso-8859-7').decode('utf-8')
> print( s )
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>
> Why this error? Because the ordinal value of 'α' is > 127?
> --

No. You get that error because the bytes are not encoded in UTF-8;
they are encoded in ISO-8859-7. For ASCII characters (ord(x) < 128),
ISO-8859-7 and UTF-8 produce exactly the same bytes. For anything
else, they differ. If you were to decode those bytes as ISO-8859-1, it
would succeed, but you would get the character "á" back instead of α.
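
To make the difference concrete, here is a minimal sketch, assuming
Python 3. The variable names are only illustrative. It shows the bytes
each codec produces for "α" and what happens when the ISO-8859-7 byte
is decoded with a different codec.

```python
s = "α"

# The same character becomes different bytes under each encoding.
greek = s.encode("iso-8859-7")   # one byte: 0xe1
utf8 = s.encode("utf-8")         # two bytes: 0xce 0xb1

# For a pure ASCII character the two encodings agree byte for byte.
ascii_same = "a".encode("iso-8859-7") == "a".encode("utf-8")

# Decoding the ISO-8859-7 byte as UTF-8 fails, because 0xe1 on its own
# is only the start of a multi-byte UTF-8 sequence.
error = None
try:
    greek.decode("utf-8")
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding it as ISO-8859-1 "works", but yields the wrong character.
as_latin1 = greek.decode("iso-8859-1")

print(greek, utf8, ascii_same, as_latin1)
print(error)
```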

You're misunderstanding the decode function. Decode doesn't turn the
data into a string with the specified encoding. It takes bytes that are
*in* the specified encoding and turns them into Python's internal
string representation. In Python 3.3, that internal representation
doesn't even have a name because it's not a standard encoding. So you
want the decode argument to match the encode argument.
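
As a rough sketch of what matching the arguments looks like, again
assuming Python 3 and with illustrative variable names: encode gives
you bytes, decode with the same codec gives you the original str back,
and converting between two encodings means decoding with the source
codec first and then encoding with the target one.

```python
s = "νίκος"

# str -> bytes, using ISO-8859-7 (one byte per Greek letter).
data = s.encode("iso-8859-7")

# bytes -> str, using the SAME codec that produced the bytes.
round_trip = data.decode("iso-8859-7")

# To convert ISO-8859-7 bytes into UTF-8 bytes, go through str:
# decode with the source codec, then encode with the target codec.
utf8_data = data.decode("iso-8859-7").encode("utf-8")

print(round_trip == s)
print(len(data), len(utf8_data))
```

Note that in Python 3 a bytes object has no encode method at all,
which is why the bytes cannot be "re-encoded" directly.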


