Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Sun Jun 9 05:08:48 EDT 2013


On Sunday, 9 June 2013 at 11:55:43 AM UTC+3, Lele Gaifax wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
>
> > On Sat, 08 Jun 2013 22:09:57 -0700, nagia.retsina wrote:
> >
> >> chr('A') would give me the mapping of this char, the number 65 while
> >> ord(65) would output the char 'A' likewise.
> >
> > Correct. Python uses Unicode, where code-point 65 ("ordinal value 65")
> > means letter "A".
>
> Actually, that's the other way around:
>
>     >>> chr(65)
>     'A'
>     >>> ord('A')
>     65
>
> >> What would happen if we try to re-encode bytes on the disk? like
> >> trying:
> >>
> >> s = "νίκος"
> >> utf8_bytes = s.encode('utf-8')
> >> greek_bytes = utf8_bytes.encode('iso-8859-7')
> >>
> >> Can we re-encode twice or as many times as we want and then decode back
> >> respectively, like?
> >
> > Of course. Bytes have no memory of where they came from, or what they are
> > used for. All you are doing is flipping bits on a memory chip, or on a
> > hard drive. So long as *you* remember which encoding is the right one,
> > there is no problem. If you forget, and start using the wrong one, you
> > will get garbage characters, mojibake, or errors.
>
> Uhm, no: "encode" transforms a Unicode string into an array of bytes,
> "decode" does the opposite transformation. You cannot do the former on
> an "arbitrary" array of bytes:
>
>     >>> s = "νίκος"
>     >>> utf8_bytes = s.encode('utf-8')
>     >>> greek_bytes = utf8_bytes.encode('iso-8859-7')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     AttributeError: 'bytes' object has no attribute 'encode'

So something encoded into bytes cannot be re-encoded to some other bytes.
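
For what it is worth, here is a minimal sketch of what does seem to work, assuming Python 3 and assuming the goal is to turn UTF-8 bytes into ISO-8859-7 bytes: decode back to a string first, then encode again with the other codec. The variable name `text` is only illustrative.

```python
s = "νίκος"
utf8_bytes = s.encode("utf-8")            # str -> bytes (UTF-8)

# bytes have no .encode(); go back to str first, then pick the other codec
text = utf8_bytes.decode("utf-8")         # bytes -> str
greek_bytes = text.encode("iso-8859-7")   # str -> bytes (ISO-8859-7)

print(len(utf8_bytes), len(greek_bytes))      # 10 5
print(greek_bytes.decode("iso-8859-7") == s)  # True
```

Both byte strings stand for the same five letters. They only decode back correctly if the codec that produced them is the one used to read them.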

How about a string, I wonder?
s = "νίκος"
what_are_these_bytes = s.encode('iso-8859-7').encode('utf-8')
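
A small sketch of what I would expect that chained call to do, assuming Python 3: the first `.encode()` already returns `bytes`, so the second `.encode()` should fail in the same way as in the quoted session. The names `first` and `error` are made up for the example.

```python
s = "νίκος"
first = s.encode("iso-8859-7")   # str -> bytes
print(type(first).__name__)      # bytes

try:
    first.encode("utf-8")        # bytes has no .encode() method in Python 3
except AttributeError as exc:
    error = str(exc)
    print(error)                 # 'bytes' object has no attribute 'encode'
```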


