Changing filenames from Greeklish => Greek (subprocess complain)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Jun 9 02:45:50 EDT 2013


On Sat, 08 Jun 2013 22:09:57 -0700, nagia.retsina wrote:

> chr('A') would give me the mapping of this char, the number 65 while
> ord(65) would output the char 'A' likewise.

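Almost: you have the two names swapped. ord('A') returns the number 65,
and chr(65) returns the character 'A'. Python uses Unicode, where
code-point 65 ("ordinal value 65") means the letter "A".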
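
A quick way to check this in the interactive interpreter, as a minimal
sketch assuming Python 3 (the variable names are only for illustration):

```python
# ord() takes a one-character string and returns its Unicode code point.
code_point = ord("A")

# chr() goes the other way: code point in, one-character string out.
letter = chr(65)

print(code_point)  # 65
print(letter)      # A

# The same pair works for non-ASCII characters such as Greek nu.
greek_cp = ord("ν")
print(greek_cp, chr(greek_cp))
```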

There are older encodings. For example, a very old one, used on IBM 
mainframes, is EBCDIC, where ordinal value 65 means the letter "â", and 
the letter "A" has ordinal value 193.
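
You can see the difference from Python. This is only a sketch: it
assumes the "cp037" codec from the standard library as a stand-in for
EBCDIC, which is one of several EBCDIC code pages and not necessarily
the one on any given mainframe.

```python
# Encode the same letter with two different codecs and compare the
# resulting byte values.
ebcdic_byte = "A".encode("cp037")[0]   # cp037 is one EBCDIC variant
ascii_byte = "A".encode("ascii")[0]

print(ebcdic_byte)  # 193
print(ascii_byte)   # 65
```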

 
> What would happen if we try to re-encode bytes on the disk? Like
> trying:
> 
> s = "νίκος"
> utf8_bytes = s.encode('utf-8')
> greek_bytes = utf8_bytes.encode('iso-8859-7')
> 
> Can we re-encode twice, or as many times as we want, and then decode
> back respectively, like this?

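Of course, with one correction: in Python 3, bytes objects have no
encode() method, so you cannot encode bytes directly. Decode the bytes
back to a string first, then encode that string with the other codec.
You can repeat that as often as you like.

Bytes have no memory of where they came from, or what they are used
for. All you are doing is flipping bits on a memory chip, or on a hard
drive. So long as *you* remember which encoding is the right one, there
is no problem. If you forget, and start using the wrong one, you will
get garbage characters (mojibake) or errors.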
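
Here is a sketch of both outcomes, assuming Python 3. The names
mojibake and decode_failed are made up for the example, and
errors="replace" is used only so the wrong-codec decode cannot itself
raise.

```python
s = "νίκος"

# str -> bytes. To switch codecs, decode back to str and encode again.
utf8_bytes = s.encode("utf-8")
greek_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-7")

# Decoding with the codec that produced the bytes recovers the text.
round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_greek = greek_bytes.decode("iso-8859-7")
print(round_trip_utf8 == s, round_trip_greek == s)

# Wrong codec, first outcome: the bytes decode, but to garbage.
mojibake = utf8_bytes.decode("iso-8859-7", errors="replace")
print(repr(mojibake))

# Wrong codec, second outcome: the bytes are not valid in that codec.
try:
    greek_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)
```

Note that the two byte strings differ in length: UTF-8 needs two bytes
for each of these Greek letters, ISO-8859-7 needs one.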

[...]
> And also, is there a difference between "encoding" and "compressing"?

Of course. They are totally unrelated.

> Isn't the latter using some form of encoding to take a string or bytes
> and make it take up less space on disk?

Correct, except forget about "encoding". It's not relevant (except, 
maybe, in a mathematical sense) and will just confuse you.
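
To see that the two are separate steps, here is a small sketch using
the standard zlib module (my choice for the example; any compressor
would do). Encoding turns text into bytes; compression turns bytes
into fewer bytes and knows nothing about the text.

```python
import zlib

text = "νίκος " * 100

# Step 1, encoding: str -> bytes.
raw = text.encode("utf-8")

# Step 2, compression: bytes -> (usually fewer) bytes.
packed = zlib.compress(raw)
print(len(raw), len(packed))

# Undo in reverse order: decompress the bytes, then decode them.
unpacked = zlib.decompress(packed)
restored = unpacked.decode("utf-8")
print(restored == text)
```

Repetitive input like this compresses very well; short or random data
may not shrink at all.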


-- 
Steven


