Changing filenames from Greeklish => Greek (subprocess complain)

Νικόλαος Κούρας nikos.gr33k at gmail.com
Sat Jun 8 17:14:01 EDT 2013


On Saturday, 8 June 2013 10:01:57 PM UTC+3, Steven D'Aprano wrote:

> ASCII actually needs 7 bits to store a character. Since computers are  
> optimized to work with bytes, not bits, normally ASCII characters are
> stored in a single byte, with one bit wasted.

So ASCII and Unicode are two encoding systems currently in use.
How should I imagine them, visualize them?
Like tables: 'A' = 65, 'B' = 66 and so on?

But if I do, then that would be the visualization of a 'charset', not of an encoding system.
What is the difference between an encoding system and a charset?

EBCDIC - ASCII - Unicode = all of them are encoding systems.

Greek ISO - Latin ISO - UTF-8 - UTF-16 = all of them are charsets.

What are these differences? I can't imagine them all; I can only imagine charsets, not encoding systems.
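
One small experiment that may help separate the two ideas, assuming Python 3: a character has a single number (its code point) in the Unicode table, while each encoding turns that same character into its own byte sequence. The variable names below are only illustrative.

```python
# A sketch for Python 3: one character, one Unicode number, several byte forms.
ch = "\u03b1"                 # Greek small letter alpha
code_point = ord(ch)          # position in the Unicode table
print(code_point)             # 945

# Each encoding maps the same character to its own byte sequence.
utf8_bytes = ch.encode("utf-8")
greek_bytes = ch.encode("iso-8859-7")
utf16_bytes = ch.encode("utf-16-le")
print(utf8_bytes)             # b'\xce\xb1'
print(greek_bytes)            # b'\xe1'
print(utf16_bytes)            # b'\xb1\x03'
```

The number 945 stays the same in every case; only the stored bytes differ from one encoding to the next.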

Why does Python interpret all given strings as Unicode by default and not as ASCII? Is it because the former supports many positions while ASCII has only 127 positions, and hence can interpret only 127 different characters?


> "Narrow" Unicode uses two bytes per character. Since two bytes is only 
> enough for about 65,000 characters, not 1,000,000+, the rest of the 
> characters are stored as pairs of two-byte "surrogates".

Does "surrogate" literally mean a replacement?
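
For reference, here is a rough sketch of how a surrogate pair is computed in UTF-16, assuming Python 3.5 or later. The code point 0x1F600 is just an arbitrary example above 0xFFFF, and the variable names are illustrative.

```python
# Split a code point above 0xFFFF into two 16-bit surrogates (UTF-16 rule).
cp = 0x1F600                          # example code point beyond two bytes
offset = cp - 0x10000                 # 20-bit value to split in two halves
high = 0xD800 + (offset >> 10)        # first (high) surrogate: top 10 bits
low = 0xDC00 + (offset & 0x3FF)       # second (low) surrogate: bottom 10 bits
print(hex(high), hex(low))            # 0xd83d 0xde00

# Python's own UTF-16 encoder produces the same four bytes.
encoded = chr(cp).encode("utf-16-be")
print(encoded.hex())                  # d83dde00
```

So the two surrogates together stand in for one character that does not fit in a single two-byte unit.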


> Latin-1 is similar, except there are 256 positions. Greek ISO-8859-7 is 
> also similar, also 256 positions, but the characters are different. And 
> so on, with dozens of charsets. 

Latin has to display English characters (capital, small) + numbers + symbols. That would be 127, so why 256?

Greek = all of the above plus Greek characters, no?
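
A quick check of what the two 256-position tables do with the same byte, as a sketch assuming Python 3. The byte values 0xE1 and 0x41 are chosen only as examples.

```python
# The same byte above 127 means different characters in different charsets.
raw = bytes([0xE1])
as_latin1 = raw.decode("latin-1")       # Latin small letter a with acute
as_greek = raw.decode("iso-8859-7")     # Greek small letter alpha
print(as_latin1, as_greek)

# Bytes below 128 are the ASCII range and decode identically in both.
ascii_byte = bytes([0x41])
same_in_both = ascii_byte.decode("latin-1") == ascii_byte.decode("iso-8859-7")
print(same_in_both)                     # True
```

The lower half of each table is shared with ASCII, and the upper half is where Latin-1 and ISO-8859-7 differ.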

> And then there is Unicode, which includes *every* character in all of 
> those dozens of charsets. It has 1114111 positions (most are currently
> unfilled).

Shouldn't the number of positions that Unicode uses be equal to the sum of all available characters of all the languages of the world, plus numbers and special characters? Why 1,000,000+? Why the need for so many positions? The narrow Unicode format (2 bytes) can cover all of the world's symbols.
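
The arithmetic behind these numbers can be checked directly. This is a sketch assuming Python 3.3 or later, where `sys.maxunicode` reports the highest code point; the other names are illustrative.

```python
import sys

two_byte_values = 2 ** 16                      # values that fit in two bytes
planes = 17                                    # Unicode is split into 17 planes
total_code_points = planes * two_byte_values   # 1114112 positions in total
highest = total_code_points - 1                # numbering starts at 0

print(two_byte_values)                         # 65536
print(highest, hex(highest))                   # 1114111 0x10ffff
print(sys.maxunicode == highest)               # True on Python 3.3 or later
```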

> An encoding is simply a program that takes a character and returns a 
> byte, or vice versa. For instance, the ASCII encoding will take character 
> 'A'. That is found at position 65, which is 0x41 in hexadecimal, so the 
> ASCII encoding turns character 'A' into byte 0x41, and vice versa.

Why do you say ASCII turns a character into hex format and not into binary format?
Isn't the latter the way bytes are stored on the HDD, like 010101111010101 etc.?
Are they stored as hex instead, or did you just say so to avoid printing 0s and 1s?
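
To compare the two notations side by side, here is a small sketch assuming Python 3. It prints the single byte for 'A' in decimal, hexadecimal and binary; the variable names are illustrative.

```python
# One stored byte, written out in three different notations.
byte_value = "A".encode("ascii")[0]     # indexing bytes gives an int in Python 3
as_hex = format(byte_value, "02x")      # two hex digits
as_binary = format(byte_value, "08b")   # eight binary digits
print(byte_value, as_hex, as_binary)    # 65 41 01000001

# Both strings parse back to the same number.
print(int(as_hex, 16) == int(as_binary, 2) == byte_value)   # True
```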



