Changing filenames from Greeklish => Greek (subprocess complain)

Larry Hudson orgnut at yahoo.com
Mon Jun 10 03:51:34 EDT 2013


On 06/09/2013 03:37 AM, Νικόλαος Κούρας wrote:

>
> I mean utf-8 could use 1 byte for storing the 1st 256 characters. I meant up to 256, not above 256.
>
NO!!

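0 - 127, yes.
128 - 255 -> two bytes. In UTF-8, any byte with a value of 128 - 255 is only one byte of a
multibyte code, never a complete character by itself.

That's why the decode fails: it sees such a lone byte as incomplete (or invalid) data, so it
can't do anything with it.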
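The sketch below is a minimal Python 3 illustration of the point above. It uses 'é' (U+00E9, decimal 233) as an example character. The thread's actual filenames are not reproduced here. A code point in the 128 - 255 range takes two bytes in UTF-8. A single byte in that range fails to decode as UTF-8, but decodes fine under a one-byte encoding such as Latin-1.

```python
# 'é' is U+00E9 (decimal 233), inside the 128 - 255 range.
ch = "\u00e9"
encoded = ch.encode("utf-8")
print(encoded)        # b'\xc3\xa9'  -> two bytes, not one
print(len(encoded))   # 2

# The lone byte 0xE9 starts a multibyte sequence that never finishes,
# so the UTF-8 decoder treats it as incomplete data.
try:
    b"\xe9".decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)   # decode failed: unexpected end of data

# In Latin-1 (ISO-8859-1), that same single byte IS the whole character.
print(b"\xe9".decode("latin-1"))  # é
```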

>
> A surrogate pair is like hitting for example Ctrl-A, which means is a combination character that consists of 2 different characters?
> Is this what a surrogate is? a pair of 2 chars?
>
You're confusing character encodings with the way NON-CHARACTER keys on the KEYBOARD are encoded
(function keys, arrow keys and such). These are NOT text characters but KEYBOARD key codes,
entirely different from and unrelated to any character encoding.
How programs interpret and use these codes depends entirely on the individual program. There
are common conventions for many of them, but there is no single standard.

Also, the control codes are the first 32 values (0 - 31) of the ASCII (and ASCII-compatible)
character set. Each one is a single character, not a multi-character key code like the ones the
keyboard's non-character keys send.
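The following is a small Python 3 sketch of the point above. Ctrl-A is one code point (value 1), and Unicode classifies all of the values 0 - 31 as control characters (category "Cc"). For contrast, it also shows what a surrogate pair really is. A surrogate pair has nothing to do with keyboards: UTF-16 uses it to store a code point above U+FFFF as two 16-bit units. The emoji U+1F600 is used only as an example.

```python
import unicodedata

# Ctrl-A is ASCII control code 1 (SOH): one code point, one character.
ctrl_a = "\x01"
print(len(ctrl_a), ord(ctrl_a))       # 1 1
print(unicodedata.category(ctrl_a))   # Cc  (control character)

# The 32 control codes are exactly the values 0 - 31.
print(all(unicodedata.category(chr(i)) == "Cc" for i in range(32)))  # True

# A surrogate pair is a UTF-16 storage detail, not a keystroke combination.
emoji = "\U0001F600"
utf16 = emoji.encode("utf-16-be")
print(utf16.hex())   # d83dde00  -> high surrogate D83D + low surrogate DE00
print(len(emoji))    # 1  (a Python str counts code points, not UTF-16 units)
```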

However, there are a few keyboard keys that actually produce control-codes.  A few examples:

Return/Enter -> Ctrl-M
Tab -> Ctrl-I
Backspace -> Ctrl-H
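The mapping in the list above follows a simple rule on ASCII terminals: Ctrl+<letter> gives the letter's code with its upper bits cleared (`ord(letter) & 0x1F`). So 'A' -> 1, 'H' -> 8, 'I' -> 9 and 'M' -> 13. The helper `ctrl` below is hypothetical and exists only for illustration. Real terminals vary, and many modern ones send DEL (127) for Backspace instead of Ctrl-H.

```python
def ctrl(letter: str) -> str:
    """Return the control character for Ctrl+<letter> (hypothetical helper)."""
    # Keep only the low 5 bits of the letter's ASCII code.
    return chr(ord(letter.upper()) & 0x1F)

for key, letter, expected in [("Return/Enter", "M", "\r"),
                              ("Tab", "I", "\t"),
                              ("Backspace", "H", "\b")]:
    code = ctrl(letter)
    # Show the key, its Ctrl-letter equivalent, the numeric code, and a sanity check.
    print(f"{key:12} -> Ctrl-{letter} = {ord(code):2d}", code == expected)
```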

>
> So character 'A' <-> 65 (in decimal uses in charset's table)  <-> 01011100 (as binary stored in disk) <-> 0xEF (as hex, when we open the file with a hex editor)
>
You are trying to put too much meaning into this. The value stored on disk, in memory, or
wherever is just binary bits, nothing else. Whether you describe the value in decimal, octal,
hex, base-12, or anything else is totally irrelevant. These are simply different ways of
describing or naming the same numeric value. (Your numbers don't match, by the way: 'A' is 65
decimal, which is 01000001 in binary and 0x41 in hex. 01011100 is 92, and 0xEF is 239.)

It's the same as saying 3 in English is three, in Spanish is tres, in German is drei...  (I 
don't know Greek, sorry.)  No matter what you call it, it is still the numeric integer value 
that is between 2 and 4.
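This short Python 3 sketch makes the same point. One integer, the code of 'A', is printed in several notations, and all the literal spellings compare equal. Encoding 'A' as UTF-8 produces the single byte 0x41, and that byte is what a hex editor would show.

```python
value = ord("A")             # the code of 'A' (identical in ASCII and UTF-8)
print(value)                 # 65
print(format(value, "08b"))  # 01000001
print(hex(value))            # 0x41
print(oct(value))            # 0o101

# Different spellings, one number.
print(65 == 0b01000001 == 0x41 == 0o101)   # True

# The byte actually written to disk for 'A' in UTF-8.
print("A".encode("utf-8"))   # b'A'  (the single byte 0x41)
```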
