hex dump w/ or w/out utf-8 chars

wxjmfauth at gmail.com wxjmfauth at gmail.com
Tue Jul 9 05:34:08 EDT 2013


On Tuesday, 9 July 2013 09:00:02 UTC+2, Steven D'Aprano wrote:
> On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote:
> 
> > Not using Python 3, for me (a programmer who was present at the
> > beginning of computer science, badly interacting with many languages
> > from assembler to Fortran and from C to Pascal and so on) it was a hard
> > job to arrange the abrupt transition from characters only equal to bytes
> 
> Characters have *never* been equal to bytes. Not even Perl treats the
> character 'A' as equal to the byte 0x0A:
> 
> if (0x0A eq 'A') {print "Equal\n";}
> else {print "Unequal\n";}
> 
> will print Unequal, even if you replace "eq" with "==". Nor does Perl
> consider the character 'A' equal to 65.
> 
> If you have learned to think of characters being equal to bytes, you have
> learned wrong.
> 
> > to some special characters defined with 2, 3 bytes and even more. I
> > should have preferred another solution... but I'm not Guido....!
> 
> What's a special character?
> 
> To an Italian, the characters J, K, W, X and Y are "special characters"
> which do not exist in the ordinary alphabet. To a German, they are not
> special, but S is special because you write SS as ß, but only in
> lowercase.
> 
> To a mathematician, σ is just as ordinary as it would be to a Greek; but
> the mathematician probably won't recognise ς unless she actually is
> Greek, even though they are the same letter.
> 
> To an American electrician, Ω is an ordinary character, but ω isn't.
> 
> To anyone working with angles, or temperatures, the degree symbol ° is an
> ordinary character, but the radian symbol is not. (I can't even find it.)
> 
> The English have forgotten that W used to be a ligature for VV, and
> consider it a single ordinary character. But the ligature Æ is considered
> an old-fashioned way of writing AE.
> 
> But to Danes and Norwegians, Æ is an ordinary letter, as distinct from AE
> as TH is from Þ. (Which English used to have.) And so on...
> 
> I don't know what a special character is, unless it is the ASCII NUL
> character, since that terminates C strings.


--------

The concept of "special characters" does not exist.
However, the definition of a "character" is a problem
per se (character, glyph, grapheme, ...).
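
A minimal Python 3 sketch of why "character" is slippery, and not a
definitive treatment: the same visible "é" can be one or two code points,
and neither count matches its UTF-8 byte count. The names describe,
composed and decomposed are only illustrative.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def describe(s):
    """Return (code point count, UTF-8 byte count, hex dump of the UTF-8 bytes)."""
    data = s.encode("utf-8")
    hex_dump = " ".join("{:02x}".format(b) for b in data)
    return len(s), len(data), hex_dump


print(describe(composed))    # (1, 2, 'c3 a9')
print(describe(decomposed))  # (2, 3, '65 cc 81')

# Both spellings render as one grapheme; NFC normalization maps the
# decomposed form onto the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Note that len() counts code points here, not graphemes and not bytes.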

You are confusing Unicode, typography and linguistics.

There is no symbol for the radian because, mathematically,
a radian is a pure number, a unitless number. You can,
however, specify a = ... in radians (rad).

Note the difference between SS and ẞ
'FRANZ-JOSEF-STRAUSS-STRAẞE'
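
As a rough illustration in Python 3 (a sketch using only built-in str
methods and unicodedata; the variable names are mine), the capital
sharp s ẞ (U+1E9E) is a distinct code point from the two-letter
sequence SS, although str.upper() on ß still yields SS:

```python
import unicodedata

small_sharp_s = "\u00df"    # ß
capital_sharp_s = "\u1e9e"  # ẞ

print(unicodedata.name(small_sharp_s))    # LATIN SMALL LETTER SHARP S
print(unicodedata.name(capital_sharp_s))  # LATIN CAPITAL LETTER SHARP S

# The default uppercase mapping of ß is the two letters SS, not ẞ.
print("straße".upper())                   # STRASSE

# ẞ lowercases back to ß, so the round trip is not symmetric.
print(capital_sharp_s.lower() == small_sharp_s)  # True

# For caseless comparison, casefold() maps all three spellings to "strasse".
spellings = ["STRASSE", "STRAẞE", "straße"]
print({s.casefold() for s in spellings})  # {'strasse'}
```

So a comparison with upper() or lower() alone can treat these spellings
as different, while casefold() treats them as equal.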

jmf





