To unicode or not to unicode

Thorsten Kampe thorsten at thorstenkampe.de
Sat Feb 21 20:25:54 EST 2009


* Ross Ridge (Sat, 21 Feb 2009 19:39:42 -0500)
> Thorsten Kampe  <thorsten at thorstenkampe.de> wrote:
> >That's right. As long as you use pure ASCII you can skip this nasty step 
> >of informing other people which charset you are using. If you do use 
> >non-ASCII then you have to do that. That's the way virtually all newsreaders 
> >work. It has nothing to do with some 21+ year old RFC. Even your Google 
> >Groups "newsreader" does that ('content="text/html; charset=UTF-8"').
> 
> No, the original post demonstrates you don't have to include MIME headers for
> ISO 8859-1 text to be properly displayed by many newsreaders.

*sigh* As you still refuse to read the article[1] I'm going to quote it 
now here:

'The Single Most Important Fact About Encodings

If you completely forget everything I just explained, please remember 
one extremely important fact. It does not make sense to have a string 
without knowing what encoding it uses.
[...]
If you have a string [...] in an email message, you have to know what 
encoding it is in or you cannot interpret it or display it to users 
correctly.

Almost every [...] "she can't read my emails when I use accents" problem 
comes down to one naive programmer who didn't understand the simple fact 
that if you don't tell me whether a particular string is encoded using 
UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western 
European), you simply cannot display it correctly [...]. There are over 
a hundred encodings and above code point 127, all bets are off.'

Enough said.
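
For what "telling the reader which encoding you use" looks like in 
practice, here is a minimal sketch using the email package from the 
Python 3 standard library. The subject line and body text are made up 
for illustration, and choosing quoted-printable as the transfer encoding 
is my assumption, not something any particular newsreader requires.

```python
from email.message import EmailMessage

# Illustrative body with non-ASCII characters (u-umlaut, sharp s).
body_text = "Gr\u00fc\u00dfe aus K\u00f6ln"

msg = EmailMessage()
msg["Subject"] = "To unicode or not to unicode"

# Declaring the charset here is what writes the MIME Content-Type
# header, so the receiving side does not have to guess.
msg.set_content(body_text, charset="iso-8859-1", cte="quoted-printable")

declared_charset = msg.get_content_charset()
recovered_text = msg.get_content()

print(declared_charset)
print(recovered_text)
```

Because the charset travels with the message, the receiver can decode 
the body without guessing.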

> The fact that your obscure newsreader didn't display it properly
> doesn't mean that original poster's newsreader is broken.

You don't even know if my "obscure newsreader" displayed it properly. 
Non-ASCII text without a declared encoding is just a bunch of bytes. 
It's not even text.
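
To make that concrete, here is a small Python 3 sketch. The byte string 
is a made-up example, not taken from the original post; it shows the 
same bytes turning into different text depending on which encoding the 
reader assumes.

```python
# Hypothetical undeclared bytes as they might arrive off the wire.
raw = b"caf\xe9 \x80"

# Each decoder maps the same bytes above 127 to different characters.
as_latin1 = raw.decode("latin-1")   # 0x80 becomes a C1 control character
as_cp1252 = raw.decode("cp1252")    # 0x80 becomes the euro sign

# The bytes are not valid UTF-8 at all; a strict decode raises.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1))
print(repr(as_cp1252))
print(utf8_ok)
```

Only the ASCII part ("caf" and the space) comes out the same in every 
case; everything above code point 127 depends on the guess.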

T.

[1] http://www.joelonsoftware.com/articles/Unicode.html


