To unicode or not to unicode

Thorsten Kampe thorsten at thorstenkampe.de
Sat Feb 21 18:35:35 EST 2009


* Ross Ridge (Sat, 21 Feb 2009 18:06:35 -0500)
> > The link demonstrates that Google Groups doesn't assume ASCII like
> > Python does.  Since popular newsreaders like Google Groups and Outlook
> > Express can display the message correctly without the MIME headers,
> > but your obscure one can't, there's a much stronger case to be made that
> > it's your newsreader that's broken.
> 
> Thorsten Kampe  <thorsten at thorstenkampe.de> wrote:
> >*sigh* I give up on you. You didn't even read the "Joel on Software" 
> >article. The whole "why" and "what for" of Unicode and MIME will always 
> >be a complete mystery to you.
> 
> I understand what Unicode and MIME are for and why they exist. Neither
> their merits nor your insults change the fact that the only current
> standard governing the content of Usenet posts doesn't require their
> use.

That's right. As long as you use pure ASCII you can skip this nasty step 
of informing other people which charset you are using. If you do use 
non-ASCII characters then you have to do that. That's the way virtually 
all newsreaders 
work. It has nothing to do with some 21+ year old RFC. Even your Google 
Groups "newsreader" does that ('content="text/html; charset=UTF-8"').
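
To make the "informing other people" step concrete, here is a rough 
sketch of how a reader could pick up a declared charset from a post's 
headers using the standard library's email package. The sample post and 
its address are made up for illustration, and this is not meant to show 
how any particular newsreader does it.

```python
from email import message_from_string

# Hypothetical post that declares its charset in a MIME header.
declared_post = (
    "From: poster@example.invalid\n"
    "Subject: To unicode or not to unicode\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "body text\n"
)

# Hypothetical post without any MIME headers at all.
undeclared_post = (
    "From: poster@example.invalid\n"
    "Subject: To unicode or not to unicode\n"
    "\n"
    "body text\n"
)

# get_content_charset() returns the charset parameter in lower case,
# or None when the message does not declare one.
declared = message_from_string(declared_post).get_content_charset()
undeclared = message_from_string(undeclared_post).get_content_charset()

print(declared)    # utf-8
print(undeclared)  # None
```

With the second post the reader gets None back, and anything it does 
with non-ASCII bytes from there on is its own guess.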

Being explicit about your encoding is 99% of the whole Unicode magic in 
Python and in any communication across the Internet (be it NNTP, 
SMTP or HTTP). Your Google Groups simply uses heuristics to guess the 
encoding the OP probably used. Windows newsreaders simply use the locale 
of the local host. That's guessing. You can call it assuming but it's 
still guessing. There is no way you can be sure without any declaration.
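
A small sketch of why a missing declaration leaves only guessing: the 
same bytes decode without any error under several charsets, and each 
one gives a different text. The sample word and the three charsets are 
my own picks for illustration.

```python
# "Gruesse" with u-umlaut and sharp s, written with escapes so the
# example does not depend on the encoding of this source text.
original = "Gr\u00fc\u00dfe"

# The bytes as they would travel over the wire, with no charset label.
raw = original.encode("utf-8")

# Three receivers, three assumptions. None of these calls fails.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")
as_cp1252 = raw.decode("cp1252")

print(as_utf8 == original)    # True: the right assumption
print(as_latin1 == original)  # False: decodes fine, but to mojibake
print(as_cp1252 == original)  # False: decodes fine, but to mojibake
```

Nothing in the bytes themselves tells the receiver which of the three 
results the sender meant.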

And guessing is unpythonic. Python "assumes" ASCII, and if the 
decoded/encoded text doesn't fit that encoding it refuses to guess.
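
A minimal sketch of that refusal. It is written for Python 3, where 
the codec has to be named explicitly; in the Python 2 of this thread 
the ASCII codec was the implicit default, but the strict behaviour is 
the same. The sample bytes are made up for illustration.

```python
# UTF-8 bytes containing non-ASCII characters.
raw = b"Gr\xc3\xbc\xc3\x9fe"

# Decoding: the ASCII codec raises instead of guessing.
try:
    raw.decode("ascii")
    decode_refused = False
except UnicodeDecodeError:
    decode_refused = True

# Encoding: same story in the other direction.
try:
    "Gr\u00fc\u00dfe".encode("ascii")
    encode_refused = False
except UnicodeEncodeError:
    encode_refused = True

# Being explicit about the encoding is all it takes.
text = raw.decode("utf-8")

print(decode_refused, encode_refused)  # True True
print(text == "Gr\u00fc\u00dfe")       # True
```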

T.


