How do I decode unicode characters in the subject using email.message_from_string()?

Thorsten Kampe thorsten at thorstenkampe.de
Wed Feb 25 13:19:35 EST 2009


* Tim Golden (Wed, 25 Feb 2009 17:27:07 +0000)
> Thorsten Kampe wrote:
> > * Gabriel Genellina (Wed, 25 Feb 2009 14:00:16 -0200)
> >> En Wed, 25 Feb 2009 13:40:31 -0200, Thorsten Kampe  
[...]
> >>> And I wonder why you would think the header contains Unicode characters
> >>> when it says "us-ascii" ("=?us-ascii?Q?"). I think there is a tendency
> >>> to label everything "Unicode" someone does not understand.
> >> And I wonder why you would think the header does *not* contain Unicode  
> >> characters when it says "us-ascii"?.
> > 
> > Basically because it didn't contain any Unicode characters (anything 
> > outside the ASCII range).
> 
> And I imagine that Gabriel's point was -- and my point certainly
> is -- that Unicode includes all the characters *inside* the
> ASCII range.

I know that this was Gabriel's point. And my point was that Gabriel's 
point was pointless. If you call any text (or character) "Unicode" then 
the word "Unicode" is generalized to an extent where it doesn't mean 
anything at all anymore and becomes a buzzword.

By the same reasoning you could call ASCII a Unicode encoding (which it 
isn't) because all ASCII characters are Unicode characters (code 
points). Only encodings that cover the full Unicode range can reasonably 
be called Unicode encodings.
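
To make the distinction concrete, here is a minimal sketch, assuming
Python 3 (where str holds Unicode code points). The helper name
can_encode is made up for this illustration. It shows that ASCII text
encodes to the same bytes under both codecs, while a character outside
the ASCII range can be encoded only by a codec that covers the full
Unicode range:

```python
def can_encode(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Pure ASCII text produces identical bytes in ASCII and UTF-8.
same_bytes = "Subject".encode("ascii") == "Subject".encode("utf-8")

# U+20AC (euro sign) lies outside the ASCII range.
euro_in_ascii = can_encode("\u20ac", "ascii")
euro_in_utf8 = can_encode("\u20ac", "utf-8")

print(same_bytes, euro_in_ascii, euro_in_utf8)
```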

The OP just saw some "weird characters" in the email subject and thought 
"I know. It looks weird. Must be Unicode". But it wasn't. It was good 
ole ASCII, only Quoted-Printable encoded.
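
For the original question, a sketch along these lines should do,
assuming Python 3 and the standard email package. The sample message
and the helper name decode_subject are invented for illustration; the
OP's actual header is not reproduced here. email.header.decode_header()
splits the header into (text, charset) pairs, and each bytes chunk is
then decoded with its declared charset:

```python
from email import message_from_string
from email.header import decode_header


def decode_subject(raw_message):
    """Return the Subject header with any encoded words decoded to str."""
    msg = message_from_string(raw_message)
    pieces = []
    for text, charset in decode_header(msg["Subject"]):
        # Encoded words come back as bytes plus the declared charset;
        # unencoded parts may come back as str with charset None.
        if isinstance(text, bytes):
            text = text.decode(charset or "ascii")
        pieces.append(text)
    return "".join(pieces)


# Hypothetical message: a us-ascii subject in "Q" encoding.
raw = "Subject: =?us-ascii?Q?Hello=2C_World=21?=\n\nbody\n"
subject = decode_subject(raw)
print(subject)
```

Every character of the decoded subject is still inside the ASCII
range; only the transfer encoding made it look odd.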


Thorsten


