How do I decode unicode characters in the subject using email.message_from_string()?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Feb 25 13:59:17 EST 2009


On Wed, 25 Feb 2009 16:19:35 -0200, Thorsten Kampe
<thorsten at thorstenkampe.de> wrote:
> * Tim Golden (Wed, 25 Feb 2009 17:27:07 +0000)
>> Thorsten Kampe wrote:
>> > * Gabriel Genellina (Wed, 25 Feb 2009 14:00:16 -0200)
>> >> On Wed, 25 Feb 2009 13:40:31 -0200, Thorsten Kampe
> [...]
>> >>> And I wonder why you would think the header contains Unicode characters
>> >>> when it says "us-ascii" ("=?us-ascii?Q?"). I think there is a tendency
>> >>> to label everything "Unicode" someone does not understand.
>> >> And I wonder why you would think the header does *not* contain Unicode
>> >> characters when it says "us-ascii"?.
>> >
>> > Basically because it didn't contain any Unicode characters (anything
>> > outside the ASCII range).
>>
>> And I imagine that Gabriel's point was -- and my point certainly
>> is -- that Unicode includes all the characters *inside* the
>> ASCII range.
>
> I know that this was Gabriel's point. And my point was that Gabriel's
> point was pointless. If you call any text (or character) "Unicode" then
> the word "Unicode" is generalized to an extent where it doesn't mean
> anything at all anymore and becomes a buzz word.

If it's text, it should use Unicode. Maybe not now, but in a few years, it  
will be totally unacceptable not to properly use Unicode to process  
textual data.

> With the same reason you could call ASCII an Unicode encoding (which it
> isn't) because all ASCII characters are Unicode characters (code
> points). Only encodings that cover the full Unicode range can reasonably
> be called Unicode encodings.

Not at all. ASCII is as valid a character encoding ("coded character set",
as the Unicode guys like to say) as ISO 10646 (which covers the whole
range).
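
A quick way to see the overlap, as a minimal sketch assuming Python 3 (where
str is Unicode text and bytes is raw data; the sample string is only an
illustration):

```python
# Illustrative sample text; any pure-ASCII string behaves the same way.
text = "Hello"

# Encoding ASCII-only text gives identical bytes under ASCII and UTF-8.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# Each ASCII byte value is also the Unicode code point of its character.
code_points = [ord(ch) for ch in ascii_bytes.decode("ascii")]
byte_values = list(ascii_bytes)

print(ascii_bytes == utf8_bytes)   # same byte sequence
print(code_points == byte_values)  # same numbers, byte for byte
```

Nothing here depends on the email package; it only shows that the ASCII
range is the first 128 code points of Unicode.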

> The OP just saw some "weird characters" in the email subject and thought
> "I know. It looks weird. Must be Unicode". But it wasn't. It was good
> ole ASCII - only Quoted Printable encoded.

Good f*cked ASCII is Unicode too.
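
For the original question, something along these lines should do. This is
only a sketch, assuming Python 3 and a made-up message; decode_subject is a
hypothetical helper name, not part of the email package:

```python
import email
from email.header import decode_header

# Made-up message whose subject is an RFC 2047 encoded word labelled
# "us-ascii" and using the Q (quoted-printable-like) encoding.
raw = "Subject: =?us-ascii?Q?Hello=2C_world=21?=\n\nbody\n"
msg = email.message_from_string(raw)


def decode_subject(value):
    """Turn a raw header value into a single text string."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus the declared charset;
            # fall back to ASCII when no charset was given.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # Headers without any encoded word come back as plain str.
            parts.append(data)
    return "".join(parts)


subject = decode_subject(msg["Subject"])
print(subject)
```

Whatever the declared charset is, "us-ascii" or anything else, the result is
the same kind of object: Unicode text.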

-- 
Gabriel Genellina



