Decode email subjects into unicode

Laszlo Nagy gandalf at shopzeus.com
Wed Mar 19 06:24:45 EDT 2008


Gertjan Klein wrote:
> Laszlo Nagy wrote:
>
>> However, there are malformed emails and I have to put them into the 
>> database. What should I do with this:
>>     
> [...]
>   
>> There is no encoding given in the subject but it contains 0x92. When I 
>> try to insert this into the database, I get:
>>     
>
> This is indeed a malformed email. The content type in the header specifies
> iso-8859-1, but this looks like Windows code page 1252, where byte
> \x92 is the right single quotation mark (Unicode U+2019).
>
> As the majority of the mail clients out there are Windows-based, and as
> far as I can tell many of them get the encoding wrong, I'd simply try to
> decode as CP1252 on error, especially if the content-type claims
> iso-8859-1. Many Windows mail clients treat iso-8859-1 as equivalent to
> 1252 (it's not; the former has only control codes in the range \x80 to
> \x9f, while the latter assigns printable characters there.)
>
Thank you very much!
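
For the archive, a minimal sketch of the fallback suggested above, written
for Python 3. The function name decode_header_bytes and the alias set are
my own, hypothetical choices, not anything from the email package. It
assumes that a header claiming iso-8859-1 (or no charset at all) was
really written as CP1252. Note that decoding as iso-8859-1 never raises,
so the sketch tries CP1252 first rather than waiting for an error:

```python
# Charsets that Windows clients commonly claim while actually sending CP1252.
# (Assumption for this sketch; extend as needed.)
LATIN1_ALIASES = {"iso-8859-1", "iso8859-1", "latin-1", "latin1",
                  "us-ascii", "ascii"}


def decode_header_bytes(raw, declared=None):
    """Decode raw header bytes to str, preferring CP1252 over Latin-1."""
    charset = (declared or "ascii").lower()
    if charset in LATIN1_ALIASES:
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes (0x81, 0x8d, 0x8f, 0x90, 0x9d) are undefined in
            # Python's cp1252 codec; latin-1 maps every byte, so it cannot fail.
            return raw.decode("latin-1")
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown declared charset: last resort, never raises.
        return raw.decode("cp1252", errors="replace")


print(repr(decode_header_bytes(b"It\x92s here", "iso-8859-1")))
```

With this, the 0x92 byte comes out as U+2019 instead of an invisible
control character, and the resulting str can be encoded to whatever the
database expects (UTF-8, say).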



