[Email-SIG] Patch: Improve recognition of attachment file name, with encodings

Nando nando at acapela.com.br
Sun Feb 17 03:24:00 CET 2008


OK, I get question number 2 now. My question was:

2) Is there some flaw in decode_header()? Something that Thunderbird 
displays as "Eduardo & Mônica" is being decoded with the wrong character 
in place of the ô:
repr(decode_header(m["subject"])[0][0])
'Eduardo & M\xf4nica'
The header being tested is:
Subject: =?iso-8859-1?Q?Eduardo_&_M=F4nica?=
In case we are again doing the Right Thing, then why does Thunderbird 
display it the way it was intended?


The answer is that I have to call codecs.decode() on the result:

import codecs
from email.header import decode_header

In [20]: [(s, encoding)] = decode_header("=?iso-8859-1?Q?P=F4nei?=")

In [21]: s
Out[21]: 'P\xf4nei'

In [22]: encoding
Out[22]: 'iso-8859-1'

In [23]: print codecs.decode(s, encoding)
Pônei

Well, that just makes it even harder to use the return value of the 
decode_header() function. And instead of encapsulating all that 
complexity in the email library, you are forcing every user of the 
library to find all this out by himself, just as I had to.

This is very un-MartinFowler-like, if you'll pardon the expression :p

I understand Stephen Turnbull's point that it is useful to map the 
Message class to RFC 2822, because some users need that. However, that 
is not what *I* need - I want a high-level email library, and I am sure 
many others do too.

Other mail libraries have faced the challenges of encodings before. I 
don't really see why we in Python should hide from that can o' worms (as 
Hans-Peter Jansen put it). It is a dirty job, but someone's gotta do it!

"How would you handle a mixture of say: big5, euc_jp, koi8_r _and_ utf-8 
encodings?"

Well, I don't know what the flabbergast you are talking about, but:

Are you scared?

Why should the application developer have to deal with something that 
you e-mail experts are much more qualified to implement?

What is it, are you afraid of having a module accused of being "buggy"? 
(If so, you know very well that this is not the free software way.)

What about code reuse? Did you see how much I had to do just in order to 
print a Subject header?
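
For the record, here is roughly what all of that amounts to once it is 
wrapped in a reusable function. It is only a sketch: it assumes Python 3, 
where decode_header() returns bytes for encoded words, and the name 
header_to_text and its fallback parameter are made up for illustration, 
not part of the email package.

```python
from email.header import decode_header


def header_to_text(raw, fallback="ascii"):
    """Hypothetical helper: turn a raw header value into one text string."""
    parts = []
    for value, charset in decode_header(raw):
        # Encoded words come back as bytes plus the charset they declared;
        # a header with no encoded words comes back as a plain str.
        if isinstance(value, bytes):
            value = value.decode(charset or fallback, errors="replace")
        parts.append(value)
    return "".join(parts)


subject = header_to_text("=?iso-8859-1?Q?Eduardo_&_M=F4nica?=")
print(subject)
```

Each encoded word is decoded with the charset it declares itself, so the 
caller never has to look at the encoding at all.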

I do think that a Message subclass (HighLevelMessage?) could play this 
role nicely - a high-level interface. Has anyone done this before? (It 
is a very obvious idea.) Is anybody else interested at all? Most of the 
vibes I get here are like "don't do this, don't do that"...
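
To make the idea concrete, below is a rough sketch of such a subclass. It 
assumes Python 3 and the default (compat32) parsing behaviour; the 
get_text() method is a hypothetical name of mine, and only decode_header(), 
make_header() and the _class argument of message_from_string() come from 
the email package itself.

```python
import email
from email.header import decode_header, make_header
from email.message import Message


class HighLevelMessage(Message):
    """Hypothetical subclass: a regular Message plus decoded header access."""

    def get_text(self, name, default=None):
        raw = self.get(name)
        if raw is None:
            return default
        # decode_header() splits the value into RFC 2047 encoded words;
        # make_header() reassembles them, and str() yields decoded text.
        return str(make_header(decode_header(raw)))


raw_mail = "Subject: =?iso-8859-1?Q?Eduardo_&_M=F4nica?=\n\nbody\n"
msg = email.message_from_string(raw_mail, _class=HighLevelMessage)
subject = msg.get_text("subject")
print(subject)
```

The low-level interface stays untouched: msg["subject"] still returns the 
raw header for those who need the RFC 2822 view.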

Thanks to Mark Shapiro for showing me a way to do what I want.

Nando Florestan
===============
[skype]    nandoflorestan
[phone]  + 55 (11) 3675-3038
[mobile] + 55 (11) 9820-5451
[internet] http://oui.com.br/
[À Capela] http://acapela.com.br/
[location] São Paulo - SP - Brasil


