[Email-SIG] Patch: Improve recognition of attachment file name, with encodings
Nando
nando at acapela.com.br
Sun Feb 17 03:24:00 CET 2008
OK, I understand the answer to question number 2 now. My question was:
2) Is there some flaw in decode_header()? Something that Thunderbird
displays as "Eduardo & Mônica" is being decoded with the wrong character
in place of the ô:
repr(decode_header(m["subject"])[0][0])
'Eduardo & M\xf4nica'
The header being tested is:
Subject: =?iso-8859-1?Q?Eduardo_&_M=F4nica?=
In case we are again doing the Right Thing, then why does Thunderbird
display it the way it was intended?
The answer is I have to use codecs.decode():
from email.header import decode_header
import codecs
In [20]: [(s, encoding)] = decode_header("=?iso-8859-1?Q?P=F4nei?=")
In [21]: s
Out[21]: 'P\xf4nei'
In [22]: encoding
Out[22]: 'iso-8859-1'
In [23]: print codecs.decode(s, encoding)
Pônei
Well, that just makes it even harder to use the return value of the
decode_header() function. And instead of encapsulating all that
complexity in the email library, you are forcing every user of the
library to find all this out by himself, just as I had to.
This is very un-MartinFowler-like, if you'll pardon the expression :p
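To make the complaint concrete, here is a minimal sketch of the kind of helper I would expect the library to offer. It assumes Python 3, where decode_header() returns bytes for encoded words and may return str for a header with no encoded words at all. The names header_to_text and fallback_charset are made up for illustration; they are not part of the email package.

```python
from email.header import decode_header


def header_to_text(raw_header, fallback_charset="ascii"):
    """Return a header value as one text string (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw_header):
        # Encoded words come back as bytes plus the charset they declared.
        # Chunks without a declared charset fall back to fallback_charset.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or fallback_charset, errors="replace")
        parts.append(chunk)
    return "".join(parts)


subject = header_to_text("=?iso-8859-1?Q?Eduardo_&_M=F4nica?=")
print(subject)
```

With something like this in one place, the caller never touches the (bytes, charset) pairs directly.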
I understand Stephen Turnbull's point that it is useful to map the
Message class to RFC 2822, because some users need that. However, that
is not what *I* need - I want a high-level email library, and I am sure
many others do too.
Other mail libraries have faced the challenges of encodings before. I
don't really see why we in Python should hide from that can o' worms (as
Hans-Peter Jansen put it). It is a dirty job, but someone's gotta do it!
"How would you handle a mixture of say: big5, euc_jp, koi8_r _and_ utf-8
encodings?"
Well, I don't know what the flabbergast you are talking about, but:
Are you scared?
Why should the application developer have to deal with something that
you e-mail experts are much more qualified to implement?
What is it, are you afraid of having a module accused of being "buggy"?
(If so, you know very well that this is not the free software way.)
What about code reuse? Did you see how much I had to do just in order to
print a Subject header?
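As for the mixture of encodings quoted above: each encoded word in a header names its own charset, so a mixture can be decoded chunk by chunk. The following is only a sketch, assuming Python 3; the sample header is invented for illustration and mixes koi8-r, iso-8859-1 and utf-8. It relies on decode_header() and make_header() from email.header.

```python
from email.header import decode_header, make_header

# Invented sample: three encoded words, each declaring a different charset.
raw = ("=?koi8-r?Q?=E4=C1?= and "
       "=?iso-8859-1?Q?M=F4nica?= and "
       "=?utf-8?Q?P=C3=B4nei?=")

# decode_header() yields one (data, charset) pair per chunk;
# charset is None for the plain text between encoded words.
chunks = decode_header(raw)
for data, charset in chunks:
    print(repr(data), charset)

# make_header() rebuilds a Header from those pairs, and str() of a
# Header gives the text with every chunk decoded by its own charset.
text = str(make_header(chunks))
print(text)
```

Whether this covers every charset seen in the wild (big5, euc_jp and so on) depends on the codecs Python ships; I have only sketched the three above.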
I do think that a Message subclass (HighLevelMessage?) could play this
role nicely - a high-level interface. Has anyone done this before? (It
is a very obvious idea.) Is anybody else interested at all? Most of the
vibes I get here are like "don't do this, don't do that"...
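Here is a rough sketch of what I have in mind, assuming Python 3. HighLevelMessage, get_text and fallback_charset are hypothetical names of my own, not anything the email package provides; the only library calls used are email.message_from_string(), Message.get() and decode_header().

```python
import email
from email.header import decode_header
from email.message import Message


class HighLevelMessage(Message):
    """Hypothetical Message subclass that hands back decoded header text."""

    def get_text(self, name, default=None, fallback_charset="latin-1"):
        raw = self.get(name)
        if raw is None:
            return default
        parts = []
        for chunk, charset in decode_header(str(raw)):
            if isinstance(chunk, bytes):
                try:
                    chunk = chunk.decode(charset or fallback_charset,
                                         errors="replace")
                except LookupError:
                    # Charset name unknown to Python: use the fallback.
                    chunk = chunk.decode(fallback_charset, errors="replace")
            parts.append(chunk)
        return "".join(parts)


msg = email.message_from_string(
    "Subject: =?iso-8859-1?Q?Eduardo_&_M=F4nica?=\n\nbody\n",
    _class=HighLevelMessage,
)
print(msg.get_text("subject"))
```

The low-level interface stays available on the same object, so those who need the RFC 2822 view lose nothing.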
Thanks to Mark Shapiro for showing me a way to do what I want.
Nando Florestan
===============
[skype] nandoflorestan
[phone] + 55 (11) 3675-3038
[mobile] + 55 (11) 9820-5451
[internet] http://oui.com.br/
[À Capela] http://acapela.com.br/
[location] São Paulo - SP - Brasil