[Email-SIG] fixing the current email module
Glenn Linderman
v+python at g.nevcal.com
Fri Oct 9 00:50:37 CEST 2009
On approximately 10/8/2009 4:40 AM, came the following characters from
the keyboard of Stephen J. Turnbull:
> Glenn Linderman writes:
>
> > > > If conversions are avoided, then octets are unlikely to be out of
> > > > range?
> > >
> > > Haven't looked in your spam bucket recently, I guess. Spammers
> > > regularly put 8 bit characters into headers (and into bodies in
> > > messages without a Content-Type header), for one thing.
> >
> > I'm aware of that, but if conversions are not done, octets are unlikely
> > to be _reported_ to be out of range....
>
> Conversions will eventually be done. "Best it were done quickly."
>
Disagree. Deferring the conversions defers failures to the point where
the code (hopefully) somewhat understands the type of data being
manipulated, and can then handle it appropriately. Converting up front
raises errors in parts of the message that may never be touched or
needed, so detecting and handling those errors is wasted effort.
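To make the deferral argument concrete, here is a minimal sketch, assuming
a hypothetical LazyHeader class (not part of the email module) that keeps
the raw octets as received and decodes them only when a caller asks for
text. A malformed header that nobody reads never produces an error.

```python
class LazyHeader:
    """Hypothetical header that keeps raw octets and decodes on demand."""

    def __init__(self, name: str, raw: bytes) -> None:
        self.name = name
        self.raw = raw      # stored untouched; no conversion at parse time
        self._text = None   # cache for the decoded value

    @property
    def text(self) -> str:
        # Decoding, and any failure it causes, happens only here, when
        # the caller actually needs the value as text.
        if self._text is None:
            self._text = self.raw.decode("ascii")
        return self._text


headers = [
    LazyHeader("Subject", b"hello"),
    LazyHeader("X-Junk", b"caf\xe9"),  # 8-bit octet; never decoded below
]
subject = next(h for h in headers if h.name == "Subject")
print(subject.text)  # hello
```

The X-Junk header would raise UnicodeDecodeError only if some code
actually read its text attribute.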
> > > Most clients are simply not going to be prepared for the kind of
> > > crap I see in /var/mail/turnbull every day.
> >
> > Are you referring to most email clients, or most
> > Python-email-library-using clients?
>
> Sorry. When I mean "MUA" I try to say "MUA". By "client", I'm
> referring to the higher level logic that is going to be calling the
> email module.
>
Yeah, terminology between people who haven't discussed the topic before
can slow communication.
So for headers, which are supposed to be ASCII (or encoded to ASCII via
the RFC rules, with no 8-bit characters), the discovery of an 8-bit
character should produce a defect report, and the value should then
simply be converted to Unicode as if it were Latin-1 (since no other
available knowledge could produce a better conversion). If the result
is not what the client (in your sense of the word) expects, the client
should either notice the defect report and reject the header on that
basis, or attempt to parse it and reject it if it encounters unexpected
syntax. After all, for that client this is "raw user input" (albeit from
a remote source), so fully error checking the input is appropriate.
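A minimal sketch of the Latin-1 fallback described above, assuming a
hypothetical decode_header_octets() helper and DecodedHeader type (neither
is the email module's actual API). Latin-1 maps every octet 0-255 to the
code point of the same value, so the fallback decode cannot fail; the
defect list is how the client learns that a guess was made.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecodedHeader:
    value: str
    defects: List[str] = field(default_factory=list)


def decode_header_octets(raw: bytes) -> DecodedHeader:
    """Decode raw header octets, recording a defect instead of raising."""
    try:
        # Well-formed headers (including RFC 2047 encoded words) are ASCII.
        return DecodedHeader(raw.decode("ascii"))
    except UnicodeDecodeError:
        # An 8-bit octet is a defect. With no better information, decode
        # as Latin-1, which accepts any byte sequence.
        return DecodedHeader(
            raw.decode("latin-1"),
            ["8-bit octet in header; decoded as Latin-1"],
        )


h = decode_header_octets(b"caf\xe9")
# h.value == "café"
print(h.defects)  # ['8-bit octet in header; decoded as Latin-1']

# A client that cannot accept the guess can reject based on the defect report.
if h.defects:
    print("rejecting header")
```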
> > Is it your point of view, then, that incorrectly formed email should be
> > mostly treated as SPAM?
>
> Heavens no! Not by the email module, anyway! The email module should
> not know about spam (but see Barry's "we're having spam for Launchpad"
> post: if you're that good, anything goes!), except maybe at a very
> high level.
>
I didn't think you'd think that, but things you were saying seemed to be
implying that.
> > Your "hit me with your best shot" comment indicates that you want a
> > failure code or exception when the data is bad, and then a way to
> > "retry accepting errors"?
>
> My current thinking is that the email module should return an object
> representing a partial parse. The way that you find out if it is
> partial is to try to access some data that "should" be in the object.
> If the parse succeeded, the accessor returns the data (which might be
> empty). If the parse did not succeed, you get an AttributeError.
> (This is just a paraphrase of what I wrote in response to Oleg.)
Yeah, or some error, anyway.
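The partial-parse idea might look roughly like this, assuming a
hypothetical PartialMessage class (not the email module's API) whose
successfully parsed fields are exposed as attributes, while fields that
failed to parse simply do not exist, so accessing them raises
AttributeError.

```python
class PartialMessage:
    """Hypothetical result of a partial parse."""

    def __init__(self, parsed: dict) -> None:
        self._parsed = parsed

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails, i.e. for fields.
        parsed = self.__dict__.get("_parsed", {})
        if name in parsed:
            return parsed[name]  # may legitimately be empty
        raise AttributeError(f"{name!r} was not successfully parsed")


msg = PartialMessage({"subject": "", "sender": "stephen@example.com"})
print(repr(msg.subject))  # '' (parsed, but empty)
try:
    msg.body
except AttributeError as exc:
    print("partial parse:", exc)
```

Because the failure is an ordinary AttributeError, a caller that prefers
not to handle exceptions can write getattr(msg, "body", None).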
The problem with APIs spelled __str__ and __bytes__ is that they have
no way to report errors other than raising exceptions, which is the
Pythonic way. Since the email library tries to avoid raising exceptions
in large parts of its code, it is non-Pythonic (which is probably part
of what Oleg is complaining about). But because it needs to avoid
exceptions, and is therefore non-Pythonic, it may be inappropriate to
spell many of its APIs __str__ and __bytes__, because those spellings
are Pythonic and require exceptions. Once you become non-Pythonic in
one area, you may have to be non-Pythonic in other areas as well...
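To illustrate the constraint, here is a sketch assuming a hypothetical
RawHeader class and an as_text() method name chosen only for this
example. str() must either return a str or raise, so __str__ cannot hand
back "a best guess plus a list of problems"; an explicitly named method
can.

```python
from typing import List, Tuple


class RawHeader:
    """Hypothetical header holding raw octets from the wire."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    def __bytes__(self) -> bytes:
        # bytes() can always succeed: the octets are returned unchanged.
        return self.raw

    def __str__(self) -> str:
        # str() must return a str or raise; there is no defect channel.
        return self.raw.decode("ascii")  # raises on 8-bit data

    def as_text(self) -> Tuple[str, List[str]]:
        # A non-dunder spelling can return the value and defects together.
        try:
            return self.raw.decode("ascii"), []
        except UnicodeDecodeError:
            return self.raw.decode("latin-1"), ["non-ASCII octets decoded as Latin-1"]


h = RawHeader(b"caf\xe9")
text, defects = h.as_text()
print(defects)  # ['non-ASCII octets decoded as Latin-1']
try:
    str(h)
except UnicodeDecodeError:
    print("str() had no choice but to raise")
```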
--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking