From xavier.delannoy at cloudmark.com  Wed Jul  6 11:53:43 2011
From: xavier.delannoy at cloudmark.com (xavier delannoy)
Date: Wed, 6 Jul 2011 11:53:43 +0200
Subject: [Email-SIG] question about the best way to check if an email is
	valid (RFC compliant)
Message-ID: <4E1430A7.3060701@cloudmark.com>

Hi,

I use the Python email library in an automated test framework. The
framework tests a Mail Transfer Agent. I notice that the "email" library
silently fixes a lot of MIME errors. I can understand this behaviour, but
I need to validate that the emails sent by the MTA are correct.
I wonder if there's a way to use the Python MIME parser more
"aggressively" (without modifying the email) and raise an exception as
soon as an error is detected.

Here are my needs and an extract of my Python code:

   import email
   import email.errors

   # Parse the received email
   try:
     received_email = email.message_from_string(args)
   except email.errors.MessageError as msg:
     # Report the parse failure to the test framework
     self.ok((False, msg), msg)

I tested with an invalid email (for example, one with a missing closing
boundary), and no exception is raised. The email lib fixes the issue.

Is there a way to tell the email lib not to modify the email, and to raise
an exception?

Regards,

-- Xavier

QA Engineer at Cloudmark Labs.

From rdmurray at bitdance.com  Wed Jul  6 14:08:55 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 06 Jul 2011 08:08:55 -0400
Subject: [Email-SIG] question about the best way to check if an email is
	valid (RFC compliant)
In-Reply-To: <4E1430A7.3060701@cloudmark.com>
References: <4E1430A7.3060701@cloudmark.com>
Message-ID: <20110706120856.349E02506C1@webabinitio.net>

On Wed, 06 Jul 2011 11:53:43 +0200, xavier delannoy <xavier.delannoy at cloudmark.com> wrote:
> I use the Python email library in an automated test framework. The
> framework tests a Mail Transfer Agent. I notice that the "email" library
> silently fixes a lot of MIME errors. I can understand this behaviour, but
> I need to validate that the emails sent by the MTA are correct.
> I wonder if there's a way to use the Python MIME parser more
> "aggressively" (without modifying the email) and raise an exception as
> soon as an error is detected.
> 
> Here are my needs and an extract of my Python code:

Not at the moment.  You can check the defects attribute afterward to
see if there were any detected errors, though.
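
A minimal sketch of the defects check described above, written for
Python 3. The helper name collect_defects and the sample message are
illustrative assumptions, not something from this thread, and which
defects the parser records depends on the Python version in use.

```python
import email


def collect_defects(raw_text):
    """Parse raw_text and return (content_type, defect) pairs for every part."""
    msg = email.message_from_string(raw_text)
    found = []
    # Defects are recorded on the part where they were detected,
    # so walk the whole MIME tree rather than checking only the root.
    for part in msg.walk():
        for defect in part.defects:
            found.append((part.get_content_type(), defect))
    return found


# Hypothetical sample: a multipart message whose closing boundary is missing.
BROKEN = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "body text\n"
)

for content_type, defect in collect_defects(BROKEN):
    print(content_type, type(defect).__name__)
```

A test harness could fail the test case whenever the returned list is
non-empty, without needing the parser to raise.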

In email6 (planned for Python 3.3) we will be providing a facility for
raising immediately.  I don't know what you mean by leaving
the message unmodified, though, since the input string is already
unmodified, and you won't get a message object when an error is raised.
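
For readers on a later Python: the standard library as shipped in
Python 3.3 and newer has a policy object with raise_on_defect enabled,
email.policy.strict. The sketch below assumes that API; the helper name
parse_strict and the sample messages are hypothetical.

```python
import email
import email.errors
import email.policy


def parse_strict(raw_text):
    """Return (message, None) on success, or (None, defect) on the first defect."""
    try:
        msg = email.message_from_string(raw_text, policy=email.policy.strict)
    except email.errors.MessageDefect as defect:
        # With raise_on_defect set, the parser raises instead of recording.
        return None, defect
    return msg, None


# Hypothetical samples: one well-formed message, one missing its closing boundary.
GOOD = "From: sender@example.com\nSubject: hi\n\nhello\n"
BROKEN = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "body text\n"
)

for name, raw in (("good", GOOD), ("broken", BROKEN)):
    msg, defect = parse_strict(raw)
    print(name, "ok" if defect is None else type(defect).__name__)
```

As the paragraph above says, no message object is available when the
parser raises, so the caller only ever sees the untouched input string.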

--David

From xavier.delannoy at cloudmark.com  Wed Jul  6 15:58:22 2011
From: xavier.delannoy at cloudmark.com (xavier delannoy)
Date: Wed, 6 Jul 2011 15:58:22 +0200
Subject: [Email-SIG] question about the best way to check if an email is
 valid (RFC compliant)
In-Reply-To: <20110706120856.349E02506C1@webabinitio.net>
References: <4E1430A7.3060701@cloudmark.com>
	<20110706120856.349E02506C1@webabinitio.net>
Message-ID: <4E1469FE.8000906@cloudmark.com>

On 07/06/2011 02:08 PM, R. David Murray wrote:
> On Wed, 06 Jul 2011 11:53:43 +0200, xavier delannoy<xavier.delannoy at cloudmark.com>  wrote:
>> I use the Python email library in an automated test framework. The
>> framework tests a Mail Transfer Agent. I notice that the "email" library
>> silently fixes a lot of MIME errors. I can understand this behaviour, but
>> I need to validate that the emails sent by the MTA are correct.
>> I wonder if there's a way to use the Python MIME parser more
>> "aggressively" (without modifying the email) and raise an exception as
>> soon as an error is detected.
>>
>> Here are my needs and an extract of my Python code:
>
> Not at the moment.  You can check the defects attribute afterward to
> see if there were any detected errors, though.
>
> In email6 (planned for Python 3.3) we will be providing a facility for
> raising immediately.  I don't know what you mean by leaving
> the message unmodified, though, since the input string is already
> unmodified, and you won't get a message object when an error is raised.
>

If an error is raised and I don't get a message object, then I'm
fine. But with Python 2.7.1, I get a message object and its defects
attribute is empty.

In the attachment you will find:
   - orig.eml: an email with an error. The boundary
     "000101020201080900040301" isn't closed.
   - after_parsing.eml: the same email after calling
     email.message_from_file(). The boundary is now closed, and the
     defects attribute is empty.
   - test.py: a Python script to reproduce the problem.

-- Xavier


> --David

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample.tgz
Type: application/x-compressed-tar
Size: 1853 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/email-sig/attachments/20110706/42f8810c/attachment.bin>

From rdmurray at bitdance.com  Wed Jul  6 20:41:40 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 06 Jul 2011 14:41:40 -0400
Subject: [Email-SIG] question about the best way to check if an email is
	valid (RFC compliant)
In-Reply-To: <4E145693.4030306@cloudmark.com>
References: <4E1430A7.3060701@cloudmark.com>
	<20110706120856.349E02506C1@webabinitio.net>
	<4E145693.4030306@cloudmark.com>
Message-ID: <20110706184141.389112505A3@webabinitio.net>

On Wed, 06 Jul 2011 14:35:31 +0200, xavier delannoy <xavier.delannoy at cloudmark.com> wrote:
> If an error is raised and I don't get a message object, then I'm
> fine. But with Python 2.7.1, I get a message object and its defects
> attribute is empty.

Please file a bug report about this at bugs.python.org and add me (tracker
id r.david.murray) to the nosy list.  It may not be fixable as a bug since
the current email package makes no promises about detecting all errors.
However, since as you point out the resulting message structure is
changed, it probably is classifiable as a bug.  I'll take a look at
the issue when I get a chance, or you could propose a patch if you are
motivated to do so.

--
R. David Murray           http://www.bitdance.com

From rdmurray at bitdance.com  Wed Jul 13 18:05:36 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 13 Jul 2011 12:05:36 -0400
Subject: [Email-SIG] new blog post
Message-ID: <20110713160537.458D0B14005@webabinitio.net>

I just posted a summary of my past month of work (which has been
at a considerably slower pace than earlier):

    http://www.bitdance.com/blog/2011/07/13_01_email6_summer_vacation/

As I report there, it looks like I have to take a break this summer to do other
stuff, but will pick it up again in the fall.  I think that will still give us
enough test time before 3.3 beta to get this into 3.3.

I should still be able to cobble together bits and pieces of time, so I'm
thinking about posting at least one specific task to the python-mentor's list
(additional tests and the resulting fixes for the parser) to see if anyone
wants to help out.  Or perhaps someone here does :)  I'll also try to find some
time to work on the docs.

The big summer project hasn't actually started yet, so I may make some non-trivial
progress before it does.

--
R. David Murray           http://www.bitdance.com

From rdmurray at bitdance.com  Wed Jul 13 18:15:41 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 13 Jul 2011 12:15:41 -0400
Subject: [Email-SIG] API question with respect to address objects
Message-ID: <20110713161542.2AD86B14005@webabinitio.net>

So we have these address objects.  Currently their string value is the
"decoded" value, which means the content from the source is preserved except
that (when I get it working!) encoded words and IDNA are decoded.

As reported in the blog post, I've currently added a 'reformatted' attribute to
give access to the value formatted according to RFC rules (but currently it
isn't handling re-encoding).

So, the question is, what do we want the API to ultimately be?  I'm thinking
that the string value should be the "idealized" value, which would mean we
decode it fully and make it RFC conformant where that makes sense (minimal
quoting, removal of spaces around dots in local parts, etc).  We'll also want a
way to get the wire format version of the address out (properly encoded to
ASCII).  And of course the application may want access to the 'source' value
that was parsed to create the Address object.  I'm thinking that version should
probably be a strict substring of the source attribute of the header the
address belongs to.
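
To make the three representations concrete, here is a rough sketch using
only helpers that already exist in the standard library (Python 3.3 or
later). It is not the email6 Address API under discussion; the sample
address and the variable names are assumptions chosen for illustration.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

# Hypothetical source text, as it might appear inside a From: header.
source = "=?utf-8?q?Andr=C3=A9?= <andre@example.com>"

# 'source' view: split the raw text without decoding anything.
raw_name, addr_spec = parseaddr(source)

# Decoded view: RFC 2047 encoded words turned back into unicode.
display_name = str(make_header(decode_header(raw_name)))

# Wire-format view: re-encoded so that the result is pure ASCII.
wire = formataddr((display_name, addr_spec))

print(repr(source))
print(repr(display_name))
print(repr(wire))
```

Note that the wire-format string need not be byte-identical to the
source, which is one reason to keep the source value available as well.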

Thoughts?

--
R. David Murray           http://www.bitdance.com

From rdmurray at bitdance.com  Tue Jul 19 23:21:39 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Tue, 19 Jul 2011 17:21:39 -0400
Subject: [Email-SIG] email-6.0.0.a1
Message-ID: <20110719212139.D5D732500D5@webabinitio.net>

OK, so I've released the first iteration of the email6 package on pypi
as email-6.0.0a1.  After install you import it as email6.  This will
allow anyone curious and/or motivated to test it out under Python 3.2.
I'm especially interested in anyone with a working program that uses
email in 3.2: it should be completely backward compatible, and if it
isn't I want to know ASAP.[*]

I've also opened issue 12586 for review of the delta between default
and the code that is in the release.  I'd like to check the code in
to default and continue to work on it from there.  As I said in the
issue comments: "When we originally planned out email6 we thought we'd
be making a "compatibility break" with backward compatibility shims.
As things have turned out the work is more a matter of incremental
improvement of the API while maintaining the old API, and thus it seems
reasonable to me to work on it directly in default rather than continue
to work on it in a separate feature branch."  Assuming, that is, that
the general approach represented by *this* delta is accepted.

What this delta adds to email is a conversion to handling all headers as
full-blown objects (as opposed to strings, tuples of strings, or Header
objects, depending on context).  The object type is a subclass of str,
so the headers act like strings if you don't use their additional API.
The basic additional API is that a 'source' attribute contains the
text the generator read from the input source, and a 'value' attribute
contains the value with all the Content-Transfer-Encoding stuff
undone so that you have a real unicode string.  By changing a policy
setting, you can have that value as the string value of the header.
You can also assign a string with non-ASCII characters to a header, and
the right thing will happen.  (Well, eventually it will happen...right
now it only works correctly for unstructured headers).  Further, Date
headers have a datetime attribute (and accept being set to a datetime),
and address headers have attributes for accessing the individual addresses
in the header.  Other structured headers will eventually grow additional
attributes as well.
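
For comparison, the sketch below uses the header objects as they later
shipped in the standard library (Python 3.3 and newer, selected with
email.policy.default). The attribute names there are not necessarily
those of the 6.0.0a1 release described here, so the 'source' and 'value'
attributes are deliberately not used; the sample message is hypothetical.

```python
import email
import email.policy

# Hypothetical message text.
RAW = (
    "From: David <david@example.com>\n"
    "To: Barry <barry@example.org>, stephen@example.org\n"
    "Date: Tue, 19 Jul 2011 17:21:39 -0400\n"
    "Subject: =?utf-8?q?caf=C3=A9?= opening\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(RAW, policy=email.policy.default)

# Header objects are str subclasses, so they can be used as plain strings.
subject = msg["Subject"]
print(isinstance(subject, str), subject)

# Date headers expose a datetime attribute.
print(msg["Date"].datetime)

# Address headers expose the individual addresses.
for address in msg["To"].addresses:
    print(repr(address.display_name), address.addr_spec)
```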

The general approach has been discussed with and approved by the email-sig,
but all comments are welcome.  I know there's room for bikeshedding
on some aspects of the API; in some cases I've done some "placeholder"
stuff pending a more complete solution to certain design goals.

I have a big project in the offing over the next couple months.  QNX is
still fully behind the funding for email6 development, but I probably
won't be able to complete it until the fall.  So I'd like to get this
chunk (the biggest chunk of new code, considering the size of the parser)
reviewed and checked in if possible.  I'll keep working on the bits of
functionality that aren't quite complete and the bugs that I know are
there until my big project kicks off, but I wanted to release/post now
so that there might be a chance of some review happening while I still
have time to respond quickly to the feedback.

--
R. David Murray           http://www.bitdance.com

[*] I believe that if you try to use an email6 Message object with the 3.2
mailbox module you will run into some trouble, but I think it ought to
be possible to make it work with the right magic :)

PS: I don't have much experience writing parsers, so I'm expecting some
critical comments about my parser design.  It had to be a custom parser
since otherwise I'd be blocked on waiting for some other software to
get accepted into the stdlib, but it certainly wound up being a bigger
chunk of code than I expected when I started writing it.

From rdmurray at bitdance.com  Mon Jul 25 21:42:37 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 25 Jul 2011 15:42:37 -0400
Subject: [Email-SIG] header folding
Message-ID: <20110725194238.3ABCB2505A8@webabinitio.net>

Well, my big project still hasn't kicked off, so I'm still working on email6.

I just posted a new blog post:

    http://www.bitdance.com/blog/2011/07/25_01_email6_pypi_release/

The PyPI release is old news here.  The interesting part of the post
for this group is the discussion of the new header folding API at
the end.  Basically, BaseHeader gets a 'wrap' method, and there is
a new policy control, 'refold_source' (I'll probably rename it to
'rewrap_source', since I expect to apply it also to message bodies).
The policy control has three values: 'none', 'long', and 'all'.  'none'
means never touch the source, always use it.  'long' means refold a header
if any of the source's component lines are longer than max_line_length.
'all' means refold everything.
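
The three settings can be tried out with the policy objects that later
shipped in the standard library (Python 3.3 and newer), where
refold_source and max_line_length are attributes of email.policy
policies. This is a sketch under that assumption, not the code from the
release discussed here; the header value is made up, and each policy is
cloned explicitly so that no default setting is relied upon.

```python
import email.policy

# A hypothetical long unstructured header value on a single source line.
value = " ".join(["word"] * 30)

keep = email.policy.default.clone(refold_source="none")
refold_long = email.policy.default.clone(refold_source="long")
refold_all = email.policy.default.clone(refold_source="all")

# 'none': the source text is emitted untouched, however long it is.
untouched = keep.fold("Subject", value)

# 'long': a header with no overlong line is left alone ...
short = refold_long.fold("Subject", "short value")
# ... but a header containing an overlong line is refolded as a whole.
folded = refold_long.fold("Subject", value)

# 'all': every header is refolded.
always = refold_all.fold("Subject", value)

for line in folded.splitlines():
    print(len(line), repr(line))
```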

Email5.1 wraps long lines, but leaves short lines alone.  Under 'long',
this code refolds the whole header if there is a long line in it.
I think that is more RFC compliant, and I don't think it will cause any
problems if used.

The default for refold_source is 'none'.  I'm considering this a bug
fix, since a stated goal of the email package is to reproduce the source
accurately if possible.

(Currently the new code still calls Header to do the folding; writing
the new folder is my next task.)

--
R. David Murray           http://www.bitdance.com

From stephen at xemacs.org  Tue Jul 26 06:03:11 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 26 Jul 2011 13:03:11 +0900
Subject: [Email-SIG]  header folding
In-Reply-To: <20110725194238.3ABCB2505A8@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
Message-ID: <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>

R. David Murray writes:

 > the end.  Basically, BaseHeader gets a 'wrap' method, and there is
 > a new policy control, 'refold_source' (I'll probably rename it to
 > 'rewrap_source', since I expect to apply it also to message
 > bodies).

This bothers me.  Folding and wrapping are two different things.

Folding is about invertibly reformatting a single logical line to make
machines happy during transmission, what wrapping "does" is not 100%
clear to me but it's about making people happy.  (I put "does" in
quotes because it's not obvious to me that the source of wrapped text
necessarily is a single anything, nor that wrapping need be
invertible.)

I grant that people and many MUAs take a different point of view about
header folding, but clearly the RFCs have moved away from placing any
importance on presentation aspects toward specifying an invertible
transformation exactly.  On the other hand, I think that wrapping
should place emphasis on presentation.

From rdmurray at bitdance.com  Tue Jul 26 14:38:26 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Tue, 26 Jul 2011 08:38:26 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20110726123827.3EC7FB14005@webabinitio.net>

On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> R. David Murray writes:
> 
>  > the end.  Basically, BaseHeader gets a 'wrap' method, and there is
>  > a new policy control, 'refold_source' (I'll probably rename it to
>  > 'rewrap_source', since I expect to apply it also to message
>  > bodies).
> 
> This bothers me.  Folding and wrapping are two different things.
> 
> Folding is about invertibly reformatting a single logical line to make
> machines happy during transmission, what wrapping "does" is not 100%
> clear to me but it's about making people happy.  (I put "does" in
> quotes because it's not obvious to me that the source of wrapped text
> necessarily is a single anything, nor that wrapping need be
> invertible.)
> 
> I grant that people and many MUAs take a different point of view about
> header folding, but clearly the RFCs have moved away from placing any
> importance on presentation aspects toward specifying an invertible
> transformation exactly.  On the other hand, I think that wrapping
> should place emphasis on presentation.

Hmm.  Makes sense to me.  So you'd rather the method were called "fold"
and that refold_source remains the name of the policy control.

What's the word for what is done when a text message is made to have
a line length of less than 78 by using quoted printable (or base64)
encoding?  Is that also folding?  If there's no existing term in common
use, folding would make sense to me.  So I have no objection to using
'fold' consistently in the api and code for these operations.

Can anyone see a use case for controlling folding of headers separately
from folding of message bodies?  I haven't thought of one, which is why
I'm thinking one policy knob controls both.

--David

From stephen at xemacs.org  Wed Jul 27 09:18:36 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 27 Jul 2011 16:18:36 +0900
Subject: [Email-SIG] header folding
In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
Message-ID: <87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp>

R. David Murray writes:

 > Hmm.  Makes sense to me.  So you'd rather the method were called "fold"
 > and that refold_source remains the name of the policy control.

Yes.

 > What's the word for what is done when a text message is made to have
 > a line length of less than 78 by using quoted printable (or base64)
 > encoding?

RFC 2045 discusses "insertion of soft line breaks"; it doesn't mention
a term like "folding".  "Folding" seems like a good term to me,
though.  Note that the RFC 2045 definition of quoted-printable says
that physical line length MUST be 76 characters or less, including any
terminating = but not the CRLF pair that separates lines.
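
A small illustration of those soft line breaks, using the standard
quopri module. The sample line is an assumption made up for the example;
the point is only that every encoded line stays within 76 characters and
that decoding restores the original bytes.

```python
import quopri

# Hypothetical body line, far longer than the 76-character limit.
original = (
    "https://example.com/confirm?key=" + "0123456789abcdef" * 8 + "\n"
).encode("ascii")

encoded = quopri.encodestring(original)

# Each physical line ends in '=' (a soft line break) until the last one.
for line in encoded.splitlines():
    print(len(line), line)

# Soft line breaks are removed on decoding, so the operation is invertible.
decoded = quopri.decodestring(encoded)
print(decoded == original)
```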

 > Can anyone see a use case for controlling folding of headers
 > separately from folding of message bodies?  I haven't thought of
 > one, which is why I'm thinking one policy knob controls both.

The RFCs' treatments differ somewhat.  RFC 5322 has both a MUST NOT
and a SHOULD NOT exceed limit on line length (998 and 78 characters,
not including the CRLF, respectively).  RFC 2045 quoted-printable has
only the MUST NOT limit of 76 (but the difference in limits is not a
big deal).

It's not clear to me what exactly the policy knob you're talking about
is for body text.  There is no policy really allowed if quoted-
printable is being used.  So the policy knob is whether to use
quoted-printable to limit physical line length?

The only reason I can think of for having separate controls is that
many MUAs mishandle quoted-printable in the body text.  Patches don't
apply, one-time-key URLs in links get broken and fail to be
recognized.  On the other hand, header-folding rarely has such
consequences in my experience.


From rdmurray at bitdance.com  Wed Jul 27 14:20:38 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 27 Jul 2011 08:20:38 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
	<87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20110727122039.9033E2506ED@webabinitio.net>

On Wed, 27 Jul 2011 16:18:36 +0900, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> It's not clear to me what exactly the policy knob you're talking about
> is for body text.  There is no policy really allowed if quoted-
> printable is being used.  So the policy knob is whether to use
> quoted-printable to limit physical line length?

Well, I have *not* looked at this in detail yet.  By default nothing
is changed (refold_source='none').  My preliminary thought was that if
refold_source is 'long', and we come across a body that is wider than
the RFC limit (or if the application wants to reformat to a different
limit), we could reconstruct the body and refold it to the new limit.
Perhaps this is not practical/useful; as I say I haven't gotten there
yet :)

> The only reason I can think of for having separate controls is that
> many MUAs mishandle quoted-printable in the body text.  Patches don't
> apply, one-time-key URLs in links get broken and fail to be
> recognized.  On the other hand, header-folding rarely has such
> consequences in my experience.

That's an interesting point.  So perhaps I should rename the control
'header_source_refold'.  I hate making the name longer, but anything
less would be ambiguous, and I've already got other controls with long
names :(.  On the other hand, we could also provide a separate control
for whether or not quoted printable bodies in particular were folded,
and consider both controls when deciding what to do with a particular
quoted printable body.  I favor the latter at the moment.

--
R. David Murray           http://www.bitdance.com

From stephen at xemacs.org  Wed Jul 27 16:07:33 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 27 Jul 2011 23:07:33 +0900
Subject: [Email-SIG] header folding
In-Reply-To: <20110727122039.9033E2506ED@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
	<87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110727122039.9033E2506ED@webabinitio.net>
Message-ID: <87fwlrrf4q.fsf@uwakimon.sk.tsukuba.ac.jp>

R. David Murray writes:

 > That's an interesting point.  So perhaps I should rename the control
 > 'header_source_refold'.

I don't have a strong opinion, but I tend to think it's
unnecessary.

 > On the other hand, we could also provide a separate control
 > for whether or not quoted printable bodies in particular were
 > folded,

If the body is already known to be quoted-printable, you don't really
have a choice.  Folding lines longer than 76 characters after
quoted-printable encoding is required by RFC 2045.  Of course you can
do more folding than necessary (eg, fold an 85-character line at 35
and 70 characters), but that doesn't seem very useful to me.

It seems to me that the policy question (if it exists) is "We have an
all-ASCII body with 'long lines'.  Shall we encode in quoted-printable
only for the purpose of folding the long lines?"

From rdmurray at bitdance.com  Wed Jul 27 17:34:13 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 27 Jul 2011 11:34:13 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <87fwlrrf4q.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
	<87hb68qjhv.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110727122039.9033E2506ED@webabinitio.net>
	<87fwlrrf4q.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20110727153414.A09C62506ED@webabinitio.net>

On Wed, 27 Jul 2011 23:07:33 +0900, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> R. David Murray writes:
> 
>  > That's an interesting point.  So perhaps I should rename the control
>  > 'header_source_refold'.
> 
> I don't have a strong opinion, but I tend to think it's
> unnecessary.
> 
>  > On the other hand, we could also provide a separate control
>  > for whether or not quoted printable bodies in particular were
>  > folded,
> 
> If the body is already known to be quoted-printable, you don't really
> have a choice.  Folding lines longer than 76 characters after
> quoted-printable encoding is required by RFC 2045.  Of course you can

Right, I realized what I said didn't make sense after I hit send :)

> do more folding than necessary (eg, fold an 85-character line at 35
> and 70 characters), but that doesn't seem very useful to me.

Well, the use case I was thinking of was fixing up non-conformant output
from another MUA (quoted printable but with overlong lines).  I don't
know if such exists in the wild, but I would expect that it does,
everything else seems to :)  Still it may be a YAGNI, since any such
are most likely to be spammers.

> It seems to me that the policy question (if it exists) is "We have an
> all-ASCII body with 'long lines'.  Shall we encode in quoted-printable
> only for the purpose of folding the long lines?"

Yes, that would be a similar case:  we have a body that doesn't conform
to the "SHOULD" limit of 78; if refold_source is 'long', should we
use QP to fold it?  But this question also arises if the application
is attaching a text part with lines longer than 78 characters.  As you
suggested it might be the case that we don't want to QP encode such text.
That question, QP encoding only to fold text parts with long lines, thus
seems to be a separate policy control (and I do think we want one for it).
So if we have 'refold_source' set to 'long', an unencoded text part with
long lines would get QP encoded if and only if this new policy setting
that we haven't named yet is set to fold such parts using QP.
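
The rule in the last sentence can be written down as a tiny predicate.
Everything here is hypothetical: the function name, the
fold_text_with_qp parameter (standing in for the policy setting that has
no name yet), and the default limit of 78 are assumptions made only to
spell out the "if and only if".

```python
def needs_qp_for_folding(body_lines, refold_source, fold_text_with_qp,
                         max_line_length=78):
    """Decide whether an unencoded text part should be QP encoded to fold it."""
    # True when at least one physical line exceeds the limit.
    has_long_line = any(len(line) > max_line_length for line in body_lines)
    # Both policy settings must agree before the part is re-encoded.
    return refold_source == "long" and fold_text_with_qp and has_long_line


body = ["short line", "x" * 120]
print(needs_qp_for_folding(body, "long", True))
print(needs_qp_for_folding(body, "long", False))
print(needs_qp_for_folding(body, "none", True))
```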

--
R. David Murray           http://www.bitdance.com

From barry at python.org  Tue Jul 26 17:07:03 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 26 Jul 2011 11:07:03 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
Message-ID: <20110726110703.2ec7b1e6@resist.wooz.org>

On Jul 26, 2011, at 08:38 AM, R. David Murray wrote:

>On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>> R. David Murray writes:
>> 
>>  > the end.  Basically, BaseHeader gets a 'wrap' method, and there is
>>  > a new policy control, 'refold_source' (I'll probably rename it to
>>  > 'rewrap_source', since I expect to apply it also to message
>>  > bodies).
>> 
>> This bothers me.  Folding and wrapping are two different things.
>> 
>> Folding is about invertibly reformatting a single logical line to make
>> machines happy during transmission, what wrapping "does" is not 100%
>> clear to me but it's about making people happy.  (I put "does" in
>> quotes because it's not obvious to me that the source of wrapped text
>> necessarily is a single anything, nor that wrapping need be
>> invertible.)
>> 
>> I grant that people and many MUAs take a different point of view about
>> header folding, but clearly the RFCs have moved away from placing any
>> importance on presentation aspects toward specifying an invertible
>> transformation exactly.  On the other hand, I think that wrapping
>> should place emphasis on presentation.
>
>Hmm.  Makes sense to me.  So you'd rather the method were called "fold"
>and that refold_source remains the name of the policy control.

Stephen makes a good point, one I agree with.

>What's the word for what is done when a text message is made to have
>a line length of less than 78 by using quoted printable (or base64)
>encoding?  Is that also folding?  If there's no existing term in common
>use, folding would make sense to me.  So I have no objection to using
>'fold' consistently in the api and code for these operations.

Haven't we used 'splitting' as a term for this, at least internally, in
previous versions?  That's at least what I think of, and I do think we could
have two knobs to control the different functionality:

- To 'split' a line means to take a line longer than a specified maximum, and
  make it fit into the maximum line length, splitting at whitespace or other
  semantic separators.

- To 'fill' a header means to take the logical contents of the header and
  recombine and resplit it so that each line is as close to the maximum line
  length as possible.  My analogy here is Emacs's M-q (fill-paragraph).
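
The difference between the two operations defined above can be sketched
with the standard textwrap module. This is only an analogy on plain
text, not header folding code from the email package; the function names
and the sample lines are assumptions.

```python
import textwrap


def split_long_lines(lines, max_len=78):
    """'Split': break only the lines that exceed max_len, leave the rest alone."""
    out = []
    for line in lines:
        if len(line) <= max_len:
            out.append(line)
        else:
            out.extend(textwrap.wrap(line, width=max_len))
    return out


def fill(lines, max_len=78):
    """'Fill': recombine all the text, then resplit it as close to max_len as possible."""
    text = " ".join(line.strip() for line in lines)
    return textwrap.wrap(text, width=max_len)


sample = ["short one", "short two", " ".join(["w"] * 60)]
print(split_long_lines(sample))
print(fill(sample))
```

With 'split' the two short lines survive unchanged; with 'fill' they are
merged with the text that follows, much as M-q would do.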

What then is "folding" or "wrapping"?  Maybe no different than the above.

>Can anyone see a use case for controlling folding of headers separately
>from folding of message bodies?  I haven't thought of one, which is why
>I'm thinking one policy knob controls both.

You might have a message body that contains code, in which case you might want
to fill the headers (using the terminology above), but not fill the body.

Cheers,
-Barry

From barry at python.org  Wed Jul 20 00:45:46 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 19 Jul 2011 18:45:46 -0400
Subject: [Email-SIG] email-6.0.0.a1
In-Reply-To: <20110719212139.D5D732500D5@webabinitio.net>
References: <20110719212139.D5D732500D5@webabinitio.net>
Message-ID: <20110719184546.4eb8f52a@resist.wooz.org>

On Jul 19, 2011, at 05:21 PM, R. David Murray wrote:

>OK, so I've released the first iteration of the email6 package on pypi
>as email-6.0.0a1.  After install you import it as email6.  This will
>allow anyone curious and/or motivated to test it out under Python 3.2.
>I'm especially interested in anyone with a working program that uses
>email in 3.2: it should be completely backward compatible, and if it
>isn't I want to know ASAP.[*]

It'll take some time to digest, but congratulations RDM!  You've accomplished
an impressive milestone.

-Barry

From barry at python.org  Tue Jul 26 17:07:03 2011
From: barry at python.org (Barry Warsaw)
Date: Tue, 26 Jul 2011 11:07:03 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
Message-ID: <20110726110703.2ec7b1e6@resist.wooz.org>

On Jul 26, 2011, at 08:38 AM, R. David Murray wrote:

>On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>> R. David Murray writes:
>> 
>>  > the end.  Basically, BaseHeader gets a 'wrap' method, and there is
>>  > a new policy control, 'refold_source' (I'll probably rename it to
>>  > 'rewrap_source', since I expect to apply it also to message
>>  > bodies).
>> 
>> This bothers me.  Folding and wrapping are two different things.
>> 
>> Folding is about invertibly reformatting a single logical line to make
>> machines happy during transmission, what wrapping "does" is not 100%
>> clear to me but it's about making people happy.  (I put "does" in
>> quotes because it's not obvious to me that the source of wrapped text
>> necessarily is a single anything, nor that wrapping need be
>> invertible.)
>> 
>> I grant that people and many MUAs take a different point of view about
>> header folding, but clearly the RFCs have moved away from placing any
>> importance on presentation aspects toward specifying an invertible
>> transformation exactly.  On the other hand, I think that wrapping
>> should place emphasis on presentation.
>
>Hmm.  Makes sense to me.  So you'd rather the method were called "fold"
>and that refold_source remains the name of the policy control.

Stephen makes a good point, one I agree with.

>What's the word for what is done when a text message is made to have
>a line length of less than 78 by using quoted printable (or base64)
>encoding?  Is that also folding?  If there's no existing term in common
>use, folding would make sense to me.  So I have no objection to using
>'fold' consistently in the api and code for these operations.

Haven't we used 'splitting' as a term for this, at least internally, in
previous versions?  That's at least what I think of, and I do think we could
have two knobs to control the different functionality:

- To 'split' a line means to take a line longer than a specified maximum, and
  make it fit into the maximum line length, splitting at whitespace or other
  semantic separators.

- To 'fill' a header means to take the logical contents of the header and
  recombine and resplit it so that each line is as close to the maximum line
  length as possible.  My analogy here is Emacs's M-q (fill-paragraph).
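
A rough illustration of the two definitions above, using the standard
library's textwrap module as a stand-in. This is only a sketch: the helper
names split_lines and fill_lines are hypothetical, and nothing here reflects
how the email package itself breaks headers.

```python
import textwrap


def split_lines(lines, width):
    """'Split': shorten each over-long line on its own, at whitespace."""
    out = []
    for line in lines:
        # Short lines pass through untouched; neighbours are never merged.
        out.extend(textwrap.wrap(line, width) or [""])
    return out


def fill_lines(lines, width):
    """'Fill': recombine the logical content, then re-break it."""
    logical = " ".join(line.strip() for line in lines)
    return textwrap.wrap(logical, width)


lines = ["alpha beta gamma delta epsilon", "zeta", "eta theta"]
split_result = split_lines(lines, 20)
fill_result = fill_lines(lines, 20)
```

Splitting leaves the short "zeta" line alone, while filling pulls it up onto
the preceding line because the content is recombined first.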

What then is "folding" or "wrapping"?  Maybe no different than the above.

>Can anyone see a use case for controlling folding of headers separately
>from folding of message bodies?  I haven't thought of one, which is why
>I'm thinking one policy knob controls both.

You might have a message body that contains code, in which case you might want
to fill the headers (using the terminology above), but not fill the body.

Cheers,
-Barry

From rdmurray at bitdance.com  Wed Jul 27 22:56:19 2011
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 27 Jul 2011 16:56:19 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <20110726110703.2ec7b1e6@resist.wooz.org>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
	<20110726110703.2ec7b1e6@resist.wooz.org>
Message-ID: <20110727205620.08FCA2506ED@webabinitio.net>

On Tue, 26 Jul 2011 11:07:03 -0400, Barry Warsaw <barry at python.org> (by way of Barry Warsaw <barry at python.org>) wrote:
> On Jul 26, 2011, at 08:38 AM, R. David Murray wrote:
> >What's the word for what is done when a text message is made to have
> >a line length of less than 78 by using quoted printable (or base64)
> >encoding?  Is that also folding?  If there's no existing term in common
> >use, folding would make sense to me.  So I have no objection to using
> >'fold' consistently in the api and code for these operations.
> 
> Haven't we used 'splitting' as a term for this, at least internally, in
> previous versions?  That's at least what I think of, and I do think we could
> have two knobs to control the different functionality:

'split' and 'wrap' seem to be used somewhat interchangeably in the
current code and docs.  I'm now consistently using 'fold' in the new
code.

> - To 'split' a line means to take a line longer than a specified maximum, and
>   make it fit into the maximum line length, splitting at whitespace or other
>   semantic separators.

My current code doesn't do this anywhere.  The old code does.

> - To 'fill' a header means to take the logical contents of the header and
>   recombine and resplit it so that each line is as close to the maximum line
>   length as possible.  My analogy here is Emacs's M-q (fill-paragraph).

Neither my current code nor the old code does exactly this anywhere.

> What then is "folding" or "wrapping"?  Maybe no different than the above.

Folding is an RFC term-of-art that implies the specific RFC rules for
making sure a semantic unit (header, body) has lines that are shorter
than the RFC defined maximum length.

Wrapping is much more like your 'filling', but probably a less precise
term, as filling does imply maximizing line lengths, while wrapping
to my ears does not have that connotation as a requirement.

'refolding', as I've implemented it, consists of taking an existing folded
header, unfolding it, and then folding it according to the RFC rules and
recommendations.  This may or may not put the maximum possible number
of characters on a line, depending on whether the header is structured
or unstructured and the content of said header.  And it may or may not
exactly reproduce the original header, depending on how closely the
original folder and I agree on our interpretation of the RFC rules :)
(Which is why headers are only refolded by explicit request.)
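
A minimal sketch of the refold control, assuming the email.policy API as it
later appeared in the Python standard library (3.3 and later, with
as_string(policy=...) from 3.4) rather than the alpha code discussed here.
In that later API refold_source accepts 'none', 'long' and 'all'. The sample
header is borrowed from elsewhere in this thread with the archive's " at "
obfuscation undone; the variable names are illustrative only.

```python
from email import message_from_string, policy

raw = (
    "To: Amie Cawinski <abc@abc.org>, Ichabod\n"
    " Tallman <imt@cow.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

# Parse with the modern policy so the original folding is remembered.
msg = message_from_string(raw, policy=policy.default)

# 'none': headers read from the source are written back as they were folded.
keep_policy = policy.default.clone(refold_source="none")
preserved = msg.as_string(policy=keep_policy)

# 'all': every source header is unfolded and folded again by the library.
refold_policy = policy.default.clone(refold_source="all")
refolded = msg.as_string(policy=refold_policy)
```

With 'none' the awkward break after "Ichabod" survives the round trip; with
'all' the address list is short enough to come back on a single line.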

So, I agree with Stephen, I think 'folding' is the correct term to
use here.

> >Can anyone see a use case for controlling folding of headers separately
> >from folding of message bodies?  I haven't thought of one, which is why
> >I'm thinking one policy knob controls both.
> 
> You might have a message body that contains code, in which case you might want
> to fill the headers (using the terminology above), but not fill the body.

This is similar to the case we've already discussed, about excluding
a text body from being QP encoded.  I think we don't currently do
any paragraph reflow, but it might be an interesting facility to add :)

--
R. David Murray           http://www.bitdance.com

From barry at python.org  Thu Jul 28 01:10:42 2011
From: barry at python.org (Barry Warsaw)
Date: Wed, 27 Jul 2011 19:10:42 -0400
Subject: [Email-SIG] header folding
In-Reply-To: <20110727205620.08FCA2506ED@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
	<20110726110703.2ec7b1e6@resist.wooz.org>
	<20110727205620.08FCA2506ED@webabinitio.net>
Message-ID: <20110727191042.1268a098@resist.wooz.org>

On Jul 27, 2011, at 04:56 PM, R. David Murray wrote:

>Wrapping is much more like your 'filling', but probably a less precise
>term, as filling does imply maximizing line lengths, while wrapping
>to my ears does not have that connotation as a requirement.

Is it just the guarantee of maximizing line lengths that's missing?

>'refolding', as I've implemented it, consists of taking an existing folded
>header, unfolding it, and then folding it according to the RFC rules and
>recommendations.  This may or may not put the maximum possible number
>of characters on a line, depending on whether the header is structured
>or unstructured and the content of said header.  And it may or may not
>exactly reproduce the original header, depending on how closely the
>original folder and I agree on our interpretation of the RFC rules :)
>(Which is why headers are only refolded by explicit request.)
>
>So, I agree with Stephen, I think 'folding' is the correct term to
>use here.

Okay.  To me 'folding' is closer to 'splitting', while 'wrapping' is closer to
'filling' since in what you describe above, there is an 'unfolding' operation
that happens first.  Note too that Emacs's filling doesn't guarantee maximal
line lengths (i.e. fill-column) either since long words can cause previous
lines to be shorter.  That seems analogous to your description above.

-Barry

From stephen at xemacs.org  Thu Jul 28 05:09:53 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 28 Jul 2011 12:09:53 +0900
Subject: [Email-SIG] header folding
In-Reply-To: <20110726110703.2ec7b1e6@resist.wooz.org>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
	<20110726110703.2ec7b1e6@resist.wooz.org>
Message-ID: <20016.54017.96798.606472@uwakimon.sk.tsukuba.ac.jp>

Barry Warsaw <barry at python.org> writes:

 > That's at least what I think of, and I do think we could
 > have two knobs to control the different functionality:
 > 
 > - To 'split' a line means to take a line longer than a specified maximum, and
 >   make it fit into the maximum line length, splitting at whitespace or other
 >   semantic separators.

In the case of headers, "folding" is hallowed usage (going back to at
least RFC 733), and is very precisely defined by RFC 5322.  If we are
going to do something non-RFC conformant (yeah, right, we might do
that, eh?), "splitting" would be better.  If our implementation is
intended to be conformant, I think "folding" is preferable both for
familiarity and ease of reference ("look it up in RFC 5322").

I think the generalization to bodies is reasonable, although I haven't
found any RFC usage of "folding" in that context in a quick look.

 > - To 'fill' a header means to take the logical contents of the
 > header and recombine and resplit it so that each line is as close
 > to the maximum line length as possible.  My analogy here is Emacs's
 > M-q (fill-paragraph).

 > What then is [...] "wrapping"?  Maybe no different than the above.

In my dialect, what you describe as "filling" is (at least
potentially) far more sophisticated than what I mean by "wrapping".
Wrapping moves forward through each line and at the maximum length
backtracks to the rightmost break point in the line, breaking there,
then continuing the process in the tail line.  This could and often in
my experience does result in very uneven lines.
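
The wrapping procedure described above can be sketched as follows. This is a
hand-rolled illustration under the assumption that the only break points are
single spaces; greedy_wrap is a hypothetical name, not an email package
function.

```python
def greedy_wrap(line, width):
    """Break one long line greedily at the rightmost space within width."""
    pieces = []
    while len(line) > width:
        # Backtrack from the limit to the rightmost space that still fits.
        cut = line.rfind(" ", 0, width + 1)
        if cut <= 0:
            # No usable break point inside the limit: take the next one, if any.
            cut = line.find(" ", width)
            if cut == -1:
                break
        pieces.append(line[:cut])
        # Continue the same process on the tail of the line.
        line = line[cut + 1:]
    pieces.append(line)
    return pieces


wrapped = greedy_wrap("aaaa bbbbbbbbbbbbbbbb cc dd ee", 18)
```

The long second word forces an early break after "aaaa", which is the kind of
uneven result the paragraph above mentions.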

However, I don't think we're talking about filling here.  Filling IMHO
should be implemented by the email module, but it should be called
explicitly by the client, not imposed internally on the basis of a
global policy.

Consider the following ugly header (which is somewhat unlikely to
actually appear in a real use case, although it could easily result
from cut-and-paste into an MUA's To field):

To: Amie Cawinski <abc at abc.org>, Ichabod
 Tallman <imt at cow.org>

(there is no trailing whitespace on either line).  IMO, there are two
plausible fillings (assuming a limit of 78 characters) here:

To: Amie Cawinski <abc at abc.org>, Ichabod Tallman <imt at cow.org>

and

To: Amie Cawinski <abc at abc.org>,
    Ichabod Tallman <imt at cow.org>

of which the second will be uglified by an RFC-5322-conformant
processor into:

To: Amie Cawinski <abc at abc.org>,    Ichabod Tallman <imt at cow.org>

(note the extra space after the comma).  I personally don't consider
either of

To: Amie Cawinski <abc at abc.org>,
 Ichabod Tallman <imt at cow.org>

To: Amie Cawinski <abc at abc.org>,
<TAB>Ichabod Tallman <imt at cow.org>

plausible as a presentation, but YMMV.  So filling (to me) is about
presentation, not protocol conformance.
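
For reference, RFC 5322 unfolding simply removes any line break that is
immediately followed by whitespace, which is why the indentation of the
second filling above survives as extra spaces. A minimal sketch (the function
name unfold is illustrative, and the addresses are the ones from the example
with the archive's " at " obfuscation undone):

```python
import re


def unfold(header):
    """Remove each CRLF (or bare LF) that is directly followed by WSP."""
    return re.sub(r"\r?\n(?=[ \t])", "", header)


folded = "To: Amie Cawinski <abc@abc.org>,\r\n    Ichabod Tallman <imt@cow.org>"
unfolded = unfold(folded)
```

Only the line break is dropped; the four spaces of continuation indent remain
part of the header value.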

Anyway, I don't see how we can justify making *these* choices for the
user on the basis of a policy that really is about conservative
compliance to a wire protocol standard.  For example, I personally do
not "fill" 81-character subject headers; it's just too ugly.  However,
I might want my mail program to conservatively "fold" them, especially
for certain correspondents known to be stuck behind weird MTAs or MUAs.

 > You might have a message body that contains code, in which case you
 > might want to fill the headers (using the terminology above), but
 > not fill the body.

That's another example of why control for filling has to be flexible
(and why IMHO filling should be called explicitly by the client).

However, if the receiving MUA is RFC 2045-conformant, the user cannot
tell that quoted-printable folding was used.
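
A small demonstration of that invertibility, using the standard library's
quopri module (a sketch only; the sample data is made up):

```python
import quopri

# One 100-character line, well over the 76-character limit of RFC 2045.
original = ("x" * 100 + "\n").encode("ascii")

# Encoding inserts a soft line break ("=" at end of line) to shorten it.
encoded = quopri.encodestring(original)

# Decoding removes the soft line break again.
decoded = quopri.decodestring(encoded)
```

After decoding, the recipient sees the original single long line, with no
trace of where the soft line break was placed.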

From v+python at g.nevcal.com  Fri Jul 29 02:57:12 2011
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Thu, 28 Jul 2011 17:57:12 -0700
Subject: [Email-SIG] header folding
In-Reply-To: <20110726123827.3EC7FB14005@webabinitio.net>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
Message-ID: <4E320568.3080705@g.nevcal.com>

On 7/26/2011 5:38 AM, R. David Murray wrote:
> What's the word for what is done when a text message is made to have
> a line length of less than 78 by using quoted printable (or base64)
> encoding?  Is that also folding?  If there's no existing term in common
> use, folding would make sense to me.  So I have no objection to using
> 'fold' consistently in the api and code for these operations.

To me, "fold" means to divide _a_ long line into multiple short lines 
(less than line length).  (Barry calls this split, it seems.)

To me, "wrap" means to divide and join as necessary a set of lines 
(sometimes/often a paragraph) to achieve some number of similar length 
lines, not to exceed a line length limit, with possibly a shorter one at 
the end.

To me, "fill" means to divide and join as necessary a set of lines 
(sometimes/often a paragraph) to use as few lines as possible without 
exceeding a line length limit, usually resulting in a shorter one at the 
end. (Barry seems to have this same definition.)

For all the above, all divisions and joinings happen at white space 
sequences, and white space sequences are considered irrelevant in 
composition, and are generally reduced to a single space or newline as a 
side effect.

I think that if these terms are defined in the RFCs, those 
definitions should be preferred to mine.

Some set of definitions needs to be agreed upon, before sensible 
communication can be made about what various algorithms should actually 
do, and what policy settings might be named, and what algorithms they 
would invoke.

From stephen at xemacs.org  Fri Jul 29 06:40:56 2011
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 29 Jul 2011 13:40:56 +0900
Subject: [Email-SIG] header folding
In-Reply-To: <4E320568.3080705@g.nevcal.com>
References: <20110725194238.3ABCB2505A8@webabinitio.net>
	<87mxg1r8n4.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20110726123827.3EC7FB14005@webabinitio.net>
	<4E320568.3080705@g.nevcal.com>
Message-ID: <87mxfxpulj.fsf@uwakimon.sk.tsukuba.ac.jp>

Glenn Linderman writes:

 > To me, "wrap" means to divide and join as necessary a set of lines 
 > (sometimes/often a paragraph) to achieve some number of similar length 
 > lines, not to exceed a line length limit, with possibly a shorter one at 
 > the end.

Typically such usage is in contexts where a paragraph is represented
as a single physical line, though.  Your "set" is not part of "wrap"
in my dialect.

 > I think that if these terms are defined in the RFCs, those 
 > definitions should be preferred to mine.

"Fold" is defined per RFC 5322.  The others don't seem to be.

I think "fold" should be used for the well-defined operation of header
folding (RFC 5322) and also for the well-defined operation of
"inserting a soft linebreak" in quoted-printable bodies (RFC 2045).
I'm happy with whatever usage others prefer for the other operations.

From Axel.Rau at Chaos1.DE  Fri Jul 29 13:18:20 2011
From: Axel.Rau at Chaos1.DE (Axel Rau)
Date: Fri, 29 Jul 2011 13:18:20 +0200
Subject: [Email-SIG] email-6.0.0.a1
In-Reply-To: <20110719212139.D5D732500D5@webabinitio.net>
References: <20110719212139.D5D732500D5@webabinitio.net>
Message-ID: <C7C71ED8-9B33-43BC-BEE9-3BACCDAD1C83@Chaos1.DE>


Am 19.07.2011 um 23:21 schrieb R. David Murray:

> I'm especially interested in anyone with a working program that uses
> email in 3.2: it should be completely backward compatible, and if it
> isn't I want to know ASAP.[*]
I just started testing a SpamCop reporter (800 lines of code).
Runs perfectly so far.

Axel
---
PGP-Key:29E99DD6  ? +49 151 2300 9283  ? computing @ chaos claudius


From Axel.Rau at Chaos1.DE  Sat Jul 30 11:14:03 2011
From: Axel.Rau at Chaos1.DE (Axel Rau)
Date: Sat, 30 Jul 2011 11:14:03 +0200
Subject: [Email-SIG] email-6.0.0.a1
In-Reply-To: <C7C71ED8-9B33-43BC-BEE9-3BACCDAD1C83@Chaos1.DE>
References: <20110719212139.D5D732500D5@webabinitio.net>
	<C7C71ED8-9B33-43BC-BEE9-3BACCDAD1C83@Chaos1.DE>
Message-ID: <FC66690A-925A-4FA9-8395-C7FCEA648CF5@Chaos1.DE>


Am 29.07.2011 um 13:18 schrieb Axel Rau:

> 
> Am 19.07.2011 um 23:21 schrieb R. David Murray:
> 
>> I'm especially interested in anyone with a working program that uses
>> email in 3.2: it should be completely backward compatible, and if it
>> isn't I want to know ASAP.[*]
> I just started testing a SpamCop reporter (800 lines of code).
> Runs perfectly so far.
1st problem:
----
Traceback (most recent call last):
 File "/usr/local/etc/exim/erdb_bt.py", line 834, in <module>
   reporter.addReport(spam)
 File "/usr/local/etc/exim/erdb_bt.py", line 227, in addReport
   self.flushReports()
 File "/usr/local/etc/exim/erdb_bt.py", line 258, in flushReports
   smtp.send_message(self.msg)
 File "/usr/local/lib/python3.2/smtplib.py", line 790, in send_message
   g.flatten(msg, linesep='\r\n')
 File "/usr/local/lib/python3.2/email/generator.py", line 99, in flatten
   self._write(msg)
 File "/usr/local/lib/python3.2/email/generator.py", line 145, in _write
   self._dispatch(msg)
 File "/usr/local/lib/python3.2/email/generator.py", line 171, in _dispatch
   meth(msg)
 File "/usr/local/lib/python3.2/email/generator.py", line 232, in _handle_multipart
   g.flatten(part, unixfrom=False, linesep=self._NL)
 File "/usr/local/lib/python3.2/email/generator.py", line 99, in flatten
   self._write(msg)
 File "/usr/local/lib/python3.2/email/generator.py", line 152, in _write
   self._write_headers(msg)
 File "/usr/local/lib/python3.2/email/generator.py", line 373, in _write_headers
   for h, v in msg._headers:
ValueError: too many values to unpack (expected 2)
----
At least it seems not to be compatible with my bugs. What am I doing wrong?
	https://www.chaos1.de/svn-public/repos/network-tools/ERDB/trunk/database/erdb_bt.py
Axel
---
PGP-Key:29E99DD6  ? +49 151 2300 9283  ? computing @ chaos claudius
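
The traceback above shows tuple unpacking failing on an entry of
msg._headers in one of the subparts. Without claiming anything about the
actual cause in erdb_bt.py, one way to locate such an entry is to walk the
message and report anything in that private list that is not a (name, value)
pair. The helper name find_bad_headers and the reproduction below are
hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def find_bad_headers(msg):
    """Return (part_index, entry) for every header entry that is not a pair."""
    bad = []
    for index, part in enumerate(msg.walk()):
        # _headers is the private list the generator iterates in the traceback.
        for entry in part._headers:
            if not (isinstance(entry, tuple) and len(entry) == 2):
                bad.append((index, entry))
    return bad


# Hypothetical reproduction: a subpart with a 3-tuple in its header list.
outer = MIMEMultipart()
inner = MIMEText("report body")
outer.attach(inner)
inner._headers.append(("X-Example", "value", "extra"))
problems = find_bad_headers(outer)
```

Running such a check just before smtp.send_message() would point at the part
and the entry that the generator later trips over.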