[Spambayes] Email package and the CRLF pair
Paul Moore
lists at morpheus.demon.co.uk
Sat Apr 19 17:59:53 EDT 2003
"Meyer, Tony" <T.A.Meyer at massey.ac.nz> writes:
>> Not an expert, but I'll do my best...
>
> Well, your comments matched those of *the* expert, so that's not bad!
Blind luck :-)
>> Interesting. Does it say anything about line terminators in
>> the body? It probably should, as email is a pure-text medium,
>> so you should be considering line termination for the whole
>> message, not just the headers.
>
> I hadn't bothered to look that far, but yes it does. It (RFC2822) says:
> "CR and LF MUST only occur together as CRLF; they MUST NOT appear
> independently in the body."
OK, that's definitive. But reality differs. Look at any mbox file on a
Unix system and you'll see LF terminators.
Actually, if you look at RFC822, section 1.1 (Scope), you'll see:
    This standard specifies a syntax for text messages that are
    sent among computer users, within the framework of "electronic
    mail".
and later:
    Note: This standard is NOT intended to dictate the internal
    formats used by sites,
So it's pretty clear that RFC822 defines a network format, and not
a local file (or any other) format. And mandating a specific line
termination convention is crucial for wire transfer formats.
You *could* argue that RFC822 has nothing to say outside the context
of network transfers, and so saying that the email package conforms to
RFC822 is meaningless. But that's hair-splitting (which is one of my
hobbies, but I try not to inflict it on others :-)).
The practical fact is that "colloquial" use of the term "RFC822"
refers to the header and body structure, but not such things as the
line termination, or the mandatory headers, etc. And the email package
works with that "colloquial" version.
As a general set of rules (which aren't stated anywhere) it's probably
fair to say that:
1. Modules which manipulate internet-format data (like email)
   should work with line terminators of \n internally (just like
   Python strings do).
2. Modules which transmit files across TCP/IP should canonicalise
   any form of line ending to CRLF.
3. Modules which present data *received* from TCP/IP (like POP3)
   should convert data to \n line endings before returning it to
   the program.
4. Reading from the filesystem should be handled like (3), and
   should support files opened in text or binary modes (or
   universal newline mode in Python 2.3).
5. Writing to the filesystem should be done by assuming the data
   uses \n internally (the above rules make this true) and writing
   either in binary format (which leaves LFs in the files, i.e. Unix
   format) or in text format (which converts the \n characters to
   the platform native newline sequence).
This is basically "be lenient in what you accept, and strict in what
you send", plus "use \n internally as a line terminator".
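As a rough illustration of rules 2 and 3, here is a minimal sketch in
present-day Python. The names to_internal and to_wire are made up for
this example and are not part of the email package or any other
library; the sketch assumes the message is already held as a string.

```python
import re

# Matches CRLF, a lone CR, or a lone LF. CRLF is listed first so that
# a CRLF pair is treated as one terminator, not two.
_ANY_NEWLINE = re.compile(r"\r\n|\r|\n")


def to_internal(text):
    """Rule 3: convert whatever line endings arrived to "\n"."""
    return _ANY_NEWLINE.sub("\n", text)


def to_wire(text):
    """Rule 2: canonicalise any form of line ending to CRLF."""
    return _ANY_NEWLINE.sub("\r\n", text)


# Hypothetical data, as it might come back from a POP3 server.
received = "Subject: test\r\n\r\nbody line\r\n"
internal = to_internal(received)   # "\n" terminators only (rule 1)
sent = to_wire(internal)           # back to CRLF for transmission
```

Because to_wire accepts any mix of terminators, it is safe to call on
data that is already in CRLF form, which is the "lenient in what you
accept" half of the principle.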
The only places I know of where these rules don't currently hold are
the imaplib bug you just raised, and a bug in the mailbox module I
raised a long time ago (http://www.python.org/sf/586899) which
basically notes that passing a file opened in text mode to the mailbox
constructor doesn't work.
It might be nice to document these rules (or the right ones, rather
than just my unsubstantiated opinions :-)) somewhere. But I don't know
where, so I'm not volunteering :-)
Paul.
--
This signature intentionally left blank