Part of RFC 822 ignored by email module

Carl Banks pavlovevidence at gmail.com
Thu Jan 20 12:23:11 EST 2011


On Jan 20, 7:08 am, Bob Kline <bkl... at rksystems.com> wrote:
> I just noticed that the following passage in RFC 822:
>
>          The process of moving  from  this  folded   multiple-line
>          representation  of a header field to its single line represen-
>          tation is called "unfolding".  Unfolding  is  accomplished  by
>          regarding   CRLF   immediately  followed  by  a  LWSP-char  as
>          equivalent to the LWSP-char.
>
> is not being honored by the email module.  The following two invocations
> of message_from_string() should return the same value, but that's not
> what happens:
>
>  >>> import email
>  >>> email.message_from_string("Subject: blah").get('SUBJECT')
> 'blah'
>  >>> email.message_from_string("Subject:\n blah").get('SUBJECT')
> ' blah'
>
> Note the space in front of the second value returned, but missing from
> the first.  Can someone convince me that this is not a bug?

That's correct, according to my reading of RFC 822 (I doubt it's
changed so I didn't bother to look up what the latest RFC on that
subject is.)

The RFC says that in a folded line the whitespace on the following
line is considered a part of the line.  Relevant quite (section
3.1.1):


        Each header field can be viewed as a single, logical  line  of
        ASCII  characters,  comprising  a field-name and a field-body.
        For convenience, the field-body  portion  of  this  conceptual
        entity  can be split into a multiple-line representation; this
        is called "folding".  The general rule is that wherever  there
        may  be  linear-white-space  (NOT  simply  LWSP-chars), a CRLF
        immediately followed by AT LEAST one LWSP-char may instead  be
        inserted.  Thus, the single line

            To:  "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

        can be represented as:

            To:  "Joe & J. Harvey" <ddd @ Org>,
                    JJV at BBN

        and

            To:  "Joe & J. Harvey"
                            <ddd@ Org>, JJV
             @BBN

        and

            To:  "Joe &
             J. Harvey" <ddd @ Org>, JJV @ BBN

             The process of moving  from  this  folded   multiple-line
        representation  of a header field to its single line represen-
        tation is called "unfolding".  Unfolding  is  accomplished  by
        regarding   CRLF   immediately  followed  by  a  LWSP-char  as
        equivalent to the LWSP-char.


Carl Banks



More information about the Python-list mailing list