rfc822 module problem

Tim Roberts timr at probo.com
Thu May 15 23:57:02 EDT 2003


Francois Pinard <pinard at iro.umontreal.ca> wrote:

>Hi, people.  Here is a little session transcript:
>
>---------------------------------------------------------------------->
>Python 2.2.2 (#1, Jan 21 2003, 20:10:11) 
>[GCC 3.2] on linux2
>Type "help", "copyright", "credits" or "license" for more information.
>>>> from __future__ import nested_scopes, generators, division
>>>> import rfc822
>>>> pd = rfc822.parsedate_tz
>>>> pd('12, May 2003 14:16:47 +')
>None
>>>> pd('Mon, 12, May 2003 14:16:47 +')
>(2003, 5, 12, 14, 16, 47, 0, 0, 0, None)
>>>> pd('12 May 2003 14:16:47 +')
>(2003, 5, 12, 14, 16, 47, 0, 0, 0, None)
>>>>
>----------------------------------------------------------------------<
>
>I would not think a comma is allowed after the day in an RFC822 date, so
>my guess is that the first and third `pd()' above return a correct answer,
>while the second `pd()' should ideally return None, instead of being too
>permissive.  Or else, if it has to be permissive, then the first `pd()'
>should be permissive as well.  (I got its string from a spam message.)

Subtle issue.  In the first case, parsedate_tz sees that the first word
ends in a comma, assumes it must be a day of the week, and discards it.  The
remaining string has no day of the month, so parsing fails.

However, after it has thrown away a day of the week, it specifically allows
and removes a comma after the day of the month.  That is why the second
case succeeds.
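
To make those two steps concrete, here is a minimal sketch of the logic as
described above.  It is not the actual rfc822 source: the names `sketch_parse`
and `MONTHS` are invented for illustration, and the sketch ignores the time
and zone fields entirely.

```python
# Hypothetical illustration only; not the real rfc822.parsedate_tz code.
MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

def sketch_parse(date_string):
    """Return (year, month, day), or None if the date cannot be read."""
    words = date_string.split()
    # Step 1: a first word ending in a comma is assumed to be a day name
    # ("Mon,") and is discarded without further checks.
    if words and words[0].endswith(','):
        del words[0]
    if len(words) < 3:
        return None
    day, month, year = words[0], words[1], words[2]
    # Step 2: tolerate (and strip) a comma after the day of the month.
    if day.endswith(','):
        day = day[:-1]
    key = month.lower()[:3]
    if key not in MONTHS:
        return None
    try:
        return (int(year), MONTHS.index(key) + 1, int(day))
    except ValueError:
        return None
```

With the three strings from the transcript, the first loses its "12," in
step 1 and then fails because "2003" is not a month name, while the second
and third both yield (2003, 5, 12).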

I'm not sure why you would call this a problem.  Today, most mailers use
exactly the correct format, so a recognizer can be "picky".  However, that
wasn't always the case.  A decade ago, many mailers inserted blanks and
commas in unusual and unexpected places.  As long as rfc822 does the right
thing when given an RFC822-compliant date, I would call it an added benefit
that it is able to handle certain obvious malformations.

>About accepting a trailing ` +', this does not look right to my naive
>eyes, but maybe RFC822 does describe this as a degenerate zone reference,
>I do not know.  I would have expected that trailing ` +' to be rejected,
>and that all `pd()' return None.

You answered your own question, right?  RFC822 specifically allows it, so
the rfc822 module is correct.
-- 
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.



