rfc822 module problem
Tim Roberts
timr at probo.com
Thu May 15 23:57:02 EDT 2003
Francois Pinard <pinard at iro.umontreal.ca> wrote:
>Hi, people. Here is a little session transcript:
>
>---------------------------------------------------------------------->
>Python 2.2.2 (#1, Jan 21 2003, 20:10:11)
>[GCC 3.2] on linux2
>Type "help", "copyright", "credits" or "license" for more information.
>>>> from __future__ import nested_scopes, generators, division
>>>> import rfc822
>>>> pd = rfc822.parsedate_tz
>>>> pd('12, May 2003 14:16:47 +')
>None
>>>> pd('Mon, 12, May 2003 14:16:47 +')
>(2003, 5, 12, 14, 16, 47, 0, 0, 0, None)
>>>> pd('12 May 2003 14:16:47 +')
>(2003, 5, 12, 14, 16, 47, 0, 0, 0, None)
>>>>
>----------------------------------------------------------------------<
>
>I would not think a comma is allowed after the day in an RFC822 date, so
>my guess is that the first and third `pd()' above return a correct answer,
>while the second `pd()' should ideally return None, instead of being too
>permissive. Or else, if it has to be permissive, then the first `pd()'
>should be permissive as well. (I got its string from a spam message.)
Subtle issue. In the first case, parsedate_tz decides that, if the first
word ends in a comma, it must be a day of the week, and discards it. The
remaining string doesn't have a day of the month, so it fails.
However, after it has thrown away a day of the week, it specifically allows
and removes a comma after the day of the month.
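To make the two comma rules concrete, here is a minimal sketch of just that
part of the logic. It is not the module's actual source: the helper name
leading_day_and_month is made up for illustration, and it only looks at the
first two words, ignoring the year, time and zone handling entirely.

```python
def leading_day_and_month(date_string):
    """Hypothetical helper: mimic only the two comma rules described above."""
    words = date_string.split()
    if not words:
        return None
    # Rule 1: a first word ending in a comma is assumed to be a day of the
    # week and is thrown away, whatever it actually contains.
    if words[0].endswith(','):
        del words[0]
    if len(words) < 2:
        return None
    day, month = words[0], words[1]
    # Rule 2: a comma after the day of the month is tolerated and removed.
    if day.endswith(','):
        day = day[:-1]
    # Simplification for this sketch: anything non-numeric is rejected here.
    if not day.isdigit():
        return None
    return int(day), month


for text in ('12, May 2003 14:16:47 +',
             'Mon, 12, May 2003 14:16:47 +',
             '12 May 2003 14:16:47 +'):
    print(repr(text), '->', leading_day_and_month(text))
```

With the first string, "12," is discarded as if it were a day name, leaving
"May" where the day of the month should be, so the sketch gives up. The other
two strings both yield day 12 and month May, which matches the pattern in the
transcript.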
I'm not sure why you would call this a problem. Today, most mailers use
the exact correct format, so a recognizer can be "picky". However, that
wasn't always the case. A decade ago, many mailers inserted blanks and
commas in unusual and unexpected places. As long as rfc822 does the right
thing when given an RFC822-compliant date, I would call it an added benefit
that it is able to handle certain obvious malformations.
>About accepting a trailing ` +', this does not look right to my naive
>eyes, but maybe RFC822 does describe this as a degenerate zone reference,
>I do not know. I would have expected that trailing ` +' to be rejected,
>and that all `pd()' return None.
You answered your own question, right? RFC822 specifically allows it, so
the rfc822 module is correct.
--
- Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.