Regular Expressions and RFC 822

Mats Kindahl matkin at iar.se
Tue May 21 06:07:57 EDT 2002


Tim Roberts <timr at probo.com> writes:

> "alex gigh" <cogs2002 at hotmail.com> wrote:
> >
> >I am trying to write a mail server in Python and I found out that I can use 
> >regular expressions and then grouping:
> >
> >"For example, an RFC-822 header line is divided into a header name and a 
> >value, separated by a ":". This can be handled by writing a regular 
> >expression which matches an entire header line, and has one group which 
> >matches the header name, and another group which matches the
> >header's value.  
> >"
> 
> That's not exactly true.  RFC-822 header lines are often continued onto
> multiple lines.  A line following a header that starts with
> whitespace is automatically a continuation of the previous line:
> 
> Subject: This is a rather unusual
>     but perfectly legal
>     subject line that could not
>     be easily parsed with a simple regular
>     expression.
> To:  "Joe Cool"
>      <jcool at snoopy.com>,
>      "Charlie Brown"
>      <kicker at snoopy.com>
> 

I agree with the sentiment of using an existing module, but for the
case at hand, depending on what one considers a "simple" regular
expression, the following code works just fine (and has, IMHO, a
simple regular expression):

    header = R'''
    Subject: This is a rather unusual
        but perfectly legal
        subject line that could not
        be easily parsed with a simple regular
        expression.
    To: "Joe Cool"
        <jcool at snoopy.com>,
        "Charlie Brown"
        <kicker at snoopy.com>
    '''

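    import re
    # A field starts at a line beginning with a non-blank name and a colon,
    # and runs (lazily) up to the next such line or the end of the string.
    pat = re.compile(R"^\S+?\:.+?(?=^\S+?\:|\Z)",
                     re.MULTILINE | re.DOTALL)

    matches = pat.findall(header)
    # Print each complete header field, continuation lines included.
    for i, x in enumerate(matches, 1):
        print("Match", i)
        print("--------")
        print(x)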
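
To get the header name and value as separate groups, which is what the
original question asked about, the same idea can be extended with named
groups. This is only a sketch: the group names `name` and `value` and
the helper `parse_headers` are my own inventions for illustration, and
collapsing every run of whitespace into one space is a simplification
of the unfolding that RFC 822 actually describes.

```python
import re

# Same approach as the pattern above, with two named groups added
# (hypothetical names, chosen for this example): the field name up to
# the first colon, then everything up to the next line that starts a
# new field, or the end of the string.
FIELD = re.compile(r"^(?P<name>[^\s:]+):(?P<value>.*?)(?=^[^\s:]+:|\Z)",
                   re.MULTILINE | re.DOTALL)

def parse_headers(text):
    """Return a list of (name, value) pairs with folded lines joined."""
    fields = []
    for m in FIELD.finditer(text):
        # Collapse every run of whitespace, line breaks included,
        # into a single space (a simplification, see above).
        value = " ".join(m.group("value").split())
        fields.append((m.group("name"), value))
    return fields

sample = (
    "Subject: This is a rather unusual\n"
    "    but perfectly legal\n"
    "    subject line\n"
    "To: \"Joe Cool\"\n"
    "    <jcool at snoopy.com>,\n"
    "    \"Charlie Brown\"\n"
    "    <kicker at snoopy.com>\n"
)

for name, value in parse_headers(sample):
    print(name, "=>", value)
```

Each continuation line ends up in the value of the field it belongs
to, so the "To" field comes back as a single line with both addresses.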

Never underestimate the power of a well-articulated regular
expression. :)
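
For comparison, here is roughly what the existing-module route looks
like. The thread as quoted here does not say which module was meant, so
I am assuming the standard library's `email` package; the sample
message below is shortened from the one above, and the body line is
made up for the example.

```python
from email import message_from_string

# A shortened version of the sample header, followed by the blank line
# that separates headers from the body, and a made-up body line.
raw = (
    "Subject: This is a rather unusual\n"
    "    but perfectly legal\n"
    "    subject line\n"
    "To: \"Joe Cool\"\n"
    "    <jcool at snoopy.com>\n"
    "\n"
    "Body text.\n"
)

msg = message_from_string(raw)

# The parser attaches continuation lines to the field they belong to.
# With the default policy the line breaks are kept inside the value,
# so they are collapsed here before printing.
for name, value in msg.items():
    print(name, "=>", " ".join(value.split()))
```

The parser does the grouping of continuation lines for you; the only
regex-like step left is deciding how to join the folded lines.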

Best wishes,

-- 
Mats Kindahl, IAR Systems, Sweden

Any opinions expressed are my own, not my company's.


