RFC 2047 parser

François Pinard pinard at iro.umontreal.ca
Thu Jul 6 19:53:38 EDT 2000


[Jason Abate]

> I was wondering if anyone has put together code for parsing mail headers
> encoded according to RFC 2047 (Message Header Extensions for
> Non-ASCII Text).

I needed this soon after learning Python (so this is among the first Python
code I wrote, and I would probably write something simpler today :-) and
quickly wrote what appears below.  However, I found out afterwards that the
Python library already had something for this.  See mimify.mime_decode_header.
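
On current Python, where mimify is no longer part of the standard library,
email.header.decode_header covers the same ground and also handles other
charsets and base64 ("B") words.  A minimal sketch, assuming Python 3; the
helper name decode_rfc2047 is made up for illustration:

```python
from email.header import decode_header

def decode_rfc2047(header):
    # decode_header returns (data, charset) pairs; data is bytes for
    # decoded words and may be a plain str when nothing was encoded.
    parts = []
    for data, charset in decode_header(header):
        if isinstance(data, bytes):
            # Unencoded fragments come back with charset None; treat
            # them as ASCII (an assumption of this sketch).
            data = data.decode(charset or 'ascii', errors='replace')
        parts.append(data)
    return ''.join(parts)

print(decode_rfc2047('=?ISO-8859-1?Q?Fran=E7ois?= Pinard'))
```

Unlike the code below, this does not restrict itself to ISO-8859-1, so the
result is an ordinary Unicode string whatever charset the header declared.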


# Handling of RFC 2047 (previously RFC 1522) headers.

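import re

def to_latin1(text):
    # Decode every ISO-8859-1 "Q"-encoded word in text; other charsets
    # and "B" (base64) words are left untouched.
    return _sub_f(r'=\?ISO-8859-1\?Q\?([^?]*)\?=', re.I, _replace1, text)

def _replace1(match):
    # Inside an encoded word, '_' stands for a space and '=XX' for a byte.
    return _sub_f('=([0-9A-F][0-9A-F])', re.I, _replace2,
                  match.group(1).replace('_', ' '))

def _replace2(match):
    # Turn the two hex digits into the corresponding Latin-1 character.
    return chr(int(match.group(1), 16))

def _sub_f(pattern, flags, function, text):
    # Replace each match of pattern in text with function(match).  For
    # patterns that cannot match the empty string, as here, this gives
    # the same result as re.compile(pattern, flags).sub(function, text).
    matcher = re.compile(pattern, flags).search
    position = 0
    results = []
    while True:
        match = matcher(text, position)
        if not match:
            results.append(text[position:])
            return ''.join(results)
        results.append(text[position:match.start(0)])
        position = match.end(0)
        results.append(function(match))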
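
As for "something simpler": since re.sub accepts a function as the
replacement, the hand-written loop can be dropped.  One possible shorter
form, sketched for Python 3; the name to_latin1_simple is made up here, and
this is only a guess at what a simpler version could look like:

```python
import re

# Same two patterns as above, compiled once.
_ENCODED_WORD = re.compile(r'=\?ISO-8859-1\?Q\?([^?]*)\?=', re.I)
_HEX_ESCAPE = re.compile(r'=([0-9A-F]{2})', re.I)

def to_latin1_simple(text):
    def decode_word(match):
        # '_' stands for a space, '=XX' for the character with hex code XX.
        payload = match.group(1).replace('_', ' ')
        return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), payload)
    return _ENCODED_WORD.sub(decode_word, text)

print(to_latin1_simple('=?ISO-8859-1?Q?Fran=E7ois_Pinard?='))
```

Text without any encoded word passes through unchanged, as it does with
to_latin1.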

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard




