Is there any way to make sense of these E-Mail subjects?

Dieter Maurer dieter at handshake.de
Fri Dec 24 16:55:20 EST 2021


Chris Green wrote at 2021-12-24 16:03 +0000:
>I have a Python 3 script which processes E-Mail caught in my hosting
>provider's 'catchall' mailbox.  It looks for things that *might* be
>useful E-Mails, forwards them, and throws the rest away.
> ...
>I have a function which, given a header name, extracts the header and
>returns it as a string:-
>
>    #
>    #
>    # Get a message header as a string
>    #
>    def getHdr(msg, header):
>       return str("\n  " + header + ": " + str(msg.get(header, "empty")))
>
>msg is a mailbox.mboxMessage object.
>
>
>This is mostly working as expected, returning the header contents as
>strings so I can output them to my log files as necessary.  However
>some Subject: lines are being returned like the following:-
>
>      Subject: [SPAM] =?UTF-8?B?8J+TtyBKb2huIEJheHRlci1C?=
...

Email headers follow the MIME standard, which requires that
they use only ASCII characters.
Thus, if the header content should contain non-ASCII characters,
some form of encoding becomes necessary.

The `=?UTF-8?B?8J+TtyBKb2huIEJheHRlci1C?=` is an encoded word.

An encoded word has the form `=?<charset>?<code type>?<encoding>?=`.
<code type> is either `B` (base64 encoding) or
`Q` (a variant of quoted-printable encoding).
You decode an encoded word by decoding <encoding> with <code type>
and then interpret the resulting byte sequence in <charset>.
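
To make the decoding rule concrete, here is a minimal sketch that decodes a
single encoded word by hand. The helper name `decode_encoded_word` and the
regular expression are assumptions made for illustration, not part of the
`email` package, and the sketch ignores corner cases such as several
adjacent encoded words.

```python
import base64
import quopri
import re

# Pattern for a single encoded word: =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_encoded_word(word):
    """Decode one encoded word by hand (hypothetical helper, for illustration)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded word: {word!r}")
    charset, code_type, payload = match.groups()
    if code_type.upper() == "B":
        # B: the payload is base64
        raw = base64.b64decode(payload)
    else:
        # Q: header=True makes "_" decode to a space
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    # Finally, interpret the bytes in the declared charset
    return raw.decode(charset)


subject_part = decode_encoded_word("=?UTF-8?B?8J+TtyBKb2huIEJheHRlci1C?=")
```

For the subject quoted above, `subject_part` is a camera emoji (U+1F4F7)
followed by " John Baxter-B".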

The `email` package contains functions to do this decoding.
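
As one possibility, `email.header.decode_header` splits a raw header value
into (bytes, charset) pairs and `email.header.make_header` joins them back
into readable text. A minimal sketch; the wrapper name `decode_header_value`
is an assumption made for illustration:

```python
from email.header import decode_header, make_header


def decode_header_value(value):
    """Return a raw header value with any encoded words decoded."""
    # decode_header() yields (bytes-or-str, charset) pairs;
    # make_header() combines them into a Header that str() renders as text
    return str(make_header(decode_header(value)))


decoded = decode_header_value("[SPAM] =?UTF-8?B?8J+TtyBKb2huIEJheHRlci1C?=")
```

This can be applied directly to the string returned by `msg.get(header)`,
without re-parsing the message.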


I would likely serialize your `mboxMessage` as a string and
then use an `email` function to turn this string into an
`email` `Message` object. Those messages can return headers
in a decoded form.
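
A minimal sketch of that approach, assuming `policy.default` from the
`email` package is acceptable for your script. It serializes with
`as_bytes()` rather than to a string (a variation that avoids problems with
8-bit content), and `get_hdr` is a hypothetical stand-in for your `getHdr`:

```python
import mailbox
from email import message_from_bytes, policy


def get_hdr(msg, header):
    """Return a header as a log-friendly string with encoded words decoded."""
    # Serialize the mailbox message, then re-parse it under the modern policy,
    # whose header objects present encoded words in decoded form
    modern = message_from_bytes(msg.as_bytes(), policy=policy.default)
    return "\n  " + header + ": " + str(modern.get(header, "empty"))


# Stand-in message; in the real script this comes from the mbox file
demo = mailbox.mboxMessage(
    "Subject: [SPAM] =?UTF-8?B?8J+TtyBKb2huIEJheHRlci1C?=\n\nbody\n"
)
line = get_hdr(demo, "Subject")
```

Re-parsing on every call is wasteful if you read several headers; in that
case convert each message once and keep the converted object around.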

