Decode email subjects into unicode

Jeffrey Froman jeffrey at fro.man
Tue Mar 18 12:24:03 EDT 2008


Laszlo Nagy wrote:

> I know that "=?UTF-8?B" means UTF-8 + base64 encoding, but I wonder if
> there is a standard method in the "email" package to decode these
> subjects?

The standard library function email.Header.decode_header will parse these
headers into a list of (bytestring, encoding) pairs, where the encoding is
None for any part that was not encoded. For example:

>>> raw_headers = [
...     '=?koi8-r?B?4tnT1NLP19nQz8zOyc3PIMkgzcHMz9rB1NLB1M7P?=',
...     '[Fwd: re:Flags Of The World, Us States, And Military]',
...     '=?ISO-8859-2?Q?=E9rdekes?=',
...     '=?UTF-8?B?aGliw6Fr?=',
... ]
>>> from email.Header import decode_header
>>> for raw_header in raw_headers:
...     for header, encoding in decode_header(raw_header):
...         if encoding is None:
...             print header.decode()
...         else:
...             print header.decode(encoding)
...
Быстровыполнимо и малозатратно
[Fwd: re:Flags Of The World, Us States, And Military]
érdekes
hibák
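
On Python 3 the module is email.header, and decode_header returns a plain str
(rather than bytes) when a header contains no encoded words at all, so calling
.decode() unconditionally would fail there. A minimal sketch of the same loop
for Python 3 might look like this. The helper name decode_subject is only
illustrative, and falling back to ASCII for unencoded byte chunks is an
assumption.

```python
from email.header import decode_header


def decode_subject(raw):
    """Decode an RFC 2047 encoded header value into a str (Python 3 sketch)."""
    parts = []
    for chunk, charset in decode_header(raw):
        # Encoded words come back as bytes plus a charset name. A header
        # with no encoded words at all comes back as one str with charset
        # None, so only bytes chunks need decoding.
        if isinstance(chunk, bytes):
            # Assumption: unencoded byte chunks (charset None) are ASCII;
            # undecodable bytes are replaced rather than raising an error.
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


raw_headers = [
    '=?koi8-r?B?4tnT1NLP19nQz8zOyc3PIMkgzcHMz9rB1NLB1M7P?=',
    '[Fwd: re:Flags Of The World, Us States, And Military]',
    '=?ISO-8859-2?Q?=E9rdekes?=',
    '=?UTF-8?B?aGliw6Fr?=',
]
for raw_header in raw_headers:
    print(decode_subject(raw_header))
```

The shorter idiom str(make_header(decode_header(raw))), with both functions
imported from email.header, should also work for most subjects.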


Jeffrey



More information about the Python-list mailing list