What do these '=?utf-8?' sequences mean in python?

Peter Pearson pkpearson at nowhere.invalid
Sat May 6 11:10:05 EDT 2023


On Sat, 6 May 2023 14:50:40 +0100, Chris Green <cl at isbd.net> wrote:
[snip]
> So, what do those =?utf-8? and ?= sequences mean?  Are they part of
> the string or are they wrapped around the string on output as a way to
> show that it's utf-8 encoded?

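They're part of the string itself.  "=?utf-8?" marks the start of a
MIME "encoded-word" (RFC 2047 header encoding), and "?=" marks the end.
The general form is =?charset?encoding?encoded-text?=, where the
encoding is B (base64) or Q (similar to quoted-printable).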
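
To see what is inside one of those sequences, here is a minimal sketch
using the standard library's email.header.decode_header.  The sample
header value is made up for illustration; it is not from the original
message.

```python
import email.header

# Hypothetical sample: "Café" as a Q-encoded UTF-8 encoded-word.
raw = "=?utf-8?Q?Caf=C3=A9?="

# decode_header() returns a list of (payload, charset) pairs.
# For an encoded-word the payload is bytes and still needs decoding.
pairs = email.header.decode_header(raw)
payload, charset = pairs[0]

text = payload.decode(charset)
print(repr(payload), charset, text)
```

So the "=?utf-8?Q?" prefix only says how the bytes in between were
encoded; turning them into text is a separate decode step.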

I've only blundered about briefly in this area, but I think you
need to make sure that every header value you work with has been
decoded to an ordinary Python str before proceeding.
Here's the code that seemed to work for me:

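import email.header

def mime_decode_single(pair):
    """Decode a single (bytes-or-str, charset) pair.
    """
    b, charset = pair
    # decode_header() hands back a str when the header contains no
    # encoded-words, and bytes (charset may be None) otherwise.
    result = b if isinstance(b, str) else b.decode(
        charset if charset else "utf-8")
    return result

def mime_decode(s):
    """Decode a MIME-header-encoded character string.
    """
    # decode_header() splits s into (payload, charset) pairs.
    decoded_pairs = email.header.decode_header(s)
    return "".join(mime_decode_single(d) for d in decoded_pairs)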
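
One caveat: real mail sometimes names a charset Python doesn't know,
and bytes.decode() then raises LookupError.  Below is a rough sketch of
a more forgiving variant.  The name mime_decode_lenient and its
fallback parameter are my own invention for illustration, not part of
the email package, and I haven't tried it on much real mail.

```python
import email.header

def mime_decode_lenient(s, fallback="utf-8"):
    """Hypothetical variant of mime_decode that tolerates bad charsets."""
    parts = []
    for chunk, charset in email.header.decode_header(s):
        if isinstance(chunk, str):
            # No encoded-words in the header: already text.
            parts.append(chunk)
            continue
        try:
            # errors="replace" swaps undecodable bytes for U+FFFD.
            parts.append(chunk.decode(charset or fallback, errors="replace"))
        except LookupError:
            # Charset name unknown to Python: fall back (an assumption).
            parts.append(chunk.decode(fallback, errors="replace"))
    return "".join(parts)

print(mime_decode_lenient("=?utf-8?B?Q2Fmw6k=?="))
print(mime_decode_lenient("plain subject"))
```

Falling back to UTF-8 is a guess; if most of your mail comes from
somewhere that uses another charset, a different fallback may suit you
better.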



-- 
To email me, substitute nowhere->runbox, invalid->com.

