What do these '=?utf-8?' sequences mean in python?

Mon May 8 06:19:01 EDT 2023

Chris Green wrote:
> Keith Thompson <Keith.S.Thompson+u at gmail.com> wrote:
>> Chris Green <cl at isbd.net> writes:
>>> Chris Green <cl at isbd.net> wrote:
>>>> I'm having a real hard time trying to do anything to a string (?)
>>>> returned by mailbox.MaildirMessage.get().
>>>>
>>> What a twit I am :-)
>>>
>>> Strings are immutable, I have to do:-
>>>
>>>      newstring = oldstring.replace("_", " ")
>>>
>>> Job done!
>>
>> Not necessarily.
>>
>> The subject in the original article was:
>> =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
>>
>> That's some kind of MIME encoding.  Just replacing underscores by spaces
>> won't necessarily give you anything meaningful.  (What if there are
>> actual underscores in the original subject line?)
>>
>> You should probably apply some kind of MIME-specific decoding.  (I don't
>> have a specific suggestion for how to do that.)
>>
> Yes, OK, but my problem was that my filter looks for the string
> "Waterways Continental Europe" in the message Subject: to route the
> message to the appropriate mailbox.  When the Subject: has accents the
> string becomes "Waterways_Continental_Europe" and thus the match
> fails.  Simply changing all underscores back to spaces makes my test
> for "Waterways Continental Europe" work.  The changed Subject: line
> gets thrown away after the test so I don't care about anything else
> getting changed.
> 
> (When there are no accented characters in the Subject: the string is
> "Waterways Continental Europe" so I can't easily change the search
> text. I guess I could use an RE.)
> 

In practice you should also allow for the case where the header has a 'B'
instead of a 'Q' as the penultimate character of the prefix. The rest of
the encoded word is then Base64-encoded rather than quoted-printable, so
replacing underscores will not recover the text:

"=?utf-8?Q?"  --> "=?utf-8?B?"
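
One way to handle both cases without caring which encoding was used is to let the standard library's email.header module decode the Subject: before testing it. The following is only a minimal sketch: decode_subject() and matches_filter() are hypothetical helper names, not part of the original filter, and the Base64 subject is generated here purely for illustration.

```python
import base64
from email.header import decode_header, make_header


def decode_subject(raw):
    """Decode any RFC 2047 encoded words (Q or B) in a header value."""
    # decode_header() splits the value into (data, charset) pairs;
    # make_header() reassembles them, and str() gives the Unicode text.
    return str(make_header(decode_header(raw)))


def matches_filter(raw, needle="Waterways Continental Europe"):
    """Hypothetical filter test: look for the needle in the decoded subject."""
    return needle in decode_subject(raw)


# The subject from the original article (quoted-printable, 'Q').
q_subject = ("=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_"
             "(Waterways_Continental_Europe)?=")

# The same text as a Base64 ('B') encoded word, built here for illustration.
text = "aka Marne à la Saône (Waterways Continental Europe)"
b_subject = ("=?utf-8?B?"
             + base64.b64encode(text.encode("utf-8")).decode("ascii")
             + "?=")

# A plain ASCII subject passes through unchanged.
plain_subject = "Waterways Continental Europe"

for subject in (q_subject, b_subject, plain_subject):
    print(decode_subject(subject), matches_filter(subject))
```

With this approach the search text can stay as "Waterways Continental Europe" whether or not the Subject: contains accented characters, and real underscores in a subject are left alone.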