What do these '=?utf-8?' sequences mean in python?

Chris Green cl at isbd.net
Mon May 8 04:45:43 EDT 2023


Keith Thompson <Keith.S.Thompson+u at gmail.com> wrote:
> Chris Green <cl at isbd.net> writes:
> > Chris Green <cl at isbd.net> wrote:
> >> I'm having a real hard time trying to do anything to a string (?)
> >> returned by mailbox.MaildirMessage.get().
> >> 
> > What a twit I am :-)
> >
> > Strings are immutable, I have to do:-
> >
> >     newstring = oldstring.replace("_", " ")
> >
> > Job done!
> 
> Not necessarily.
> 
> The subject in the original article was:
> =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
> 
> That's some kind of MIME encoding.  Just replacing underscores by spaces
> won't necessarily give you anything meaningful.  (What if there are
> actual underscores in the original subject line?)
> 
> You should probably apply some kind of MIME-specific decoding.  (I don't
> have a specific suggestion for how to do that.)
> 
Yes, OK, but my problem was that my filter looks for the string
"Waterways Continental Europe" in the message Subject: to route the
message to the appropriate mailbox.  When the Subject: contains accented
characters the header arrives MIME-encoded, so the string becomes
"Waterways_Continental_Europe" and the match fails.  Simply changing
all underscores back to spaces makes my test
for "Waterways Continental Europe" work.  The changed Subject: line
gets thrown away after the test so I don't care about anything else
getting changed.
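
For a more robust test, the standard library's email.header module can
decode this kind of header before matching.  The following is only a
minimal sketch, not something taken from the thread: decode_subject is a
hypothetical helper name, and it assumes the Subject: arrives as a str
holding RFC 2047 encoded-words like the one quoted above.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Return the Subject: text with any RFC 2047 encoded-words decoded.

    decode_subject is a hypothetical helper name used for illustration.
    """
    # decode_header() splits the header into (data, charset) parts;
    # make_header() reassembles them into a Header, and str() yields
    # the decoded unicode text.
    return str(make_header(decode_header(raw_subject)))


raw = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="
subject = decode_subject(raw)

# The same plain-text test now works for encoded and unencoded subjects.
matches = "Waterways Continental Europe" in subject
```

Decoding first leaves genuine underscores in an unencoded Subject:
alone, which a blanket replace("_", " ") would not.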

(When there are no accented characters in the Subject: the string is
"Waterways Continental Europe" so I can't easily change the search
text. I guess I could use an RE.)
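
The RE option could look something like the sketch below.  The pattern
and the names ROUTE_PATTERN and is_waterways are illustrative
assumptions.  It accepts either a space or an underscore between the
words, so it can be applied to the Subject: whether or not it was
encoded.

```python
import re

# Hypothetical pattern: a space or an underscore between each word.
ROUTE_PATTERN = re.compile(r"Waterways[ _]Continental[ _]Europe")


def is_waterways(subject):
    """Return True if the Subject: names the Waterways list in either form."""
    return ROUTE_PATTERN.search(subject) is not None


encoded = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="
plain = "Lock closures (Waterways Continental Europe)"

encoded_hit = is_waterways(encoded)
plain_hit = is_waterways(plain)
```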

-- 
Chris Green