Problem with accented characters in mailbox.Maildir()

Chris Green cl at isbd.net
Sat May 6 06:13:03 EDT 2023


I have a custom mail filter in Python that uses the mailbox package to
open a mail message and give me access to the headers.

So I have the following code to open each mail message:-

    import mailbox
    import sys

    #
    # Read the message from standard input and make a message object from it
    #
    msg = mailbox.MaildirMessage(sys.stdin.buffer.read())

and then later I have (among many other bits and pieces):-

    #
    # test for string in Subject:
    #
    if searchTxt in str(msg.get("subject", "unknown")):
        ...  # do various things


This works exactly as intended most of the time, but occasionally a
message whose subject should match the test is missed.  I have just
realised that this happens when the Subject: has accented characters
in it (the messages are from a mailing list about canals in France).

So, for example, the latest case of this happening has:-

    Subject: aka Marne à la Saône (Waterways Continental Europe)

where the searchTxt in the code above is "Waterways Continental Europe".


Is there any way I can work round this issue?  E.g. is there a way to
strip out all extended characters from a string?  Or maybe it's
msg.get() that isn't managing to handle the accented string correctly?
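
On the stripping idea: a minimal sketch, assuming the subject is already
an ordinary decoded str, would be to decompose each accented character
and drop whatever is not ASCII.  strip_accents is a made-up helper name
for illustration, not something the mailbox package provides.

```python
import unicodedata


def strip_accents(text):
    """Return text with accents and other non-ASCII characters dropped.

    Hypothetical helper, for illustration only.
    """
    # NFKD splits e.g. "ô" into "o" plus a combining circumflex ...
    decomposed = unicodedata.normalize("NFKD", text)
    # ... and encoding to ASCII with errors="ignore" then discards the
    # combining marks (and any other non-ASCII character).
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("aka Marne à la Saône (Waterways Continental Europe)"))
```

This would probably not help by itself if the header is still in its
encoded mail form at the point where the test runs.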

Yes, I know that accented characters probably aren't allowed in
Subject: but I'm not going to get that changed! :-)
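
Because raw accented characters aren't allowed, a mail program will
normally send such a Subject: as an RFC 2047 encoded word, in which case
the string that msg.get() gives back would contain
"Waterways_Continental_Europe" with underscores rather than spaces.  I
haven't confirmed that this is what the list's messages look like, so
the sample header below is an assumption, and decoded_subject is just a
made-up helper name.  A sketch of decoding the header before testing it:

```python
import mailbox
from email.header import decode_header, make_header

# Assumed sample: how the Subject: might look on the wire (not taken
# from a real message).
raw = (
    b"Subject: =?UTF-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_"
    b"(Waterways_Continental_Europe)?=\n"
    b"\n"
    b"body\n"
)


def decoded_subject(msg):
    """Return the Subject: header as readable text (hypothetical helper)."""
    value = str(msg.get("subject", "unknown"))
    # decode_header() splits the value into (data, charset) parts and
    # make_header() joins them back into a Header that str() can render.
    return str(make_header(decode_header(value)))


msg = mailbox.MaildirMessage(raw)
searchTxt = "Waterways Continental Europe"

# Test against the header as returned, then against the decoded text.
print(searchTxt in str(msg.get("subject", "unknown")))
print(searchTxt in decoded_subject(msg))
```

With the assumed header above, the first test should miss and the second
should match.  A subject with no encoded words passes through
decoded_subject unchanged, so the same test would still work for plain
ASCII subjects.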


-- 
Chris Green