Problem with accented characters in mailbox.Maildir()

jak nospam at please.ty
Sat May 6 07:38:49 EDT 2023


Chris Green wrote:
> I have a custom mail filter in python that uses the mailbox package to
> open a mail message and give me access to the headers.
> 
> So I have the following code to open each mail message:-
> 
>      #
>      #
>      # Read the message from standard input and make a message object from it
>      #
>      msg = mailbox.MaildirMessage(sys.stdin.buffer.read())
> 
> and then later I have (among many other bits and pieces):-
> 
>      #
>      #
>      # test for string in Subject:
>      #
>      if searchTxt in str(msg.get("subject", "unknown")):
>          do
>          various
>          things
> 
> 
> This works exactly as intended most of the time but occasionally a
> message whose subject should match the test is missed.  I have just
> realised when this happens, it's when the Subject: has accented
> characters in it (this is from a mailing list about canals in France).
> 
> So, for example, the latest case of this happening has:-
> 
>      Subject: aka Marne à la Saône (Waterways Continental Europe)
> 
> where the searchTxt in the code above is "Waterways Continental Europe".
> 
> 
> Is there any way I can work round this issue?  E.g. is there a way to
> strip out all extended characters from a string?  Or maybe it's
> msg.get() that isn't managing to handle the accented string correctly?
> 
> Yes, I know that accented characters probably aren't allowed in
> Subject: but I'm not going to get that changed! :-)
> 
> 

Hi,
you could try extracting the charset parameter from the message's
"Content-Type:" header and then using it to decode the subject
(this assumes raw_subj is a bytes object):

subj = str(raw_subj, encoding='...')
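
A related approach, sketched here only as a possibility: if the Subject:
arrives as an RFC 2047 "encoded word" (=?charset?q?...?=), the charset is
carried in the header itself, and the standard library's
email.header.decode_header() and make_header() can turn it back into
readable text before the search. The sample message, the example.invalid
address and the helper name decoded_subject are made up for illustration;
whether the real messages are encoded this way is an assumption.

```python
import mailbox
from email.header import decode_header, make_header


def decoded_subject(msg, default="unknown"):
    """Return the Subject: header as readable text (hypothetical helper)."""
    raw_subj = msg.get("subject", default)
    # decode_header() splits the value into (data, charset) parts;
    # make_header() reassembles them and str() yields the decoded text.
    return str(make_header(decode_header(raw_subj)))


# Made-up sample message: the whole subject is one Q-encoded UTF-8 word,
# so spaces appear as underscores in the raw header value.
raw = (
    b"From: someone@example.invalid\n"
    b"Subject: =?utf-8?q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_"
    b"=28Waterways_Continental_Europe=29?=\n"
    b"\n"
    b"body\n"
)

msg = mailbox.MaildirMessage(raw)
searchTxt = "Waterways Continental Europe"

raw_subject = str(msg.get("subject", "unknown"))
subject = decoded_subject(msg)

found_in_raw = searchTxt in raw_subject    # the test from the original code
found_in_decoded = searchTxt in subject    # the same test after decoding
```

With a header encoded like this sample, the original test misses the
message because the raw value contains "Waterways_Continental_Europe",
while the decoded subject matches.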


