[Mailman-Users] Re: [Tutor] A question about Mailman soft. [hacking Mailman for fun and profit]

Fri Jul 26 15:40:34 CEST 2002

The mail I sent to the tutor list had a UTF-8 encoded Subject line, as follows:

Subject: =?utf-8?Q?Re:_=5BTutor=5D_=E6=B5=8B=E8=AF=95_for_test_pls_igno?=
 =?utf-8?Q?re.?=
Date: Fri, 26 Jul 2002 13:43:05 +0800
MIME-Version: 1.0
Content-Type: text/plain;
 charset="utf-8"
Content-Transfer-Encoding: base64

Your module did not handle the utf-8 encoding marker ?utf-8? :-) The decoded subject is:
=?utf-8?Q?Re:_[Tutor]_\xe6\xb5\x8b\xe8\xaf\x95_for_test_pls_igno? =?utf-8?Q?re.?
                      ^^^^^^^^^^^^^^^^^^^^^^^^ 2 Chinese characters, displayed normally in IDLE.
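
For comparison, here is a minimal sketch of how the marker could be handled before decoding. It assumes a modern Python 3 interpreter and the standard email.header.decode_header function, not the Python 2 mimetools module used in this thread; the helper name decode_subject is made up for illustration.

```python
from email.header import decode_header

def decode_subject(raw):
    """Decode an RFC 2047 encoded Subject value into readable text."""
    pieces = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus the charset from the marker.
            pieces.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # A header without any encoded word is returned as a plain str.
            pieces.append(data)
    return "".join(pieces)

# The folded Subject header from the mail above (two encoded words).
raw = ("=?utf-8?Q?Re:_=5BTutor=5D_=E6=B5=8B=E8=AF=95_for_test_pls_igno?=\n"
       " =?utf-8?Q?re.?=")
subject = decode_subject(raw)
```

With the Subject above, the two encoded words are joined and the marker is gone, so the result starts with "Re: [Tutor]" and a prefix check would find it.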

The mail I sent to my list had a base64 encoded Subject line, as follows:

Subject: =?gb2312?B?UmU6IFtUZXN0XSCy4srUsuLK1A==?=
Date: Fri, 26 Jul 2002 18:43:53 +0800
MIME-Version: 1.0
Content-Type: text/plain;
 charset="gb2312"
Content-Transfer-Encoding: base64

When I changed your module from
        mimetools.decode(StringIO.StringIO(s), outputfile, 'quoted-printable')
to
        mimetools.decode(StringIO.StringIO(s), outputfile, 'base64')

I got an error: incorrect padding.
Then I deleted "?gb2312?B?" from the string and got the real string:

'Re: [Test] \xb2\xe2\xca\xd4\xb2\xe2\xca\xd4'
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 4 Chinese characters displayed in IDLE.

So a working module must first handle the encoding marker, and then decode the subject.
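
A rough sketch of that order of operations, written for Python 3 (an assumption; the module under discussion is Python 2). The regular expression and the name decode_encoded_word are only illustrative: the marker is split off first, and only the payload is handed to the base64 or quoted-printable decoder.

```python
import base64
import quopri
import re

# An RFC 2047 encoded word looks like =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=")

def decode_encoded_word(word):
    """Strip the marker from one encoded word, then decode its payload."""
    match = ENCODED_WORD.fullmatch(word.strip())
    if match is None:
        return word  # not an encoded word, leave it alone
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # header=True makes quopri turn "_" back into a space.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    return raw.decode(charset)

# The base64 Subject from the mail sent to my list.
text = decode_encoded_word("=?gb2312?B?UmU6IFtUZXN0XSCy4srUsuLK1A==?=")
```

Here only the part between "?B?" and "?=" reaches the base64 decoder, which is consistent with the padding error going away once the marker was removed.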

-Ares

----- Original Message ----- 
From: "Danny Yoo" <dyoo at hkn.eecs.berkeley.edu>
To: "Ares Liu" <gege at nst.pku.edu.cn>
Cc: <tutor at python.org>; <mailman-users at python.org>
Sent: Friday, July 26, 2002 5:22 PM
Subject: Re: [Tutor] A question about Mailman soft. [hacking Mailman for fun and profit]

> [Note: I'm CC'ing mailman-users as this might be useful for them.
> Hopefully, they'll correct my hack by telling me the right way to do this.
> *grin*]
> 
> 
> 
> On Fri, 26 Jul 2002, Ares Liu wrote:
> 
> > I checked the mail archive of the Mailman list. Someone had discussed
> > this question before.
> 
> Do you have a link to that archived message?  I'm interested in looking at
> this, just for curiosity's sake.
> 
> 
> 
> 
> > The reason is that if I use non-English words in the Subject line, the
> > language code marker is added in front of "Re:" and the Subject is
> > encoded as something like "=?gb2312?B?xxxxxxxx?=".
> 
> Yes, it looks like it wraps it in some kind of encoding... utf-8?  I wish
> I knew more about Unicode.
> 
> 
> 
> > So Mailman surely could not find any reply keyword, and it added the
> > prefix again.
> 
> 
> I think I understand better now.  The problem is that the encoding leaves
> many of the characters alone, but transforms the square brackets in:
> 
>     '[Tutor]'
> 
> to something like:
> 
>     '=5BTutor=5D'
> 
> I'm guessing this because 0x5B and 0x5D are the ASCII codes for square
> brackets:
> 
> ###
> >>> chr(0x5b)
> '['
> >>> chr(0x5d)
> ']'
> ###
> 
> 
> 
> Hmmmm.  Wait.  I've seen these characters before.  Is this MIME encoding?
> MIME encoding is often used in representing language strings in email
> because almost all systems can handle it.
> 
> ###
> >>> import StringIO, mimetools
> >>> def mydecode(s):
> ...     outputfile = StringIO.StringIO()
> ...     mimetools.decode(StringIO.StringIO(s), outputfile,
> ...                      'quoted-printable')
> ...     return outputfile.getvalue()
> ...
> >>> mydecode('=5BTutor=5D')
> '[Tutor]'
> ###
> 
> Ah ha!  It looks like it!  Good!
> 
> 
> 
> In this case, maybe we can extend that check in
> Handlers.CookHeaders.process() to take this particular encoding into
> consideration: if we decode the header back to normal, then the prefix
> check will work.
> 
> 
> 
> If you're feeling adventurous, and if you're comfortable editing Python,
> you can add this file, 'quoted_printable_decoder.py' in the
> 'Mailman/Handlers/' directory of Mailman:
> 
> ######
> ## quoted_printable_decoder.py
> 
> import StringIO, mimetools
> def decode_quoted_printable(s):
>     """Given a MIME 'quoted-printable' string s, returns its decoding.
>     If anything bad happens, returns s."""
>     try:
>         outputfile = StringIO.StringIO()
>         mimetools.decode(StringIO.StringIO(s), outputfile,
>                          'quoted-printable')
>         return outputfile.getvalue()
>     except Exception:
>         # On any decoding error, fall back to the undecoded input.
>         return s
> ###
> 
> This new module will decode the header, changing all the '=5B' and '=5D'
> sequences back into square brackets, if it can do so safely.  We'll be
> using it in a moment.
> 
> 
> 
> 
> Once you've added this module, within the same directory, let's modify
> CookHeaders.py to use this function.
> 
> And make backups, because I have not tested this yet!  *grin*
> 
> 
> 
> Add at the top of the CookHeaders module:
> 
> ###
> from quoted_printable_decoder import decode_quoted_printable
> ###
> 
> so that CookHeaders knows about our new function.  Finally, modify the
> check in the CookHeaders.process() function:
> 
> ###
>         elif prefix and not re.search(re.escape(prefix), subject, re.I):
> ###
> 
> 
> into:
> 
> ###
>         elif prefix\
>              and not re.search(re.escape(prefix), subject, re.I)\
>              and not re.search(re.escape(prefix),
>                                decode_quoted_printable(subject), re.I)
> ###
> 
> 
> I've modified the logic to include the prefix check on the decoded subject
> header.  Ares, if this works, I'll send the patch over to the Mailman
> folks.  Who knows; it might be useful for someone else out there.  *grin*
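
The same two-step check can be sketched outside Mailman. This is only an illustration under assumptions: it targets Python 3 and email.header.decode_header instead of mimetools, and the names decoded_subject and has_prefix are hypothetical, not part of Mailman. Like the patch above, it looks for the prefix in the raw subject first and then in the decoded one, and it returns the subject unchanged if decoding fails.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

def decoded_subject(subject):
    """Decode RFC 2047 encoded words; on any failure return subject as-is."""
    try:
        return "".join(
            data.decode(charset or "ascii") if isinstance(data, bytes) else data
            for data, charset in decode_header(subject)
        )
    except (HeaderParseError, LookupError, UnicodeDecodeError):
        return subject

def has_prefix(subject, prefix):
    """True if prefix occurs in the raw or the decoded subject (case-insensitive)."""
    pattern = re.escape(prefix)
    return bool(re.search(pattern, subject, re.I)
                or re.search(pattern, decoded_subject(subject), re.I))

# The list prefix would be added again only when has_prefix(...) is False.
encoded = "=?utf-8?Q?Re:_=5BTutor=5D_=E6=B5=8B=E8=AF=95?="
already_tagged = has_prefix(encoded, "[Tutor]")
```

Unlike the quoted-printable-only patch, this sketch also covers base64 ("B") encoded subjects, because decode_header handles both encodings.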
> 
> 
> 
> Best of wishes to you!