[Mailman-Users] Re: [Tutor] A question about Mailman soft. [hacking Mailman for fun and profit]
Ares Liu
gege at nst.pku.edu.cn
Fri Jul 26 15:40:34 CEST 2002
The mail I sent to the tutor list had a UTF-8 encoded Subject line, as follows:
Subject: =?utf-8?Q?Re:_=5BTutor=5D_=E6=B5=8B=E8=AF=95_for_test_pls_igno?=
=?utf-8?Q?re.?=
Date: Fri, 26 Jul 2002 13:43:05 +0800
MIME-Version: 1.0
Content-Type: text/plain;
charset="utf-8"
Content-Transfer-Encoding: base64
Your module did not handle the utf-8 encoding marker ?utf-8? :-) The decoded subject is
=?utf-8?Q?Re:_[Tutor]_\xe6\xb5\x8b\xe8\xaf\x95_for_test_pls_igno? =?utf-8?Q?re.?
(The bytes \xe6\xb5\x8b\xe8\xaf\x95 display as 2 Chinese characters in IDLE, which is expected.)
The mail I sent to my own list had a base64 encoded Subject line, as follows:
Subject: =?gb2312?B?UmU6IFtUZXN0XSCy4srUsuLK1A==?=
Date: Fri, 26 Jul 2002 18:43:53 +0800
MIME-Version: 1.0
Content-Type: text/plain;
charset="gb2312"
Content-Transfer-Encoding: base64
When I changed your module from
mimetools.decode(StringIO.StringIO(s), outputfile, 'quoted-printable')
to
mimetools.decode(StringIO.StringIO(s), outputfile, 'base64')
I got an error: incorrect padding.
Then I deleted "?gb2312?B?" from the string and got the real string:
'Re: [Test] \xb2\xe2\xca\xd4\xb2\xe2\xca\xd4'
(The bytes after "[Test] " display as 4 Chinese characters in IDLE.)
So a working module must handle the encoding marker first, and then decode the subject.
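
To make the "marker first, then decode" idea concrete, here is a minimal sketch. It assumes modern Python 3 rather than the Python 2 mimetools module used in this thread, and the names ENCODED_WORD, _decode_word and decode_subject are made up for illustration. It splits each encoded word into charset, encoding flag (B or Q) and payload, and only then decodes the payload:

```python
import base64
import quopri
import re

# One encoded word has the shape =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r'=\?([^?]+)\?([bBqQ])\?([^?]*)\?=')


def _decode_word(match):
    """Decode one encoded word, using its marker to choose the decoder."""
    charset, encoding, payload = match.groups()
    if encoding.upper() == 'B':
        raw = base64.b64decode(payload)
    else:
        # header=True also turns "_" back into a space
        raw = quopri.decodestring(payload.encode('ascii'), header=True)
    return raw.decode(charset, errors='replace')


def decode_subject(subject):
    """Return the subject with every encoded word replaced by its text."""
    # Whitespace that only separates two encoded words is not part of the text
    joined = re.sub(r'\?=\s+=\?', '?==?', subject)
    return ENCODED_WORD.sub(_decode_word, joined)


print(decode_subject('=?gb2312?B?UmU6IFtUZXN0XSCy4srUsuLK1A==?='))
# Re: [Test] 测试测试
```

This sketch decodes each encoded word on its own, so it would garble a multi-byte character that a mail client split across two words. It is an illustration only, not a replacement for a real header parser.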
-Ares
----- Original Message -----
From: "Danny Yoo" <dyoo at hkn.eecs.berkeley.edu>
To: "Ares Liu" <gege at nst.pku.edu.cn>
Cc: <tutor at python.org>; <mailman-users at python.org>
Sent: Friday, July 26, 2002 5:22 PM
Subject: Re: [Tutor] A question about Mailman soft. [hacking Mailman for fun and profit]
> [Note: I'm CC'ing mailman-users as this might be useful for them.
> Hopefully, they'll correct my hack by telling me the right way to do this.
> *grin*]
>
>
>
> On Fri, 26 Jul 2002, Ares Liu wrote:
>
> > I checked the archive mail on mailman list. Some one had discussed this
> > question before.
>
> Do you have a link to that archived message? I'm interested in looking at
> this, just for curiosity's sake.
>
>
>
>
> > The reason is that if I use non-English words in the Subject line, a
> > language code marker is added in front of "Re:" and the Subject is
> > encoded as something like "=?gb2312?B?xxxxxxxx?=".
>
> Yes, it looks like it wraps it in some kind of encoding... utf-8? I wish
> I knew more about Unicode.
>
>
>
> > Surely Mailman then cannot find any reply keyword, so it adds the
> > prefix again.
>
>
> I think I understand better now. The problem is that the encoding leaves
> many of the characters alone, but transforms the square brackets in:
>
> '[Tutor]'
>
> to something like:
>
> '=5BTutor=5D'
>
> I'm guessing this because 0x5B and 0x5D are the ASCII codes for square
> brackets:
>
> ###
> >>> chr(0x5b)
> '['
> >>> chr(0x5d)
> ']'
> ###
>
>
>
> Hmmmm. Wait. I've seen these characters before. Is this MIME encoding?
> MIME encoding is often used in representing language strings in email
> because almost all systems can handle it.
>
> ###
> >>> import StringIO, mimetools
> >>> def mydecode(s):
> ...     outputfile = StringIO.StringIO()
> ...     mimetools.decode(StringIO.StringIO(s), outputfile,
> ...                      'quoted-printable')
> ...     return outputfile.getvalue()
> ...
> >>> mydecode('=5BTutor=5D')
> '[Tutor]'
> ###
>
> Ah ha! It looks like it! Good!
>
>
>
> In this case, maybe we can extend that check in
> Handlers.CookHeaders.process() to take this particular encoding into
> consideration: if we decode the header back to normal, then the prefix
> check will work.
>
>
>
> If you're feeling adventurous, and if you're comfortable editing Python,
> you can add this file, 'quoted_printable_decoder.py' in the
> 'Mailman/Handlers/' directory of Mailman:
>
> ######
> ## quoted_printable_decoder.py
>
> import StringIO, mimetools
> def decode_quoted_printable(s):
>     """Given a mime 'quoted-printable' string s, returns its decoding.
>     If anything bad happens, returns s."""
>     try:
>         outputfile = StringIO.StringIO()
>         mimetools.decode(StringIO.StringIO(s), outputfile,
>                          'quoted-printable')
>         return outputfile.getvalue()
>     except:
>         return s
> ###
>
> This new module will convert the header and change all the '=5B' and '=5D'
> sequences back into square brackets if it can do so safely. We'll be using it in
> a moment.
>
>
>
>
> Once you've added this module, within the same directory, let's modify
> CookHeaders.py to use this function.
>
> And make backups, because I have not tested this yet! *grin*
>
>
>
> Add at the top of the CookHeaders module:
>
> ###
> from quoted_printable_decoder import decode_quoted_printable
> ###
>
> so that CookHeaders knows about our new function. Finally, modify the
> check in the CookHeaders.process() function:
>
> ###
> elif prefix and not re.search(re.escape(prefix), subject, re.I):
> ###
>
>
> into:
>
> ###
> elif prefix\
>         and not re.search(re.escape(prefix), subject, re.I)\
>         and not re.search(re.escape(prefix),
>                           decode_quoted_printable(subject), re.I):
> ###
>
>
> I've modified the logic to include the prefix check on the decoded subject
> header. Ares, if this works, I'll send the patch over to the Mailman
> folks. Who knows; it might be useful for someone else out there. *grin*
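
For comparison, the same "check the decoded subject too" logic can be sketched as a standalone function. This assumes Python 3 and the standard email.header module rather than the Python 2 code quoted above. The names decoded_text and needs_prefix are hypothetical and are not part of Mailman:

```python
import re
from email.header import decode_header


def decoded_text(subject):
    """Join the decoded parts of a header value into one string."""
    parts = []
    for chunk, charset in decode_header(subject):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or 'ascii', errors='replace'))
        else:
            parts.append(chunk)
    return ''.join(parts)


def needs_prefix(subject, prefix):
    """True if the prefix is in neither the raw nor the decoded subject."""
    pattern = re.escape(prefix)
    if re.search(pattern, subject, re.I):
        return False
    try:
        decoded = decoded_text(subject)
    except Exception:
        # Undecodable header: fall back to the raw check, which already failed
        return True
    return not re.search(pattern, decoded, re.I)


print(needs_prefix('=?utf-8?Q?Re:_=5BTutor=5D_test?=', '[Tutor]'))
# False
```

Because decode_header reads the charset and the B/Q flag from the marker itself, the same check covers both the quoted-printable and the base64 subjects shown at the top of this message.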
>
>
>
> Best of wishes to you!