[Mailman-Users] Archive merge and search

Hal my_list_address at yahoo.no
Tue Nov 18 15:35:46 CET 2014


On 07/11/2014 19:42, Mark Sapiro wrote:
> On 11/07/2014 03:52 AM, Hal wrote:

>> For not allowing text/html
  [snip]
>> But does the above apply to *all* archived postings, or does it only
>> filter anything that comes in from now on?
>
>
> It applies to all posts that arrive after you make those settings, both
> in the archive and in the messages delivered to the list. It won't
> affect messages already archived or in the mylist.mbox file, even if you
> rebuild the archive.

That's fine, but good to know for later.
So for any new messages from now on I want my list to work this way:

1) HTML formatted postings should be converted to plain text before 
reaching other members.

2) HTML formatted postings can retain their formatting in the archive 
(I believe the archive is in HTML format anyway?), but if it only 
archives whatever is sent to list members, I don't mind. The important 
thing is that members receive plain text messages.

3) Since many people have their email programs set to send HTML by 
default these days, I just want Mailman to do its filtering and then 
send the posting on as plain text, without any moderator request or 
notice to the sender.

4) I'd like to block all attachments (list members should only receive 
plain text messages).
A Max_message_size of 40 KB is already set (under "General options" in 
the list administration web interface), which seems to have worked fine 
as far as I know.
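Items 1 to 3 boil down to "convert, then deliver without holding 
anything". To make the conversion step concrete, here is a rough sketch 
using only Python's standard library. It is an illustration, not 
Mailman's code: as far as I know Mailman 2.1 does this by running an 
external program (by default `lynx -dump`, configured through 
`HTML_TO_PLAIN_TEXT_COMMAND`), and `html_to_text` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, dropping tags, scripts and styles."""

    # Tags that start a new line in the plain-text output.
    BLOCK_TAGS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose content should never appear in the output.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # "&amp;" -> "&", etc.
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Hypothetical helper: return a rough plain-text rendering of an HTML body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    raw = "".join(parser.chunks)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Dropping empty lines also loses paragraph spacing; a tool like lynx 
does a much better job with tables, lists and links.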

Furthermore, I understand that Filter_filename_extensions (in the 
"Content filtering" section) additionally removes any attachments with 
specific filename *extensions*, regardless of their file size?

I see exe, bat, cmd and a bunch of other file types I've never heard of 
listed there (geared towards Windows/DOS users, I suppose; I'm a Mac 
user), but I suppose I could block .zip and those pesky .vcf/.vcard and 
"winmail.dat" files the same way.
When such extensions are encountered, are those attachments just removed 
from the message while the posting itself is passed on to list members, 
or is the whole posting held for approval first?

I'm thinking out loud here, so feel free to chime in with better ideas, 
but I think there are two kinds of attachment groups which need 
different actions to be taken:

Deliberate attachments: zip files, gif/jpg images etc. which a poster 
wants to share. The message/attachment should be stopped from reaching 
the list and an email sent to the poster with a "Your message has been 
blocked. Please resend your message, this time without an attachment" 
type of message.

Accidental attachments: winmail.dat, .vcf or .vcard and so on. Many 
users don't know (as with HTML postings) that their email program is set 
up to send this stuff. IMHO those attachments have nothing to do with 
the actual content of their postings, so Mailman should just remove 
the attachment(s), then pass on the rest of the message to the list.
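The two-group policy above can be sketched with Python's standard email 
package. This is a standalone illustration of the policy, not a Mailman 
setting; the function name and both extension lists are hypothetical, 
and the sketch only decides what should happen (it does not send the 
"please resend" notice).

```python
import os

# Hypothetical extension lists for the two groups; adjust to taste.
ACCIDENTAL_EXTS = {"dat", "vcf", "vcard"}  # winmail.dat, vCards: strip quietly
DELIBERATE_EXTS = {"zip", "gif", "jpg", "jpeg", "png", "pdf"}  # reject the post


def _extension(part):
    """Lower-case filename extension of a MIME part, or '' if it has none."""
    filename = part.get_filename() or ""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def apply_attachment_policy(msg):
    """Return ("reject", None), or ("accept", msg) with accidental parts removed.

    Only the top-level parts of a multipart message are inspected; a real
    filter would also walk nested multiparts and single-part attachments.
    """
    if not msg.is_multipart():
        return ("accept", msg)
    kept = []
    for part in msg.get_payload():
        ext = _extension(part)
        if ext in DELIBERATE_EXTS:
            # The caller would send the poster a "please resend without an
            # attachment" notice instead of posting this message.
            return ("reject", None)
        if ext not in ACCIDENTAL_EXTS:
            kept.append(part)
    msg.set_payload(kept)  # accidental attachments are dropped here
    return ("accept", msg)
```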

Having said that, have I understood things correctly by setting up my 
"Content filtering" options as follows? (based on what you've said and 
what I've read here: 
http://wiki.list.org/pages/viewpage.action?pageId=4030684):

Edit_filter_content:	YES
Filter_mime_types:	(left blank)
Pass_mime_types:	multipart
			message/rfc822
			text/plain
			text/html
filter_filename_ext.:	exe
			bat
			cmd
			com
			pif
			scr
			vbs
			cpl
			zip
			dat
			vcf
			vcard
pass_filename_ext.:	(left blank)
Collapse_alternatives:	YES
conv_html_to_plaintext:	YES
Filter_action:		DISCARD
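As a rough model of how the settings listed above might act on a post, 
here is a heavily simplified sketch of Mailman 2.1's content filter as I 
understand it. It treats a post as a flat list of (content type, 
filename) parts and ignores Collapse_alternatives and nested multiparts. 
The constant and function names are my own; the real logic lives in 
Mailman's MimeDel handler, so this is a sketch, not a description of the 
actual code.

```python
import os

# Values copied from the settings above; the constant names are hypothetical.
PASS_MIME_TYPES = ["multipart", "message/rfc822", "text/plain", "text/html"]
FILTER_EXTS = {"exe", "bat", "cmd", "com", "pif", "scr", "vbs", "cpl",
               "zip", "dat", "vcf", "vcard"}
CONVERT_HTML = True
FILTER_ACTION = "discard"


def _type_passes(ctype):
    """A bare major type such as 'multipart' matches any of its subtypes."""
    major = ctype.split("/")[0]
    return any(p == ctype or p == major for p in PASS_MIME_TYPES)


def filter_parts(parts):
    """Simplified model of content filtering on a flat list of leaf parts.

    parts: list of (content_type, filename_or_None) tuples.
    Returns (action, surviving_parts): action is "deliver" or FILTER_ACTION.
    """
    kept = []
    for ctype, filename in parts:
        ext = os.path.splitext(filename or "")[1].lstrip(".").lower()
        if not _type_passes(ctype) or ext in FILTER_EXTS:
            continue  # this part is silently removed; the post goes on
        if CONVERT_HTML and ctype == "text/html":
            ctype = "text/plain"  # stands in for the HTML-to-text conversion
        kept.append((ctype, filename))
    # Only when nothing at all survives does the filter action come into play.
    if not kept:
        return (FILTER_ACTION, [])
    return ("deliver", kept)
```

In this model a post with a text part plus a .zip is delivered with the 
.zip silently removed, and only a post with nothing left after filtering 
reaches Filter_action (DISCARD here, so the poster hears nothing). If 
that reading is right, these settings alone would not produce the 
"please resend" notice wanted for deliberate attachments.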


>>> A different obfuscation for email addresses would require source code
>>> modification. I.e., there's no 'plugin' for it.
>>
>> Is this a feature that could be suggested for the upcoming Mailman 3?
>> Perhaps an optional user-configuration through the web admin interface?
>
>
> Mailman 3 uses different and 'pluggable' archiving. The archiver that is
> bundled with MM 3 is called HyperKitty. I'm not sure what address
> obfuscation it does.

Thanks. I'll look into it.



>> Failing that, is there a way I could put a filter in front of HTTP
>> access to the (currently private) archive?
>
> You could create your own CGI or other web process to access the
> archives and present them any way you want.
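One way to read "your own web process" is a small WSGI application that 
serves the archive pages and rewrites addresses on the way out. The 
sketch below is an assumption-laden illustration: the archive path is 
hypothetical, the "[at]/[dot]" format is just one choice, it assumes the 
pages are UTF-8 or plain ASCII, and it performs no authentication of its 
own.

```python
import os
import re

# Hypothetical path to the private Pipermail archive; adjust for your site.
ARCHIVE_ROOT = "/var/lib/mailman/archives/private/mylist"

# Matches "user@example.com" as well as Pipermail's "user at example.com" form.
# Caveat: the " at " branch can also catch ordinary text such as
# "look at example.com", so a real filter would need to be stricter.
ADDR_RE = re.compile(r"([\w.+-]+)(?:@| at )([\w-]+(?:\.[\w-]+)+)")


def obfuscate(text):
    """Rewrite each address as 'user [at] example [dot] com'."""
    return ADDR_RE.sub(
        lambda m: f"{m.group(1)} [at] {m.group(2).replace('.', ' [dot] ')}", text
    )


def app(environ, start_response):
    """WSGI app serving archive pages with addresses rewritten.

    It does no authentication, so it must sit behind the web server's
    access control if the archive is meant to stay private.
    """
    root = os.path.realpath(ARCHIVE_ROOT)
    rel = environ.get("PATH_INFO", "").lstrip("/") or "index.html"
    path = os.path.realpath(os.path.join(root, rel))
    # Refuse paths that escape the archive directory (e.g. "../../etc/passwd")
    # and anything that is not a plain HTML or text page.
    if (not path.startswith(root + os.sep) or not os.path.isfile(path)
            or not path.endswith((".html", ".txt"))):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    # Assumes UTF-8 (or ASCII) pages; undecodable bytes become U+FFFD.
    with open(path, encoding="utf-8", errors="replace") as f:
        body = obfuscate(f.read()).encode("utf-8")
    ctype = "text/html" if path.endswith(".html") else "text/plain"
    start_response("200 OK", [("Content-Type", ctype + "; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For a quick local test, `wsgiref.simple_server.make_server("127.0.0.1", 
8000, app).serve_forever()` will serve it; in production it would run 
under the web server (for example via mod_wsgi) so the existing access 
control still applies.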

Since I'm not familiar with the subject, what kind of pre-written CGI 
script should I look for (e.g. a "search engine to web archive gateway" 
or something like that)?
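At its simplest, such a gateway scans the archive pages for a phrase and 
returns links to the matches. The sketch below does exactly that with a 
full scan of the HTML files (no index) and escapes everything it echoes 
back, so a search phrase cannot inject markup into the results page. The 
function names are hypothetical; a real engine such as htdig builds an 
index instead of reading every file on each search.

```python
import html
import os
import re

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, good enough for a sketch


def search_archive(root, phrase, limit=20):
    """Naive full scan: return relative paths of archive pages containing phrase."""
    needle = phrase.casefold()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                # Strip tags, then decode entities such as "&amp;".
                text = html.unescape(TAG_RE.sub(" ", f.read()))
            if needle in text.casefold():
                hits.append(os.path.relpath(path, root))
                if len(hits) >= limit:
                    return hits
    return hits


def render_results(phrase, hits):
    """Build an HTML result list; every echoed string is escaped."""
    items = "".join(
        f'<li><a href="{html.escape(h)}">{html.escape(h)}</a></li>' for h in hits
    )
    return f"<h1>Results for {html.escape(phrase)}</h1><ul>{items}</ul>"
```

Because the results are plain HTML, they can be styled with the site's 
own CSS, or the hit list could be returned as JSON for an Ajax front end.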

You previously suggested htdig (http://www.htdig.org/) with your patches 
to let my visitors search both the Mailman archives and my website. 
Assuming this is a more ready-to-use solution than the other search 
engines out there, are there features I will be missing out on (e.g. the 
ability to use CSS and Ajax to make its search results match the look of 
the rest of my website), and is it still secure? I've read that 
malicious code can sometimes be entered as search phrases and damage the 
database if the search engine doesn't use "parameterized queries".
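The "parameterized queries" point can be illustrated with Python's 
built-in sqlite3 module. The table and function names are hypothetical, 
and this says nothing about how htdig or any other engine stores its 
index; it only shows the difference between pasting a search phrase into 
the SQL text and passing it as a bound parameter.

```python
import sqlite3


def make_db():
    """In-memory stand-in for a search table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, subject TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO posts (subject, body) VALUES (?, ?)",
        [("Archive merge and search", "htdig patches"),
         ("Content filtering", "convert html to plain text")],
    )
    return db


def search_unsafe(db, phrase):
    # DON'T: the phrase becomes part of the SQL statement itself.
    sql = "SELECT subject FROM posts WHERE body LIKE '%" + phrase + "%'"
    return [row[0] for row in db.execute(sql)]


def search_safe(db, phrase):
    # DO: the phrase is passed separately and bound to the "?" placeholder,
    # so it is always treated as data, never as SQL.
    sql = "SELECT subject FROM posts WHERE body LIKE ?"
    return [row[0] for row in db.execute(sql, ("%" + phrase + "%",))]
```

With the phrase `' OR '1'='1`, the unsafe version turns the WHERE clause 
into one that matches every row, while the safe version just looks for 
that literal text and finds nothing.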

I've found other search engines (Nutch, Lucene, Solr, Tipue search, 
Xapian and Ajax live search) but I have no idea if they're suitable for 
my use and how well they work or how difficult they are to set up. 
Opinions from anyone are highly appreciated.
Thanks.


Hal

