From alexander at sulfrian.net  Sun Apr  1 16:08:00 2012
From: alexander at sulfrian.net (Alexander Sulfrian)
Date: Sun, 01 Apr 2012 16:08:00 +0200
Subject: [Mailman-Developers] GSoC 2012 - NNTP archive access
In-Reply-To: <20120327165204.255d7527@resist.wooz.org>
Message-ID: <87obrbilz3.wl%alexander@sulfrian.net>

Hi,

On Tue Mar 27 22:52:04 CEST 2012, Barry Warsaw wrote:
>
> On Mar 27, 2012, at 09:09 PM, Alexander Sulfrian wrote:
>
> > What are the next steps you would propose? I am unfortunately not up
> > to date with the development of Mailman 3, but I am a little bit
> > familiar with the Mailman 2 source code.
>
> MM3 will be a better platform to build something like the NNTP
> access on.  The question in my mind is whether this should be done
> as part of the various independent (but related) archiver projects,
> or whether it should be done as a separate "archiver".

There is a second question connected with that: should the messages
be kept in additional storage for NNTP access, or should the default
archiver be responsible for storage and be extended with methods for
accessing specific messages?

> In mm3, there's an API for feeding posted messages to an IArchiver,
> but this is quite flexible.  I could imagine that something on the
> other end of this vended messages via NNTP instead of HTTP. 

This would be the scenario if NNTP access were implemented in a new
archiver, separate from the others.

> The one key difference is that you'd like to be able to post to the
> mailing list through NNTP, with probably some additional posting
> rules (e.g. if you're not a member, but we "know" you, or you've
> been approved for posting a few times, your message wouldn't get
> held for moderator approval).

If it should be possible to post messages over the NNTP transport,
that does not match the classic design of an archiver. I do not know
whether there is an API to post messages, but it might be better to
implement the NNTP archive as an external module that could maybe
even run on a separate server.

> If I was doing this, I'd probably look seriously at Twisted as the
> basis for implementing the NNTP side of things.  I haven't looked in
> quite a while, but at the time, it had great support for NNTP
> server-side.

Yes, Twisted should be the right choice. There is a Twisted module for
implementing an NNTPServer[1], but it is not very well documented. Even
if it does not work, it should not be hard to implement one: the NNTP
commands described in RFC 3977[2] do not look very complicated.
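
As a rough illustration of how small the core command set is, here is a
minimal Python sketch of a handler for three RFC 3977 commands (GROUP,
ARTICLE by number, and QUIT). It is not Twisted's API and not Mailman
code: the class name TinyNNTPSession, the in-memory groups dictionary,
and the wording after each status code are assumptions made for this
example. Only the numeric status codes and the dot-stuffing rule follow
the RFC, and ARTICLE by message-id is deliberately left out.

```python
# Illustrative sketch only: a toy handler, not Twisted and not Mailman.
class TinyNNTPSession:
    """Answer a few RFC 3977 commands from an in-memory article store."""

    def __init__(self, groups):
        # groups maps a group name to {article number: (message_id, text)}.
        self.groups = groups
        self.current = None

    def handle(self, line):
        """Return the full response text for one command line."""
        parts = line.strip().split()
        if not parts:
            return "500 Unknown command"
        command, args = parts[0].upper(), parts[1:]
        if command == "QUIT":
            return "205 Connection closing"
        if command == "GROUP":
            return self._group(args)
        if command == "ARTICLE":
            return self._article(args)
        return "500 Unknown command"

    def _group(self, args):
        if len(args) != 1:
            return "501 Syntax error"
        name = args[0]
        if name not in self.groups:
            return "411 No such newsgroup"
        self.current = name
        numbers = sorted(self.groups[name])
        if not numbers:
            return f"211 0 0 0 {name}"
        # 211 <count> <low> <high> <group>
        return f"211 {len(numbers)} {numbers[0]} {numbers[-1]} {name}"

    def _article(self, args):
        # Only the numeric form is handled in this sketch.
        if len(args) != 1 or not args[0].isdigit():
            return "501 Syntax error"
        if self.current is None:
            return "412 No newsgroup selected"
        number = int(args[0])
        entry = self.groups[self.current].get(number)
        if entry is None:
            return "423 No article with that number"
        message_id, text = entry
        lines = [f"220 {number} {message_id}"]
        for body_line in text.splitlines():
            # Dot-stuffing: a leading "." is doubled in multi-line blocks.
            if body_line.startswith("."):
                body_line = "." + body_line
            lines.append(body_line)
        lines.append(".")  # terminator of the multi-line response
        return "\r\n".join(lines)
```

A real server would sit behind a socket or a Twisted protocol class and
read articles from the archive rather than from a dictionary; the point
here is only that the command/response mapping itself is simple.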

In addition, there is the question of whether it should be possible to
sync several Mailman servers over the NNTP protocol. That would be one
way to do clustering for load balancing or something like that.

> Cheers,
> -Barry

Thanks,
Alex

[1] http://twistedmatrix.com/documents/current/api/twisted.news.nntp.NNTPServer.html
[2] http://tools.ietf.org/html/rfc3977

From barry at list.org  Sun Apr  1 21:05:15 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 1 Apr 2012 13:05:15 -0600
Subject: [Mailman-Developers] GSoC 2012 - NNTP archive access
In-Reply-To: <87obrbilz3.wl%alexander@sulfrian.net>
References: <20120327165204.255d7527@resist.wooz.org>
	<87obrbilz3.wl%alexander@sulfrian.net>
Message-ID: <20120401130515.05305816@resist.wooz.org>

BTW, the NNTP queue runner has now been ported to Mailman 3.  You will need to
re-run bin/buildout though, to pick up the new dependency on the mock library.

On Apr 01, 2012, at 04:08 PM, Alexander Sulfrian wrote:

>> MM3 will be a better platform to build something like the NNTP
>> access on.  The question in my mind is whether this should be done
>> as part of the various independent (but related) archiver projects,
>> or whether it should be done as a separate "archiver".
>
>There is a second question connected with that: should the messages
>be kept in additional storage for NNTP access, or should the default
>archiver be responsible for storage and be extended with methods for
>accessing specific messages?

This is a good, but larger question.  I've always thought that Mailman will
require a "message store" as defined in the IMessageStore interface.  What
might make sense is to have a single implementation that satisfies the
IArchiver and IMessageStore (and possibly other interfaces), but with a single
on-disk storage.  This could in fact be the thing that backs the prototype
archiver.
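
One way to picture a single implementation backed by one on-disk store
is the Python sketch below. It is only loosely modeled on the idea of
IArchiver and IMessageStore: the class name, the method names (add,
get_message_by_id, archive_message), and the one-file-per-message layout
are assumptions for illustration, not the actual Mailman 3 interface
definitions.

```python
import hashlib
import os
import tempfile
from email import message_from_bytes
from email.message import EmailMessage


class SingleStoreArchiver:
    """Hypothetical class: one directory serves as archive and store."""

    name = "single-store"  # made-up archiver name for this sketch

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, message_id):
        # Hash the Message-ID so it is always a safe file name.
        digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest + ".eml")

    # Store-style methods.
    def add(self, msg):
        if msg["Message-ID"] is None:
            raise ValueError("message has no Message-ID header")
        message_id = str(msg["Message-ID"])
        with open(self._path(message_id), "wb") as handle:
            handle.write(msg.as_bytes())
        return message_id

    def get_message_by_id(self, message_id):
        try:
            with open(self._path(message_id), "rb") as handle:
                return message_from_bytes(handle.read())
        except FileNotFoundError:
            return None

    # Archiver-style method: archiving is just adding to the same store.
    def archive_message(self, mlist, msg):
        return self.add(msg)


# Usage: archive a message, then fetch it back by Message-ID.
demo = EmailMessage()
demo["Message-ID"] = "<abc@example.org>"
demo["Subject"] = "hello"
demo.set_content("body text")
store = SingleStoreArchiver(tempfile.mkdtemp())
store.archive_message("test@example.org", demo)
found = store.get_message_by_id("<abc@example.org>")
```

Because both entry points write to the same files, an NNTP front end
could read from this store without keeping a second copy of each
message.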

>> In mm3, there's an API for feeding posted messages to an IArchiver,
>> but this is quite flexible.  I could imagine that something on the
>> other end of this vended messages via NNTP instead of HTTP. 
>
>This would be the scenario if NNTP access were implemented in a new
>archiver, separate from the others.

With the above, you probably wouldn't need this except, as you say, if it is a
separate archiver.

>> The one key difference is that you'd like to be able to post to the
>> mailing list through NNTP, with probably some additional posting
>> rules (e.g. if you're not a member, but we "know" you, or you've
>> been approved for posting a few times, your message wouldn't get
>> held for moderator approval).
>
>If it should be possible to post messages over the NNTP transport,
>that does not match the classic design of an archiver. I do not know
>whether there is an API to post messages, but it might be better to
>implement the NNTP archive as an external module that could maybe
>even run on a separate server.

Yes, now that the NNTPRunner is functional, it should be possible to set this
up as posting to an NNTP service that a site could run, independent of
Mailman.

>> If I was doing this, I'd probably look seriously at Twisted as the
>> basis for implementing the NNTP side of things.  I haven't looked in
>> quite a while, but at the time, it had great support for NNTP
>> server-side.
>
>Yes, Twisted should be the right choice. There is a Twisted module for
>implementing an NNTPServer[1], but it is not very well documented. Even
>if it does not work, it should not be hard to implement one: the NNTP
>commands described in RFC 3977[2] do not look very complicated.
>
>In addition, there is the question of whether it should be possible to
>sync several Mailman servers over the NNTP protocol. That would be one
>way to do clustering for load balancing or something like that.

That's a pretty cool idea, actually.  Something fun to explore for 3.1
perhaps.

-Barry

From terri at zone12.com  Mon Apr  2 02:14:23 2012
From: terri at zone12.com (Terri Oda)
Date: Sun, 01 Apr 2012 18:14:23 -0600
Subject: [Mailman-Developers] For prospective GSoC students
Message-ID: <4F78EF5F.3040500@zone12.com>

Some things you should know:

1. Mailman is working under the umbrella organization of the Python 
Software Foundation, so we get hundreds of applications to sort through,
not all of which are related to Mailman.  Please make sure to put 
"Mailman" somewhere in the subject of your application so it doesn't get 
lost in the crowd!

2. On a related note, please make sure your application has a 
descriptive title, e.g. "GNU Mailman: improving archives by extending
hyperkitty" or somesuch.  Again, this makes it easier for us to sort 
through the applications in the system.

3. Applications are due April 6th.  Google will not extend this deadline 
for any reason, including if the entire melange system goes down.  (And 
this *has* happened at the last minute.)  Please make sure to get your 
applications in early if you can!  If you don't have an application in 
the system, there is no way we can accept you.

4. You can edit any application you submit, so what I recommend is that 
you all go and submit something right now with a note at the top saying 
that it is a draft.  You can edit it after this is done, and when you're 
ready to finalize you can take the "this is a draft" note off the top.

5. Not all of us are set up as mentors in Melange yet (most PSF mentor 
accounts were only authorized today) so it may still be a couple of days 
before you get feedback.  If you want feedback sooner, please feel free 
to post your proposal to this list!

  Terri


From msk at cloudmark.com  Mon Apr  2 23:58:55 2012
From: msk at cloudmark.com (Murray S. Kucherawy)
Date: Mon, 2 Apr 2012 21:58:55 +0000
Subject: [Mailman-Developers] Presenting on anti-abuse developments
Message-ID: <9452079D1A51524AA5749AD23E0039280C6A46@exch-mbx901.corp.cloudmark.com>

Hi all,

One of the hats I wear these days is technical committee co-chair for the Messaging Anti-Abuse Working Group (MAAWG).  I'm looking to fill slots for our Berlin (June) and Baltimore (October) conferences.

If someone on the mailman development team would like to come and speak about developments and features of Mailman (especially the new version) that try to deal with abuse mitigation issues, please contact me off-list.  I have a request in to the executive director to find out what support we offer to speakers in terms of expenses, etc., so I'll pass that on once I have it to anyone that replies.

Thanks,
-MSK

From terri at zone12.com  Tue Apr  3 00:07:00 2012
From: terri at zone12.com (Terri Oda)
Date: Mon, 02 Apr 2012 16:07:00 -0600
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
Message-ID: <4F7A2304.5060408@zone12.com>

On 03/29/2012 02:27 PM, David Jeske wrote:
> On Thu, Mar 29, 2012 at 9:16 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> I would say you should try to retain copyright, and have the Mailman
>> project distribute it with the S-BSD license under the "mere
>> aggregation" clause of the GPL.
> This agrees with my view of the situation as well. Which leads to the
> question, is the above approach interesting/viable for Mailman-team?
> (assuming the code does something awesome that people want)

If the question is just "would you like another archiver even if the 
licenses don't match?" then I believe the answer is yes.  I think it 
would be really beneficial for us to have more than one archiver on the 
table sooner rather than later, and working with you to make sure all 
the plumbing is there to connect things would be really beneficial to 
us.  The licensing issue might mean you're probably not guaranteed a 
blessing as the standard archiving utility for Mailman, but that never 
stopped other projects like MhonArc!

But... since you arrived around the same time GSoC started, I should ask 
whether you were hoping to do this as a GSoC project?  It'd be a 
worthwhile project to put out there, but it might be lower priority for 
us than more direct development, since one of the goals of GSoC is to 
get new developers who are going to stay around and do future work with 
the project.

  Terri

From terri at zone12.com  Tue Apr  3 00:37:04 2012
From: terri at zone12.com (Terri Oda)
Date: Mon, 02 Apr 2012 16:37:04 -0600
Subject: [Mailman-Developers] Google Summer of Code: Integration of
 Search Code
In-Reply-To: <CAKfaKcPVgADGVJU95GK0G9m-8Y28wkYZzh5aq43QWRRH2JhS+Q@mail.gmail.com>
References: <CAKfaKcMF7YXPwByzg0GBeXtegQhbgB-5Y-fTeVi_kn-azFyenQ@mail.gmail.com>
	<20120326194107.GX11151@unaka.lan>
	<CAKfaKcPOHtFrdQs3XOLZ4-ALi5Br6kwkMtuikWEqfo97MqCyDQ@mail.gmail.com>
	<4F721331.60807@zone12.com>
	<CAL_0O1-cXLbxr=tCPGx4cvCbpy6bkRgeKZ1M4sX1twb1j-Ni3A@mail.gmail.com>
	<CAKfaKcNhqK4u4+=ApKx9uCXQe0Zd3m7yBL9tbmuUr+ggB75zQQ@mail.gmail.com>
	<CAL_0O196TmaOAZW3sU4xfMK1=Fp2bsY3o2AYX5ekWx3qVKhC2Q@mail.gmail.com>
	<CAKfaKcNbUnUg85KH2dydA8rHwvuU1xDjVmP5hqegZ14N5a-L_Q@mail.gmail.com>
	<CAL_0O183b-kZThC1jTRoCeDYBL+6NNS-Tqps8HD1y2jOg_hncA@mail.gmail.com>
	<CAKfaKcPVgADGVJU95GK0G9m-8Y28wkYZzh5aq43QWRRH2JhS+Q@mail.gmail.com>
Message-ID: <4F7A2A10.6060202@zone12.com>

On 03/29/2012 11:58 PM, Shayan Md wrote:
> Okay then, can you please tell me how we can put this search code in best
> use of mailman3? I have a proposal to write, I am getting unsure of things
> day by day. Can you also tell me who is the mentor of this project?

When it comes to writing your proposal, I'd be most impressed if you 
looked at search in terms of how it's going to be used.  Take a look at 
some of the work generated by previous years' students on search use cases:

http://systers.org/systers-dev/doku.php/usecases-priya:start
http://systers.org/systers-dev/doku.php/mailman_archives_ui_-_yian_shang

These were generated from surveys of mailman users worldwide, so they 
probably show a reasonable picture of expected behaviour.  Figure out 
what sort of data structures and indexes you need to support these use
cases and work from there.  Don't worry if it's not perfect; your best 
guess is fine for an initial application and we'll ask for clarification 
as necessary.

Please also remember to give a reasonably detailed timeline for what you 
plan to do (e.g. weekly milestones) and how you will integrate code on a 
weekly or biweekly basis. That helps us a lot when evaluating your proposal!

As for who will be mentoring search-related projects... we haven't 
decided.  I was planning to just let the mentors fight for the best 
students once we have all the applications in. ;)  More seriously, 
though, search touches on interests and expertise for pretty much all of 
our mentors, so the primary mentor for a search project will depend on 
what other applications we get.

  Terri

From terri at zone12.com  Tue Apr  3 00:46:31 2012
From: terri at zone12.com (Terri Oda)
Date: Mon, 02 Apr 2012 16:46:31 -0600
Subject: [Mailman-Developers] [GSoC 2012] Candidate on 'Integration of
 (existing) search code into Mailman archives'
In-Reply-To: <CACeRBzkPqWpDfSt4KqtACYSTaOdSxBuTaoMixJ=Tbq_dOHGBAQ@mail.gmail.com>
References: <CACeRBzkPqWpDfSt4KqtACYSTaOdSxBuTaoMixJ=Tbq_dOHGBAQ@mail.gmail.com>
Message-ID: <4F7A2C47.1070002@zone12.com>

Hi George,

Your MailmanStats project looks great and would totally fit with what we 
have in mind for stats, though I'm guessing the hyperkitty team has some 
much more extensive work in mind making use of post ratings, tags, etc.

If you're putting together your proposal now, do feel free to mention 
both projects as sources of interest.  Since you already have the stats 
code available, it might be possible to toss the integration in there 
after doing some other work.  Normally I worry about students biting off 
more than they can chew, but given your prior experience with Mailman 
and the fact that you already have the basic code, you can make a case 
for being able to package up that code and contribute it in a week or 
two out of your summer if you're ready for a code review.


  Terri

PS -  For further advice regarding search projects, see my previous post 
to mailman-developers.

On 03/26/2012 03:38 PM, George Chatzisofroniou wrote:
> Hello Mailman Developers,
>
> My name is George Chatzisofroniou, i'm 20 years old and i'm an
> undergraduate student in the Department of Informatics at the
> University of Piraeus (Greece).
>
> I have really good previous experience with Mailman. This is because i
> use it for managing mailing lists for almost three years.
>
> I have also developed, with a friend of mine, MailmanStats [1], a
> Python software that outputs statistics for a mailing list based on
> Mailman. I think this implements the 'metric' idea in some way. I
> would like to know your opinion about MailmanStats.
>
> I'm sending this mail to inform you about my will to be part of
> Mailman Development team starting by Google Summer of Code 2012. The
> idea that excites me more is the 'Integration of (existing) search
> code into Mailman archives'. I think it is better to be developed on
> Mailman v3 rather than v2. I realize the significance of a feature
> like this. Many times before, i've got through the archives to search
> for a specific thread, so an addition like this would be great!
>
> As another student mentioned this idea is kinda small for the whole
> summer, so if there is time left i could integrate my MailmanStats [1]
> software into Mailman and/or build CSS styles for the web UI.
>
> Please tell me what you think. I'm also on IRC by the name sophron.
>
> Thanks,
>
> [1]: http://mailmanstats.latthi.com/
>
>
>


From davidj at gmail.com  Tue Apr  3 05:04:23 2012
From: davidj at gmail.com (David Jeske)
Date: Mon, 2 Apr 2012 20:04:23 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <4F7A2304.5060408@zone12.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
Message-ID: <CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>

On Apr 2, 2012 3:07 PM, "Terri Oda" <terri at zone12.com> wrote:
>> This agrees with my view of the situation as well. Which leads to the
>> question, is the above approach interesting/viable for Mailman-team?
>> (assuming the code does something awesome that people want)
>
> If the question is just "would you like another archiver even if the
> licenses don't match?" then I believe the answer is yes.

The question is "would you BUNDLE another archiver even if the licenses
don't match?"

My archiver has been available for download (like many others) for ten
years. All these sites are still running a limping pipermail archive,
because it's bundled. I want to get Mailman a better bundled archive.

> But... since you arrived around the same time GSoC started, I should ask
> whether you were hoping to do this as a GSoC project?

Perhaps it would make things more clear if I explain why I'm here...

I'm not a student. I've been working in software for 15 years, programming
for almost 30 (since I was 9). I wrote large portions of eGroups / Yahoo
Groups / Google Groups. I'm a successful post-Google entrepreneur. Since
leaving Google I've been angel investing mostly in tech stuff (see my Angel
List). I've been donating notable chunks of money and time to open source
projects (with my blender donations working out the best so far). Given my
history, and the fact that I keep wanting to tear my hair out reading
mailing list archives in pipermail, I thought I'd give you folks an
archiver that would be nice.

HOWEVER, I personally will not write GPL code. I might submit a tiny patch
or bugfix, but I'm simply opposed to restrictions on how someone uses
something that I'm trying to donate to the software community. (i.e. you're
never going to turn me into a mailman developer, the best you'd get is me
writing my own mailman-ish and releasing it under S-BSD.. if you want
that, let me know)

From pingou at pingoured.fr  Tue Apr  3 08:15:48 2012
From: pingou at pingoured.fr (Pierre-Yves Chibon)
Date: Tue, 03 Apr 2012 08:15:48 +0200
Subject: [Mailman-Developers] Additional Mailman GSoC mentors
In-Reply-To: <20120329145310.GE11151@unaka.lan>
References: <4F7415CF.20304@zone12.com>  <20120329145310.GE11151@unaka.lan>
Message-ID: <1333433748.24909.11.camel@ambre.pingoured.fr>

On Thu, 2012-03-29 at 07:53 -0700, Toshio Kuratomi wrote:
> On Thu, Mar 29, 2012 at 01:57:03AM -0600, Terri Oda wrote:
> > It's looking like we're going to have more student applicants than in
> > previous years, so I think it'd be great if we could get a few more
> > mentors to match.
> > 
> > If you're a semi-active mailman developer (i.e. I'm going to
> > recognize your name from your mailman-developers postings) and you
> > think you might be interested in mentoring for GSoC this summer or just
> > want to know what's involved, please get in touch with me!
> > 
> I'm willing to help mentor some work.  I'd really like to mentor with
> some other people -- especially at the application review stages -- I do
> have more time for day-to-day mentoring if that's done on IRC (Interrupt
> Driven Design, anyone ;-)  But so far, my knowledge of the mailman codebase
> is limited mainly to archiver stuff.
> 
> I can answer questions about using bzr and some launchpad questions
> (although I also have lots of launchpad questions of my own :-).  I'm now
> fully versed in Warsaw import style rules although I should probably
> recertify at the next pycon :-)
> 
> It does seem like there's a lot of interest in archivers this year (at
> least, people have been pinging me about that).  Since archivers for mailman3
> are somewhat in their infancy, it would be good to think of a "what do we
> want the state of archivers to be after the summer and a year from now" so
> that we can make sure that GSoC work fits into that.

Hi,

I actually would not mind giving a hand to mentor someone. My only
condition is that I do not want to supervise alone, since I have no
experience with GSoC either as a student or as a supervisor.

Regards,
Pierre


From a.badger at gmail.com  Tue Apr  3 20:58:22 2012
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 3 Apr 2012 11:58:22 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
Message-ID: <20120403185822.GI11151@unaka.lan>

On Mon, Apr 02, 2012 at 08:04:23PM -0700, David Jeske wrote:
> On Apr 2, 2012 3:07 PM, "Terri Oda" <terri at zone12.com> wrote:
> >> This agrees with my view of the situation as well. Which leads to the
> >> question, is the above approach interesting/viable for Mailman-team?
> >> (assuming the code does something awesome that people want)
> >
> > If the question is just "would you like another archiver even if the
> licenses don't match?" then I believe the answer is yes.
> 
> The question is "would you BUNDLE another archiver even if the licenses
> don't match?"
> 
> My archiver has been available for download (like many others) for ten
> years. All these sites are still running a limping pipermail archive,
> because it's bundled. I want to get Mailman a better bundled archive.
> 
From the talk about what it means to be a FSF project at the mailman sprint
at pycon I don't think a non-FSF copyright assigned archiver would be
bundled into mailman (Core).

Distributed/pointed to by list.org along with mailman and postorius might be
negotiable though :-)  Would that be something you'd like to pursue?

Also -- mailman3's builtin archiver is extremely minimal -- at the moment,
it archives (stores) mail but it doesn't have a means to display that email
on a web page or similar.  Given that sort of bundled archiver, I have
a feeling sites are going to want to run a third-party archiver of some sort
instead of the default.

> 
> HOWEVER, I personally will not write GPL code. I might submit a tiny patch
> or bugfix, but I'm simply opposed to restrictions on how someone uses
> something that I'm trying to donate to the software community. (i.e. you're
> never going to turn me into a mailman developer, the best you'd get is me
> writing my own mailman-ish and releasing it under S-BSD.. if you want
> that, let me know)
>
General impression from talking to a few other developers at PyCon is we
generally like copyleft licenses.  Some version of copyleft is likely what
a lot of us would choose to license our own code under.  A few of us are
unhappy when our code is used to make closed source applications.

Mailman2 is an FSF project.  mailman3 and postorius are both derivatives of
mailman2 and so they are both FSF projects.  FSF projects must do copyright
assignment to the FSF and are licensed with one of the GNU licenses.

Where could your archiver fit into that sequence of impressions?  I'm not
entirely sure.  I think that it probably couldn't be bundled into the same
tarball with mailman core due to mailman being an FSF project.  But pointing
to it from list.org or blessing it as the "standard archiver" for mailman3
is probably something that could be discussed by the core devs and yourself.

I don't think you're going to find the will to make this sort of decision
right at this instant because what we want the archiver ecosystem to look
like for mailman3 is somewhat in the air.  Do we really want an obviously
less capable archiver to be the bundled archiver?  Do we want to have
a single blessed archiver (probably in a separate tarball as postorius, the
admin web ui, is separate) as an eventual goal?  Do we want (at least for
a year or two) to let people go to town with their new ideas for archivers
and then see if a best-of-breed archiver is raising its head?  I don't
believe any of this is decided inside of our minds yet, so, for now, people
are defaulting to wait and see.

-Toshio

From adam-mailman at amyl.org.uk  Wed Apr  4 00:48:08 2012
From: adam-mailman at amyl.org.uk (Adam McGreggor)
Date: Tue, 3 Apr 2012 23:48:08 +0100
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
Message-ID: <20120403224808.GF4783@hendricks.amyl.org.uk>

On Mon, Apr 02, 2012 at 08:04:23PM -0700, David Jeske wrote:
> HOWEVER, I personally will not write GPL code. I might submit a tiny patch
> or bugfix, but I'm simply opposed to restrictions on how someone uses
> something that I'm trying to donate to the software community.

+1.

(as well as the bloody length of the GPL, and its age.)

-- 
"Of course we are not patronising women. We are just going to explain to
  them, in words of one syllable, what it is all about."
    -- Olga Maitland

From pabs3 at bonedaddy.net  Wed Apr  4 03:32:55 2012
From: pabs3 at bonedaddy.net (Paul Wise)
Date: Wed, 4 Apr 2012 09:32:55 +0800
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120403185822.GI11151@unaka.lan>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
Message-ID: <CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>

On Wed, Apr 4, 2012 at 2:58 AM, Toshio Kuratomi wrote:

> I don't think you're going to find the will to make this sort of decision
> right at this instant because what we want the archiver ecosystem to look
> like for mailman3 is somewhat in the air.  Do we really want an obviously
> less capable archiver to be the bundled archiver?  Do we want to have
> a single blessed archiver (probably in a separate tarball as postorius, the
> admin web ui, is separate) as an eventual goal?  Do we want (at least for
> a year or two) to let people go to town with their new ideas for archivers
> and then see if a best-of-breed archiver is raising its head?  I don't
> believe any of this is decided inside of our minds yet, so, for now, people
> are defaulting to wait and see.

I think it would be a mistake to bundle any archiver with mailman3.
Listing the available archiver options and their features and
shortcomings would be a better way to go.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise

From bob at nleaudio.com  Wed Apr  4 05:21:14 2012
From: bob at nleaudio.com (Bob Puff)
Date: Tue, 3 Apr 2012 23:21:14 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
Message-ID: <20120404031831.M75170@nleaudio.com>


> I think it would be a mistake to bundle any archiver with mailman3.
> Listing the available archiver options and their features and
> shortcomings would be a better way to go.

-1

I think the majority of MM users will be simply using the RPM that comes with
their distro, and there is a real benefit to stuff working right "out of the
box".  This includes the Archiving functions.  

It's great to have options, and giving a list of possible alternatives for
users is excellent, but I think releasing MM 3 without -any- archiver is a
down-grade from the current MM 2.x.

Bob

From pabs at debian.org  Wed Apr  4 05:41:42 2012
From: pabs at debian.org (Paul Wise)
Date: Wed, 4 Apr 2012 11:41:42 +0800
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120404031831.M75170@nleaudio.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
	<20120404031831.M75170@nleaudio.com>
Message-ID: <CAKTje6FgaJh=7LoYbT69F+WW5sZVuxoSXxMDxMbzg5gZSNf21Q@mail.gmail.com>

On Wed, Apr 4, 2012 at 11:21 AM, Bob Puff wrote:

> I think the majority of MM users will be simply using the RPM that comes with
> their distro, and there is a real benefit to stuff working right "out of the
> box".  This includes the Archiving functions.
>
> It's great to have options, and giving a list of possible alternatives for
> users is excellent, but I think releasing MM 3 without -any- archiver is a
> down-grade from the current MM 2.x.

In the Debian world we would do something like this:

Package: mailman3
Depends: mailman3-archiver-default | mailman3-archiver

Package: mailman3-archiver-default
Depends: mailman3-archiver-hyperkitty

Package: mailman3-archiver-hyperkitty
Provides: mailman3-archiver

Is something like that not possible in the RPM world?
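
To make the effect of the alternative dependency concrete, here is a
small Python model of the stanzas above. It is a deliberately simplified
sketch, not how apt or any RPM tool resolves dependencies: the PACKAGES
table and the satisfied/resolve helpers are hypothetical, and the only
rule modeled is "take the first alternative unless something installed
already satisfies or provides one of them".

```python
# Package metadata copied from the control stanzas above.
PACKAGES = {
    "mailman3": {
        "depends": ["mailman3-archiver-default | mailman3-archiver"],
        "provides": [],
    },
    "mailman3-archiver-default": {
        "depends": ["mailman3-archiver-hyperkitty"],
        "provides": [],
    },
    "mailman3-archiver-hyperkitty": {
        "depends": [],
        "provides": ["mailman3-archiver"],
    },
}


def satisfied(clause, installed):
    """True if any alternative in 'a | b' is installed or provided."""
    provided = {
        name for pkg in installed for name in PACKAGES[pkg]["provides"]
    }
    alternatives = [alt.strip() for alt in clause.split("|")]
    return any(alt in installed or alt in provided for alt in alternatives)


def resolve(package, installed=None):
    """Install package, taking the first alternative of unmet clauses."""
    installed = set() if installed is None else set(installed)
    installed.add(package)
    for clause in PACKAGES[package]["depends"]:
        if not satisfied(clause, installed):
            first = clause.split("|")[0].strip()
            installed = resolve(first, installed)
    return installed


# A fresh install pulls in the default archiver chain.
fresh = resolve("mailman3")
# With an archiver already present, the default is not pulled in.
existing = resolve("mailman3", {"mailman3-archiver-hyperkitty"})
```

In this model a site that has already installed any package providing
mailman3-archiver keeps its choice, while everyone else gets the default
without having to pick one.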

I'm subscribed to the list, no need to CC me.

-- 
bye,
pabs

http://bonedaddy.net/pabs3/

From davidj at gmail.com  Wed Apr  4 06:16:28 2012
From: davidj at gmail.com (David Jeske)
Date: Tue, 3 Apr 2012 21:16:28 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120404031831.M75170@nleaudio.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
	<20120404031831.M75170@nleaudio.com>
Message-ID: <CA+CP9O6u3y7iOzmpdiry2WNzotu29iRx+UXb_Ub+DKvbjhkCog@mail.gmail.com>

On Apr 3, 2012 8:14 PM, "Bob Puff" <bob at nleaudio.com> wrote:
> > I think it would be a mistake to bundle any archiver with mailman3.
> > Listing the available archiver options and their features and
> > shortcomings would be a better way to go.
>
> -1
>
> I think the majority of MM users will be simply using the RPM that comes with
> their distro, and there is a real benefit to stuff working right "out of the
> box".  This includes the Archiving functions.
>
> It's great to have options, and giving a list of possible alternatives for
> users is excellent, but I think releasing MM 3 without -any- archiver is a
> down-grade from the current MM 2.x.

I agree. If MM2 and pipermail are any indication of how often admins just
'leave the defaults', then bundling no archive interface with MM3 would mean
most mailing lists would have no archive.

I'd personally like to see a better archiver rolled into an MM2 point
release, as well as upcoming MM3 development. (I understand pipermail URL
compat would be nice in that case).

From davidj at gmail.com  Wed Apr  4 06:33:33 2012
From: davidj at gmail.com (David Jeske)
Date: Tue, 3 Apr 2012 21:33:33 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120403185822.GI11151@unaka.lan>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
Message-ID: <CA+CP9O4E3x+LE-H4jcBwirxEkkPJGoHxiZX1mi691DGKCdVYRQ@mail.gmail.com>

On Apr 3, 2012 11:58 AM, "Toshio Kuratomi" <a.badger at gmail.com> wrote:
> > The question is "would you BUNDLE another archiver even if the licenses
> > don't match?"

> Where could your archiver fit into that sequence of impressions?  I'm not
> entirely sure.  I think that it probably couldn't be bundled into the same
> tarball with mailman core due to mailman being an FSF project.

I'm just going to charge down the path I was on and finish up something
that's a great drop in for MM2/MM3. I'll even try to add some pipermail URL
compatibility. It'll be S-BSD, so (if you like it) the MM devs and the FSF
can wrestle with issues of whether you want to bundle it as is, put a
rubber GPL stamp on it, or just point to it like you would any other
archiver.

I honestly expected to have an updated UI to show by now. I've been busy
with some code-restructuring, and an unbelievable amount of life-stuff came
across my bow in the past week. It shouldn't be too long now.

> But pointing to it from list.org or blessing it as the "standard archiver"
> for mailman3 is probably something that could be discussed by the core devs
> and yourself.

I'm a bit scared of a world where MM3 does not include any archiver. If
pipermail popularity is any indication of how often admins 'stick with the
bundled defaults', we could have an unreasonable number of MM3 lists with
no archives at all.

Obviously the team is free to bless any archiver it wants, mine or others.

Also, I'm certainly NOT trying to get anyone to agree to bless an archiver
before they've even seen it working and kicking butt. I was just trying to
understand the many issues as I'm cleaning up my code and trying to find it
a home with a bit more utility. I think I have a great idea from all the
discussions here. THANKS!

From a.badger at gmail.com  Wed Apr  4 06:55:54 2012
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 3 Apr 2012 21:55:54 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CAKTje6FgaJh=7LoYbT69F+WW5sZVuxoSXxMDxMbzg5gZSNf21Q@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
	<20120404031831.M75170@nleaudio.com>
	<CAKTje6FgaJh=7LoYbT69F+WW5sZVuxoSXxMDxMbzg5gZSNf21Q@mail.gmail.com>
Message-ID: <20120404045554.GJ11151@unaka.lan>

On Wed, Apr 04, 2012 at 11:41:42AM +0800, Paul Wise wrote:
> On Wed, Apr 4, 2012 at 11:21 AM, Bob Puff wrote:
> 
> > I think the majority of MM users will be simply using the RPM that comes with
> > their distro, and there is a real benefit to stuff working right "out of the
> > box".  This includes the Archiving functions.
> >
> > It's great to have options, and giving a list of possible alternatives for
> > users is excellent, but I think releasing MM 3 without -any- archiver is a
> > down-grade from the current MM 2.x.
> 
> In the Debian world we would do something like this:
> 
> Package: mailman3
> Depends: mailman3-archiver-default | mailman3-archiver
> 
> Package: mailman3-archiver-default
> Depends: mailman3-archiver-hyperkitty
> 
> Package: mailman3-archiver-hyperkitty
> Provides: mailman3-archiver
> 
> Is something like that not possible in the RPM world?
> 
Not sure what the | syntax is, so it may not be.  But what we might do would
be to have a virtual provide.

Package: mailman3
Requires: mailman3-archiver

Package: mailman3-archiver-hyperkitty
Provides: mailman3-archiver

Package: mailman3-archiver-pipermail-eeewwww
Provides: mailman3-archiver

-Toshio

From stephen at xemacs.org  Wed Apr  4 07:03:00 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 4 Apr 2012 14:03:00 +0900
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O6u3y7iOzmpdiry2WNzotu29iRx+UXb_Ub+DKvbjhkCog@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
	<20120404031831.M75170@nleaudio.com>
	<CA+CP9O6u3y7iOzmpdiry2WNzotu29iRx+UXb_Ub+DKvbjhkCog@mail.gmail.com>
Message-ID: <CAL_0O1-WvuRi1onM_-CECSd9Gsr45KddDKNpY1pP1aDQ7ZYFvA@mail.gmail.com>

On Wed, Apr 4, 2012 at 1:16 PM, David Jeske <davidj at gmail.com> wrote:
> On Apr 3, 2012 8:14 PM, "Bob Puff" <bob at nleaudio.com> wrote:

>> I think the majority of MM users will be simply using the RPM that comes with
>> their distro, and there is a real benefit to stuff working right "out of the
>> box".  This includes the Archiving functions.

I don't see why that precludes having the archiver in a separate
recommended or required RPM, .deb, ebuild, or whatever dependency, and
I imagine the distros can and will deal with that (as most of them use
Mailman themselves, they'd have to do without dogfood).

The problem as I see it is that many distros (I'm looking at you,
Debian!) get woefully out of date, and their packaging often pays more
attention to "fitting in to the distro" than to what we consider best
practice.  So users will often upgrade from our sources (and that is
historically what we recommend).  Also, many non-OS distros (*gag*
*spit* Plesk *barf* cPanel) will roll their own derivatives (typically
with little care for what we consider best practice).

> I'd personally like to see a better archiver rolled into an MM2 point
> release, as well as upcoming MM3 development. (I understand pipermail URL
> compat would be nice in that case).

That, and automatic storage conversion to whatever the new archive UI prefers.

The caveats above notwithstanding, at this point I'm definitely with
David and Bob on this issue -- +1 for including batteries.  I'd like
to hear from Mark, though (even more so than from Barry; Mark is the
guy who's been guiding people through upgrades on a daily basis for
the last decade or so).

From stephen at xemacs.org  Wed Apr  4 07:08:39 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 4 Apr 2012 14:08:39 +0900
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120403185822.GI11151@unaka.lan>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
Message-ID: <CAL_0O1-zd9C4YgcTLiB_SD5ivjVcgDTGvTfJ7n3J_Xhp83BeuA@mail.gmail.com>

On Wed, Apr 4, 2012 at 3:58 AM, Toshio Kuratomi <a.badger at gmail.com> wrote:

> From the talk about what it means to be a FSF project at the mailman sprint
> at pycon I don't think a non-FSF copyright assigned archiver would be
> bundled into mailman (Core).

AFAIK there are no "FSF projects", although the FSF does support "The"
GNU Project and sometimes specific GNU projects.  According to the
criteria for being a GNU project
(http://www.gnu.org/help/evaluation.html)

    For a program to be GNU software does not require transferring
    copyright to the FSF; that is a separate question. If you transfer
    the copyright to the FSF, the FSF will enforce the GPL for the
    program if someone violates it; if you keep the copyright,
    enforcement will be up to you.

What *is* required is the GPL:

    A GNU program should use the latest version of the license that
    the GNU Project recommends -- not just any free software license.
    For most packages, this means using the GNU GPL.

So David's program can't be *part* of GNU Mailman without special
permission, which I doubt the GNU Project (ie, RMS, AFAIK) will grant
(and would require delicate negotiations in extreme good humor on our
part, based on past experience trying to negotiate licensing
exceptions with RMS).  It is not obvious that it can't be bundled with
Mailman distributions, however.  To my mind, bundling is a very strong
recommendation, and the official standard for GNU projects merely
says:

    A GNU program should not recommend use of any non-free program[...].

We could also redistribute verbatim, as part of Mailman under the GPL,
with pointers to upstream (I would be happy to personally host a mirror
of a permissively licensed distribution).  Perhaps with an
ElementTree-like agreement that David makes the call on changes to the
archiver he contributed.  AIUI, that would make David happy (enough),
as he doesn't believe you can really restrict redistribution of a
simplified BSD-licensed program merely by incorporating it in a GPLed
distribution.

The main stinker there is the "David is the boss" agreement, if he
wants it.  I personally have been working with that kind of agreement
for years in XEmacs, and it makes our package contributors happy,
although it pisses off some of our core contributors.  Similar to the
ElementTree controversy in the Python stdlib, although none of the
packages where issues have come up matters as much to us as
ElementTree does to Python.  So that would be mostly up to Barry (if
David decides he wants that kind of power over the future of his
archiver after contributing it to Mailman).

> General impression from talking to a few other developers at PyCon is we
> generally like copyleft licenses.  Some version of copyleft is likely what
> a lot of us would choose to license our own code under.  A few of us are
> unhappy when our code is used to make closed source applications.

Sure, but this isn't our code yet, it's David's, and he proposes to do
much of the work involved in adapting his code to Mailman 3.

> Mailman2 is an FSF project.  mailman3 and postorius are both derivatives of
> mailman2 and so they are both FSF projects.

That logic is inaccurate.  There's no must about it; Mailman 3 could
just as well be a fork.  But since the FSF is the owner of most of our
code, there are certain important conveniences to continuing that
practice, and no real benefit to not doing so since we can't choose
our own license because of the derivation from Mailman under the GPL.

> FSF projects must do copyright assignment to the FSF

Not true, see above.

> and are licensed with one of the GNU licenses.

This is true, and I'm pretty sure it will be GPL v3, although given
the functionality there is some chance the GNU Project would push for
AGPL (but AFAIK RMS still considers the Affero clause optional, even
for out-and-out Web 2.0 webapps).

> Do we want to have a single blessed archiver (probably
> in a separate tarball as postorius, the admin web ui, is
> separate) as an eventual goal?

I believe that we won't have a blessed archiver, in the sense that any
archiver we distribute will have to use the same APIs that other
archivers do.  But having followed mailman-users for a decade now, I
think it would be a bad idea to have a "batteries not included"
distribution for Mailman 3.1.  Which webmin, which archiver, is a
different question.

From terri at zone12.com  Wed Apr  4 08:19:32 2012
From: terri at zone12.com (Terri Oda)
Date: Wed, 04 Apr 2012 00:19:32 -0600
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CAL_0O1-zd9C4YgcTLiB_SD5ivjVcgDTGvTfJ7n3J_Xhp83BeuA@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAL_0O1-zd9C4YgcTLiB_SD5ivjVcgDTGvTfJ7n3J_Xhp83BeuA@mail.gmail.com>
Message-ID: <4F7BE7F4.7070607@zone12.com>

On 12-04-03 11:08 PM, Stephen J. Turnbull wrote:
> So David's program can't be *part* of GNU Mailman without special 
> permission, which I doubt the GNU Project (ie, RMS, AFAIK) will grant 
> (and would require delicate negotiations in extreme good humor on our
> part, based on past experience trying to negotiate licensing 
> exceptions with RMS). It is not obvious that it can't be bundled with 
> Mailman distributions, however. 

It occurs to me that it's perfectly reasonable to assume that people who 
*package* mailman for different distributions may choose different 
recommended/required archive software, since they can (and with the 
license hassle likely should) be separate packages.  So what works for
the FSF, what works for us as a dev team, and what works for the 
distributions may actually be different things.  So no matter what, 
having David release his work is potentially going to lead to people 
getting it as a default, somewhere along the line, if he's got a great 
solution available.

People get something better than pipermail *and* it doesn't result in me 
getting more angry emails from RMS?  Sounds like a winner to me.

BTW, I *will* argue that we should have a bundled archiver that does 
something more than make mbox files, and you can all expect to have a 
big argument with me about it later. ;)  But I'm not in a hurry to make 
a decision about which one Right Now because I'm going to want to do a 
deeper usability analysis of Postorius + archive and I can't do that 
until we have them both on the table for user testing.

  Terri


From davidj at gmail.com  Wed Apr  4 08:56:30 2012
From: davidj at gmail.com (David Jeske)
Date: Tue, 3 Apr 2012 23:56:30 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <4F7BE7F4.7070607@zone12.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAL_0O1-zd9C4YgcTLiB_SD5ivjVcgDTGvTfJ7n3J_Xhp83BeuA@mail.gmail.com>
	<4F7BE7F4.7070607@zone12.com>
Message-ID: <CA+CP9O4t6GciMjQ6fg4Lv1__GypsNS7HzrKdbmJ+Q974Euu_wA@mail.gmail.com>

This thread is slowing down my coding! :)    (it's been really helpful
though all, thanks for the many perspectives!)

On Tue, Apr 3, 2012 at 11:19 PM, Terri Oda <terri at zone12.com> wrote:

> It occurs to me that it's perfectly reasonable to assume that people who
> *package* mailman for different distributions may choose different
> recommended/required archive software, since they can (and with the license
> hassle likely should) be separate packages.  So what works for the FSF,
> what works for us as a dev team, and what works for the distributions may
> actually be different things.
>

I agree; I'm coming around to the sensibility of possibly not including an
archiver with MM3, just so long as there actually ARE solid and working
archivers that plug right in with nothing more than an apt-get (or equiv).
It's just as fine if the distribution maintainers pick which one to
include, and this gets around all this FSF/GPL/whatsit stuff... without
basically getting a pipermail default. I still think it's dangerous for
people landing on Mailman's website and downloading source.


> So no matter what, having David release his work is potentially going to
> lead to people getting it as a default, somewhere along the line, if he's
> got a great solution available.
>

I know this thread is long and in pieces, but just to clarify, my code is
already released and has been S-BSD for **ten** years. The UI is a little
dated, so I'm cleaning up both the UI and the code right now, but I just
want folks to know the code is already out there..

http://www.clearsilver.net/archive/

http://dj1.willowmail.com/csla/Mailman-Developers

...this discussion is all just about whether mailman wants to bundle (or
reference) near-future updates to this stuff. I was hoping that rather than
create my own separate OSS-y website and such for it, I could just hang out
here and roll it into Mailman-land. You guys have done great work.

If this GPL/S-BSD issue turns out to be a blocker, then I'll just make my
own site and maintain (my version) there because I want to release my code
S-BSD.

Also, there will be *zero* ill-will if you folks want to wrap it up in a
GPL license and stick it into mailman... I just won't be maintaining that,
or assigning copyright, and any patches I make will be into my S-BSD tree.
Perhaps not ideal, but still seems a better outcome than pipermail.

From stephen at xemacs.org  Wed Apr  4 09:58:13 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 4 Apr 2012 16:58:13 +0900
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O4t6GciMjQ6fg4Lv1__GypsNS7HzrKdbmJ+Q974Euu_wA@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAL_0O1-zd9C4YgcTLiB_SD5ivjVcgDTGvTfJ7n3J_Xhp83BeuA@mail.gmail.com>
	<4F7BE7F4.7070607@zone12.com>
	<CA+CP9O4t6GciMjQ6fg4Lv1__GypsNS7HzrKdbmJ+Q974Euu_wA@mail.gmail.com>
Message-ID: <CAL_0O18WDdqPRktpPjFF3wxCuJXDt3qGnV78BB7JQBE822XpuQ@mail.gmail.com>

On Wed, Apr 4, 2012 at 3:56 PM, David Jeske <davidj at gmail.com> wrote:

> ...this discussion is all just about whether mailman wants to bundle (or
> reference) near-future updates to this stuff. I was hoping that rather than
> create my own separate OSS-y website and such for it, I could just hang out
> here and roll it into Mailman-land. You guys have done great work.

I can't see any technical or legal reason why you would need to
maintain a separate site.  I would be more than happy to help maintain
a Mailman-based site providing links to resources such as tarballs,
VCS repos, and docs (this site would presumably be on the wiki) and a
repo on Launchpad, which I think takes care of any social issues.  It
wouldn't be specific to the Clearsilver archiver, I'd do the same for
any other archivers people care to recommend.

As for a GPL-wrapped release bundled into Mailman, I'll do admin-level
work to make that possible if the Clearsilver archiver is adopted and
that's the way people want to go.  I have experience with that kind of
thing, and am happy to help lubricate that kind of friction.  (I might
be willing to do more hacking/maintainer-like work if people decide to
make significant GPL-only enhancements, but I won't decide whether to
do that until I've seen both the existing code and any such
enhancements.)

From benedict.stein at googlemail.com  Wed Apr  4 18:07:43 2012
From: benedict.stein at googlemail.com (Benedict Stein)
Date: Wed, 04 Apr 2012 18:07:43 +0200
Subject: [Mailman-Developers] Gsoc
In-Reply-To: <CAGmcnY8OvoidJAZdiBGp8scOy2mOBwxW3f6JzvsoT=4vT5DRmQ@mail.gmail.com>
References: <CAGmcnY8OvoidJAZdiBGp8scOy2mOBwxW3f6JzvsoT=4vT5DRmQ@mail.gmail.com>
Message-ID: <4F7C71CF.2020109@gmail.com>

Salut François,

GSoC applications are generally done through the Melange web interface
on the Google Summer of Code site.
Regarding Mailman itself, Florian or the mailing list is probably
the best way to contact the project people.

You'll find both in CC.

You'll also find lots of additional information about Mailman and GSoC
opportunities on the mailing list.

I've got only one question left - where did you get my email?

On 04/04/2012 17:16, François Ribémont wrote:
> Hello,
> I am a 4th year student in software engineering who would like to
> apply for the project you have submitted on google summer of code.
> This will be my first gsoc, and I don't know much about it. So my
> first question is: is it the good way to contact you ? Is there any
> kind of : "How to apply" on your project ?
>
> Regards
>
> -- 
> Ribemont François
> ribemont.francois at gmail.com <mailto:ribemont.francois at gmail.com>


From rkw at dataplex.net  Thu Apr  5 05:57:59 2012
From: rkw at dataplex.net (Richard Wackerbarth)
Date: Wed, 4 Apr 2012 22:57:59 -0500
Subject: [Mailman-Developers] [Merge] lp:~crodjer/postorius/postorius
	into lp:postorius
In-Reply-To: <CAJeQML13Y0gtS6ZF2bfTFr9izQz+tRr42fy+u63O+KEP4Ws7TA@mail.gmail.com>
References: <CAJeQML13Y0gtS6ZF2bfTFr9izQz+tRr42fy+u63O+KEP4Ws7TA@mail.gmail.com>
Message-ID: <615831E3-3609-48CE-9692-8B1ACEC4E890@dataplex.net>

On Apr 4, 2012, at 9:40 PM, Rohan Jain wrote:

> Okay.  I changed it to mutest.db because someone by the nick wacky asked
> over the IRC

First, that was a typo in my IRC message.
Second, I did not request that you change the file name. I ASKED why you changed the definition (from Florian's computed path).
You never responded.

From pingou at pingoured.fr  Thu Apr  5 15:41:28 2012
From: pingou at pingoured.fr (Pierre-Yves Chibon)
Date: Thu, 05 Apr 2012 15:41:28 +0200
Subject: [Mailman-Developers] From the creation of a ThreadID
Message-ID: <1333633288.23207.26.camel@ambre.pingoured.fr>

Hi,

In HyperKitty to be able to easily retrieve from the database all the
threads of a given month or just all the emails of a thread, I created a
Field in the database called ThreadID.
When I load the archives from mailman into mongo, I look for the absence
of the headers 'References' or 'In-Reply-To' to define an email that
starts a new thread.
Then all emails which have the header 'References' or 'In-Reply-To' will
look for the preceding email and extract the ThreadID from it.
This seems to work fine.

At the beginning I was using a simple integer as identifier but of
course if you change the order in which the archives are loaded, or just
if you miss, say, one month, then the ThreadID is not consistent anymore.
So I changed to use the Message-ID of the first email of the Thread as
ThreadID.
The problem is of course that if the admin removes the first email of a
thread for x or y reasons, then when reloading the archives (for z or a
reasons), we will lose the ThreadID and, actually, the integrity of the
Thread (each reply to the first email will be split into its own
thread).
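
A minimal sketch of the scheme described above, using only the Python
standard library rather than HyperKitty's actual loader. The function names
are hypothetical, and a plain dict stands in for the mongo collection. A
message whose ancestors are all unknown starts a new thread keyed by its own
Message-ID; otherwise it inherits the ThreadID of the nearest known ancestor.

```python
import re
from email import message_from_string

MSGID_RE = re.compile(r"<[^<>\s]+>")


def parent_ids(msg):
    """Message-IDs this message points at, nearest ancestor first."""
    ids = MSGID_RE.findall(msg.get("In-Reply-To", ""))
    ids += reversed(MSGID_RE.findall(msg.get("References", "")))
    return ids


def assign_thread_ids(raw_messages):
    """Map each Message-ID to a ThreadID, processing messages in order.

    The ThreadID is the Message-ID of the first message seen in the thread.
    """
    thread_of = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msgid = msg["Message-ID"].strip()
        for parent in parent_ids(msg):
            if parent in thread_of:
                # Inherit the ThreadID of the nearest ancestor already loaded.
                thread_of[msgid] = thread_of[parent]
                break
        else:
            # No known ancestor: this message becomes the root of a thread.
            thread_of[msgid] = msgid
    return thread_of
```

Loading the same archive without its root message shows the failure mode:
every direct reply to the missing root ends up as the root of its own
thread, with a different ThreadID than before.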

Would anyone have an idea on how to generate a stable and delete/reload
proof ThreadID?
The other solution of course being that I regenerate the thread on the
fly based on the first email (which is still easy to find), but that
will be a lot of db querying.

Thinking about it, generating the thread on the fly would also give the
possibility to regenerate the thread view from anywhere (so you could
generate a thread view for only a sub-thread).

Do you have any suggestions/preferences ?

Thanks,
Pierre

From stephen at xemacs.org  Thu Apr  5 17:10:22 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 6 Apr 2012 00:10:22 +0900
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <1333633288.23207.26.camel@ambre.pingoured.fr>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
Message-ID: <CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>

On Thu, Apr 5, 2012 at 10:41 PM, Pierre-Yves Chibon <pingou at pingoured.fr> wrote:

> In HyperKitty to be able to easily retrieve from the database all the
> threads of a given month or just all the emails of a thread, I created a
> Field in the database called ThreadID.
> When I load the archives from mailman into mongo, I look for the absence
> of the headers 'References' or 'In-Reply-To' to define an email that
> starts a new thread.

This fails when a thread crosses channels.  Eg,

To: Pierre
From: Steve
Message-Id: <x at y.z>

is followed by

To: Steve
From: Pierre
Cc: SomeList
References: <x at y.z>
Message-Id: <a at b.c>

> Would anyone have an idea on how to generate a stable and delete/reload
> proof ThreadID?

I don't see how this can be possible.  Eg, in the above scenario you
construct a thread based on your reply to me.  Then I go, "oh, really
I should have posted to mm-dev" and repost the thread.  So the
"Message-ID of root message" fails, and I don't see an alternative
that can be predicted.  So it may as well be arbitrary (eg, any
message in the thread) and stored in the database with appropriate
linkage from thread IDs to message IDs (one-to-many), and vice versa
(many-to-one).

> The other solution of course being that I regenerate the thread on the
> fly based on the first email (which is still easy to find), but that
> will be a lot of db querying.

I haven't thought about it deeply, but I would say just give the
thread an arbitrary ID in the database.  Message-IDs are supposed to be
universally unique, so what's wrong with keeping the thread in the
database as a tree of message IDs?  Some Message-IDs will not have
corresponding messages but that's always a problem with threading (see
http://www.jwz.org/doc/threading.html, and RFC 5256).

There are other problems with threading that need to be dealt with as
well, such as References being inconsistent across messages in the
same thread and people who continue a thread with a new message, etc.
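
A minimal sketch of that bookkeeping, assuming an in-memory store in place
of a real database. The ThreadStore class and its attribute names are
hypothetical. Thread IDs are arbitrary integers, referenced Message-IDs are
kept as placeholders even when the message was never seen, and two threads
are merged when a later message shows they belong together.

```python
import itertools


class ThreadStore:
    """In-memory stand-in for the database tables discussed above."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.thread_of = {}   # Message-ID -> thread ID (many-to-one)
        self.members = {}     # thread ID -> set of Message-IDs (one-to-many)
        self.parent = {}      # Message-ID -> parent Message-ID (the tree)

    def add(self, msgid, references=()):
        """Record a message and the Message-IDs from its References header.

        Referenced IDs are recorded even if the message itself was never
        seen, so they act as placeholders in the tree.
        """
        chain = [*references, msgid]
        # Link each ID to its predecessor unless a parent is already recorded.
        for parent, child in zip(chain, chain[1:]):
            self.parent.setdefault(child, parent)
        # Reuse any thread the chain already touches; otherwise make a new one.
        known = {self.thread_of[m] for m in chain if m in self.thread_of}
        tid = min(known) if known else next(self._next_id)
        for other in known - {tid}:
            # Two threads turned out to be one: fold the newer into the older.
            for m in self.members.pop(other):
                self.thread_of[m] = tid
                self.members[tid].add(m)
        self.members.setdefault(tid, set()).update(chain)
        for m in chain:
            self.thread_of[m] = tid
        return tid
```

In the cross-channel scenario above, adding <a at b.c> with a reference to
<x at y.z> creates the thread with a placeholder for the unseen root. If the
root is posted to the list later, it joins the same thread ID instead of
starting a new one.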

From richard at NFSNet.org  Thu Apr  5 17:57:04 2012
From: richard at NFSNet.org (Richard Wackerbarth)
Date: Thu, 5 Apr 2012 10:57:04 -0500
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
Message-ID: <AEDB8517-4A0C-4FDF-925A-6DA7F15890A5@NFSNet.org>

I agree with Steve. In general you cannot solve the problem with only the information contained in the message headers. You will need to maintain parallel meta-data for each message/thread. The header info would provide an initialization at the time of insertion. Presumably this thread tree could be edited by an administrator to "correct" broken chains, etc.

On Apr 5, 2012, at 10:10 AM, Stephen J. Turnbull wrote:

> On Thu, Apr 5, 2012 at 10:41 PM, Pierre-Yves Chibon <pingou at pingoured.fr> wrote:
> 
>> In HyperKitty to be able to easily retrieve from the database all the
>> threads of a given month or just all the emails of a thread, I created a
>> Field in the database called ThreadID.
>> When I load the archives from mailman into mongo, I look for the absence
>> of the headers 'References' or 'In-Reply-To' to define an email that
>> starts a new thread.
> 
> This fails when a thread crosses channels.  Eg,
> 
> To: Pierre
> From: Steve
> Message-Id: <x at y.z>
> 
> is followed by
> 
> To: Steve
> From: Pierre
> Cc: SomeList
> References: <x at y.z>
> Message-Id: <a at b.c>


> I haven't thought about it deeply, but I would say just give the
> thread an arbitrary ID in the database.  Message-IDs are supposed to be
> universally unique, so what's wrong with keeping the thread in the
> database as a tree of message IDs?  Some Message-IDs will not have
> corresponding messages but that's always a problem with threading (see
> http://www.jwz.org/doc/threading.html, and RFC 5256).
> 
> There are other problems with threading that need to be dealt with as
> well, such as References being inconsistent across messages in the
> same thread and people who continue a thread with a new message, etc.


From sophron at latthi.com  Thu Apr  5 20:38:58 2012
From: sophron at latthi.com (George Chatzisofroniou)
Date: Thu, 5 Apr 2012 21:38:58 +0300
Subject: [Mailman-Developers] [GSoC 2012] Candidate on 'Integration of
 (existing) search code into Mailman archives'
In-Reply-To: <4F7A2C47.1070002@zone12.com>
References: <CACeRBzkPqWpDfSt4KqtACYSTaOdSxBuTaoMixJ=Tbq_dOHGBAQ@mail.gmail.com>
	<4F7A2C47.1070002@zone12.com>
Message-ID: <CACeRBzmYY9MF3LGPqgAz3vEaDRVE=9X4M-ttX1ryE=hoEYDB=A@mail.gmail.com>

2012/4/3 Terri Oda <terri at zone12.com>:
> Hi George,
>
> Your MailmanStats project looks great and would totally fit with what we
> have in mind for stats, though I'm guessing the hyperkitty team has some
> much more extensive work in mind making use of post ratings, tags, etc.
>
> If you're putting together your proposal now, do feel free to mention both
> projects as sources of interest. ?Since you already have the stats code
> available, it might be possible to toss the integration in there after doing
> some other work. ?Normally I worry about students biting off more than they
> can chew, but given your prior experience with Mailman and the fact that you
> already have the basic code, you can make a case for being able to package
> up that code and contribute it in a week or two our of your summer if you're
> ready for a code review.
>
>
>  Terri
>
> PS - For further advice regarding search projects, see my previous post to
> mailman-developers.
>
>

Thanks for your response, Terri,

I thought about it quite a lot.

In the end, I think it is better to implement only the Metric idea
(since I already have the base code) by integrating my software into
Mailman. My previous experience with Mailman and the fact that I have
already done some of the work should lead to a better result.

I'll send my proposal in the next few hours. I'd appreciate any feedback.

Thanks,


-- 
George Chatzisofroniou
sophron.latthi.com

From pingou at pingoured.fr  Thu Apr  5 20:42:51 2012
From: pingou at pingoured.fr (Pierre-Yves Chibon)
Date: Thu, 05 Apr 2012 20:42:51 +0200
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
Message-ID: <1333651371.6278.16.camel@ambre.pingoured.fr>

On Fri, 2012-04-06 at 00:10 +0900, Stephen J. Turnbull wrote:
> On Thu, Apr 5, 2012 at 10:41 PM, Pierre-Yves Chibon <pingou at pingoured.fr> wrote:
> 
> > In HyperKitty to be able to easily retrieve from the database all the
> > threads of a given month or just all the emails of a thread, I created a
> > Field in the database called ThreadID.
> > When I load the archives from mailman into mongo, I look for the absence
> > of the headers 'References' or 'In-Reply-To' to define an email that
> > starts a new thread.
> 
> This fails when a thread crosses channels.  Eg,
> 
> To: Pierre
> From: Steve
> Message-Id: <x at y.z>
> 
> is followed by
> 
> To: Steve
> From: Pierre
> Cc: SomeList
> References: <x at y.z>
> Message-Id: <a at b.c>
> 
> > Would anyone have an idea on how to generate a stable and delete/reload
> > proof ThreadID?
> 
> I don't see how this can be possible.  Eg, in the above scenario you
> construct a thread based on your reply to me.  Then I go, "oh, really
> I should have posted to mm-dev" and repost the thread.  So the
> "Message-ID of root message" fails, and I don't see an alternative
> that can be predicted.  So it may as well be arbitrary (eg, any
> message in the thread) and stored in the database with appropriate
> linkage from thread IDs to message IDs (one-to-many), and vice versa
> (many-to-one).

Ok, I missed something here.
So when it parses the email, it checks for 'References' or
'In-Reply-To'.
- If it finds them, it looks for the preceding email
    - if it finds the preceding email, then the current email gets the
ThreadID from the preceding email
    - if it does not find the preceding email, then the current email is
assumed to be a new thread and thus its ThreadID is its Message-ID
- if it does not find 'References' or 'In-Reply-To', then the current
email is assumed to be a new thread and thus its ThreadID is its
Message-ID

So for the example you give, the archiver will receive your email and
make a new thread out of it.
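
A minimal sketch of the rule described above, assuming messages are plain
dicts with hypothetical keys 'message_id', 'in_reply_to' and 'references'
(this is not HyperKitty's actual schema, and the in-memory dict stands in
for the database):

```python
archive = {}  # Message-ID -> stored message dict (stand-in for the database)


def assign_thread_id(msg):
    """Return the ThreadID for msg under the rule described above."""
    # Candidate parents: In-Reply-To first, then References, nearest first.
    candidates = []
    if msg.get("in_reply_to"):
        candidates.append(msg["in_reply_to"])
    candidates.extend(reversed(msg.get("references", [])))

    for parent_id in candidates:
        parent = archive.get(parent_id)
        if parent is not None:
            # The preceding email is archived: inherit its ThreadID.
            return parent["thread_id"]
    # No threading headers, or the preceding email is not archived:
    # the message starts a new thread named after its own Message-ID.
    return msg["message_id"]


def store(msg):
    """Assign a ThreadID, store the message, and return the ThreadID."""
    msg["thread_id"] = assign_thread_id(msg)
    archive[msg["message_id"]] = msg
    return msg["thread_id"]
```

With the cross-channel example, the reply that references the off-list
message becomes the head of its own thread, and later replies inherit
that ThreadID.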

> > The other solution of course being that I regenerate the thread on the
> > fly based on the first email (which is still easy to find), but that
> > will be a lot of db querying.
> 
> I haven't thought about it deeply, but I would say just give the
> thread an arbitrary ID in the database.  Message-IDs are supposed to
> universally unique, so what's wrong with keeping the thread in the
> database as a tree of message IDs?  Some Message-IDs will not have
> corresponding messages but that's always a problem with threading (see
> http://www.jwz.org/doc/threading.html, and RFC 5256).

The idea of using the Message-ID for ThreadID (instead of an integer) is
that, whether I load one month or two months of archives into the
database, the link to the thread
(http://mm3test.fedoraproject.org/thread/packaging at fp.o/XU7HT5JC5GND2O4JII7MTQILLTB4IN4S) will remain the same (so consistent URLs).

> There are other problems with threading that need to be dealt with as
> well, such as References being inconsistent across messages in the
> same thread and people who continue a thread with a new message, etc.

For these I am not sure I can do anything, at least automatically (we
could always allow an admin to edit the field).

Pierre


From richard at NFSNet.org  Thu Apr  5 22:00:24 2012
From: richard at NFSNet.org (Richard Wackerbarth)
Date: Thu, 5 Apr 2012 15:00:24 -0500
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <1333651371.6278.16.camel@ambre.pingoured.fr>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
	<1333651371.6278.16.camel@ambre.pingoured.fr>
Message-ID: <2E1CE263-285A-42DF-8841-5DB1E5633901@NFSNet.org>

Pierre,

There is nothing wrong with using a message ID as a thread ID. They are different namespaces (with an intuitive mapping for the first post.)

The problem is only that the mapping is not stable under the "restore after deleting some messages" scenario.
If you expect to be able to restore messages and keep stable thread IDs, then you will need to assure that the mapping of message to thread ID does not depend on the presence of other messages remaining in the database.

Richard

On Apr 5, 2012, at 1:42 PM, Pierre-Yves Chibon wrote:

> The idea of using the Message-ID for ThreadID (instead of a integer) is
> that, if I whether I load one months or two months of archives into the
> database, the link to the thread
> (http://mm3test.fedoraproject.org/thread/packaging at fp.o/XU7HT5JC5GND2O4JII7MTQILLTB4IN4S) will remain the same (so consistent urls).
> 
>> There are other problems with threading that need to be dealt with as
>> well, such as References being inconsistent across messages in the
>> same thread and people who continue a thread with a new message, etc.
> 
> For these I am not sure I can do something (at least automatically, we
> could always allow an admin to edit the field).
> 
> Pierre
> 
> _______________________________________________
> Mailman-Developers mailing list
> Mailman-Developers at python.org
> http://mail.python.org/mailman/listinfo/mailman-developers
> Mailman FAQ: http://wiki.list.org/x/AgA3
> Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
> Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/richard%40nfsnet.org
> 
> Security Policy: http://wiki.list.org/x/QIA9


From mark at msapiro.net  Thu Apr  5 22:10:21 2012
From: mark at msapiro.net (Mark Sapiro)
Date: Thu, 5 Apr 2012 13:10:21 -0700
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <1333651371.6278.16.camel@ambre.pingoured.fr>
Message-ID: <PC195201204051310210843ede1e920@MSAPIRO>

Pierre-Yves Chibon wrote:
>
>Ok, I missed a something here.
>So when it parses the email, it checks for 'References' or
>'In-Reply-To'.
>- If it finds them, it looks for the preceding email
>    - if it finds the preceding email, then the current email gets the
>ThreadID from the preceding email
>    - if it does not find the preceding email, then the current email is
>assumed to be a new thread and thus its ThreadID is its Message-ID
>- if it does not find 'References' or 'In-Reply-To', then the current
>email is assumed to be a new thread and thus its ThreadID is its
>Message-ID


This is still incomplete. One of the MUAs I use generates In-Reply-To:
headers but not References: headers. Thus in cases where someone has
replied to me but not included the list (and may or may not have
subsequently sent the reply to the list with a different Message-ID),
and I reply and include the list, the Message-ID in my In-Reply-To: is
not in the archive.

Another situation is someone replies to me and the list, but the list
reply is greylisted and not retried for a while. Meanwhile, I reply to
my copy and the Message-ID in my In-Reply-To: is not yet in the
archive.

Threading is not easy.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


From pingou at pingoured.fr  Thu Apr  5 22:20:18 2012
From: pingou at pingoured.fr (Pierre-Yves Chibon)
Date: Thu, 05 Apr 2012 22:20:18 +0200
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <PC195201204051310210843ede1e920@MSAPIRO>
References: <PC195201204051310210843ede1e920@MSAPIRO>
Message-ID: <1333657218.6278.25.camel@ambre.pingoured.fr>

On Thu, 2012-04-05 at 13:10 -0700, Mark Sapiro wrote:
> Pierre-Yves Chibon wrote:
> >
> >Ok, I missed a something here.
> >So when it parses the email, it checks for 'References' or
> >'In-Reply-To'.
> >- If it finds them, it looks for the preceding email
> >    - if it finds the preceding email, then the current email gets the
> >ThreadID from the preceding email
> >    - if it does not find the preceding email, then the current email is
> >assumed to be a new thread and thus its ThreadID is its Message-ID
> >- if it does not find 'References' or 'In-Reply-To', then the current
> >email is assumed to be a new thread and thus its ThreadID is its
> >Message-ID
> 
> 
> This is still incomplete. One of the MUAs I use generates In-Reply-To:
> headers but not References: headers. Thus in cases where someone has
> replied to me but not included the list (and may or may not have
> subsequently sent the reply to the list with a different Message-ID),
> and I reply and include the list, the Message-ID in my In-Reply-To: is
> not in the archive.
> 
> Another situation is someone replies to me and the list, but the list
> reply is greylisted and not retried for a while. Meanwhile, I reply to
> my copy and the Message-ID in my In-Reply-To: is not yet in the
> archive.
> 
> Threading is not easy.

I haven't completely read the link that Stephen sent earlier, hopefully
the answer to these two points is in there :)

Pierre

From terri at zone12.com  Thu Apr  5 23:29:26 2012
From: terri at zone12.com (Terri Oda)
Date: Thu, 05 Apr 2012 17:29:26 -0400
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <1333633288.23207.26.camel@ambre.pingoured.fr>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
Message-ID: <4F7E0EB6.7030905@zone12.com>

I haven't read the whole thread so maybe someone else has mentioned 
this, but we may want to take advantage of the dynamic sublists code for 
this, since it produces "conversations" or "topics" sublists and already 
has to generate and maintain a code for each.  Rather than messageids 
these are meant to be a bit more human-readable, so they're often words 
with numbers suffixed.  But yeah; there exists code for Mailman 2.1 that 
might be reusable here, and there's a GSoC project on the table to port 
to 3.0 so this might be a thing that we could pass to the archive utility.

  Terri

On 12-04-05 9:41 AM, Pierre-Yves Chibon wrote:
> Hi,
>
> In HyperKitty to be able to easily retrieve from the database all the
> threads of a given month or just all the emails of a thread, I created a
> Field in the database called ThreadID.
> When I load the archives from mailman into mongo, I look for the absence
> of the headers 'References' or 'In-Reply-To' to define an email that
> starts a new thread.
> Then all emails which have the header 'References' or 'In-Reply-To' will
> look for the preceding email and extract the ThreadID from it.
> This seems to work fine.
>
> At the beginning I was using a simple integer as identifier but of
> course if you changed the order in which the archives are loaded or just
> if you miss like one month than the ThreadID is not consistent anymore.
> So I changed to use the Message-ID of the first email of the Thread as
> ThreadID.
> Problem is of course, if the admin removes the first email of a thread
> for x or y reasons, then when reloading the archives (for z or a
> reasons), we will loose the ThreadID and actually, the integrity of the
> Thread (each reply to the first email will be split into their own
> thread).
>
> Would anyone have an idea on how to generate a stable and delete/reload
> proof ThreadID?
> The other solution of course being that I regenerate the thread on the
> fly based on the first email (which is still easy to find), but that
> will be a lot of db querying.
>
> Thinking about it, generating the thread on the fly would also give the
> possibility to regenerate the thread view from anywhere (so you could
> generate a thread view for only a sub-thread).
>
> Do you have any suggestions/preferences ?
>
> Thanks,
> Pierre

From terri at zone12.com  Thu Apr  5 23:40:23 2012
From: terri at zone12.com (Terri Oda)
Date: Thu, 05 Apr 2012 17:40:23 -0400
Subject: [Mailman-Developers] Reminder: Get those GSoC proposals in!
Message-ID: <4F7E1147.5040702@zone12.com>

Just a reminder: GSoC proposals are due April 6th!

The Melange system sometimes has problems on the day that things are 
due, so if you can get something into the Melange system now, even if 
it's just a draft, please do so.  Google will not extend the deadline 
for any reason under any circumstance, so don't wait 'till the last minute!

You'll be hearing from us next week as we review your proposals and ask 
questions, and you'll have a chance to update it then if there's 
something that needs adjustment!

  Terri


From mark at msapiro.net  Fri Apr  6 02:18:09 2012
From: mark at msapiro.net (Mark Sapiro)
Date: Thu, 5 Apr 2012 17:18:09 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CAL_0O1-WvuRi1onM_-CECSd9Gsr45KddDKNpY1pP1aDQ7ZYFvA@mail.gmail.com>
Message-ID: <PC19520120405171809025029ee0443@MSAPIRO>

Stephen J. Turnbull wrote:
>
>The caveats above notwithstanding, at this point I'm definitely with
>David and Bob on this issue -- +1 for including batteries.  I'd like
>to hear from Mark, though (even more so than from Barry; Mark is the
>guy who's been guiding people through upgrades on a daily basis for
>the last decade or so).


I put this thread aside for "later", but just so people don't think I'm
ignoring it, I'm +1 for an archiver or choice of archivers out of the
box.

I'd like to see a default install provide list owners with at a minimum
a choice of public, private or no archives and the archives to be
searchable.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


From stephen at xemacs.org  Fri Apr  6 05:00:32 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 6 Apr 2012 12:00:32 +0900
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <1333651371.6278.16.camel@ambre.pingoured.fr>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
	<1333651371.6278.16.camel@ambre.pingoured.fr>
Message-ID: <CAL_0O18g0eiik__HGBAE-nxqgm02AFN88dxq36kUFBUGPixcVQ@mail.gmail.com>

On Fri, Apr 6, 2012 at 3:42 AM, Pierre-Yves Chibon <pingou at pingoured.fr> wrote:

> So when it parses the email, it checks for 'References' or
> 'In-Reply-To'.
> - If it finds them, it looks for the preceding email
>    - if it finds the preceding email, then the current email gets the
> ThreadID from the preceding email

So far, so good.

>    - if it does not find the preceding email, then the current email is
> assumed to be a new thread

This is unacceptable.  Mailing lists are not synchronous (eg, because
of greylisting for one, but there are plenty of reasons why the mail
doesn't always go through immediately).  Threads must be able to
integrate new messages as they arrive, even if out of order.
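
One way to get that property, sketched under my own assumptions (the
Container class and function names are hypothetical, loosely modelled on
the container idea in jwz's write-up): keep a slot per Message-ID that may
be empty, so a reply can hang off a parent that has not arrived yet.

```python
class Container:
    """A slot in the thread tree; message stays None until the mail arrives."""

    def __init__(self, message_id):
        self.message_id = message_id
        self.message = None
        self.parent = None
        self.children = []


containers = {}  # Message-ID -> Container


def get_container(message_id):
    """Return the container for message_id, creating an empty one if needed."""
    if message_id not in containers:
        containers[message_id] = Container(message_id)
    return containers[message_id]


def is_ancestor_or_self(candidate, node):
    """True if candidate is node or one of node's ancestors."""
    while node is not None:
        if node is candidate:
            return True
        node = node.parent
    return False


def add_message(message_id, payload, parent_id=None):
    """File a message, filling a waiting placeholder if one exists."""
    node = get_container(message_id)
    node.message = payload
    if parent_id is not None and node.parent is None:
        parent = get_container(parent_id)  # may be an empty placeholder
        if not is_ancestor_or_self(node, parent):  # refuse to create a loop
            node.parent = parent
            parent.children.append(node)
    return node


def root_of(message_id):
    """Walk up to the top of the thread containing message_id."""
    node = containers[message_id]
    while node.parent is not None:
        node = node.parent
    return node
```

If the reply is filed first, its parent exists only as an empty
placeholder; when the original finally arrives it fills that slot and the
thread root does not change.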

> and thus its ThreadID is its Message-ID
> - if it does not find 'References' or 'In-Reply-To', then the current
> email is assumed to be a new thread and thus its ThreadID is its
> Message-ID

This isn't quite unacceptable, but it's clearly suboptimal.
(Well-known algorithms that handle this case nicely are available.)

> So for the example you give, the archiver will receive your email and
> make a new thread out of it.

That's an archiver that I won't use, and will strongly oppose as a
candidate for the bundled archiver for Mailman (any version).

>> I haven't thought about it deeply, but I would say just give the
>> thread an arbitrary ID in the database.  Message-IDs are supposed to
>> universally unique, so what's wrong with keeping the thread in the
>> database as a tree of message IDs?  Some Message-IDs will not have
>> corresponding messages but that's always a problem with threading (see
>> http://www.jwz.org/doc/threading.html, and RFC 5256).
>
> The idea of using the Message-ID for ThreadID (instead of a integer) is
> that, if I whether I load one months or two months of archives into the
> database, the link to the thread
> (http://mm3test.fedoraproject.org/thread/packaging at fp.o/XU7HT5JC5GND2O4JII7MTQILLTB4IN4S) will remain the same (so consistent urls).

Sure, but this is a matter of a persistent ID in the database.  When I
say "arbitrary" I don't mean you can't use a message ID to represent a
thread if you like, I mean that you can't algorithmically compute it
in a reliable, history-independent way.  From the point of view of a
user, you can't even be sure that a message without References or
In-Reply-To is a thread root (users will note the subject and the
content, and they will be displeased with any threading algorithm that
doesn't at least group subjects).

I don't say you need to implement that part of the JWZ/5256 algorithm
immediately, but you must not use a database schema that makes it hard
to add that feature later.

In most cases, users will have access to a Message-ID for some message
in the thread.  So I would want an URL like

    http://lists.example.com/archive/some-list/thread/MessageID/root/

to find the thread root for any message in the thread, not just a
particular representative of the thread.  (YMMV for the URL
scheme, of course.)  The last component of the URL path just gives the
focus (message to actually display and/or highlight in a tree widget);
other useful focuses might be "latest" (a message in the thread with
the most recent Date or Received header) and "self" (the message
itself is the focus).  More speculative focuses would be "parent"
(obvious, I hope) and "node" (the most recent ancestor message with
multiple children).
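
A rough sketch of how such focuses could be resolved, assuming the
Message-ID is percent-encoded in the path and the thread is held in a
small in-memory table (THREAD, parse_thread_url and resolve_focus are
illustrative names, not an existing API):

```python
from datetime import datetime
from urllib.parse import unquote

# Hypothetical thread: Message-ID -> (parent Message-ID or None, Date)
THREAD = {
    "<m1>": (None, datetime(2012, 4, 5, 9, 0)),
    "<m2>": ("<m1>", datetime(2012, 4, 5, 10, 0)),
    "<m3>": ("<m1>", datetime(2012, 4, 5, 11, 0)),
    "<m4>": ("<m2>", datetime(2012, 4, 6, 8, 0)),
}


def parse_thread_url(path):
    """Split /archive/<list>/thread/<MessageID>/<focus>/ into its parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "archive" or parts[2] != "thread":
        raise ValueError("not a thread URL: %r" % path)
    return parts[1], unquote(parts[3]), parts[4]


def parent(mid):
    return THREAD[mid][0]


def children(mid):
    return [m for m, (p, _) in THREAD.items() if p == mid]


def root(mid):
    while parent(mid) is not None:
        mid = parent(mid)
    return mid


def resolve_focus(mid, focus):
    """Return the Message-ID to display for the given focus keyword."""
    if focus == "self":
        return mid
    if focus == "root":
        return root(mid)
    if focus == "parent":
        return parent(mid)
    if focus == "latest":
        members = [m for m in THREAD if root(m) == root(mid)]
        return max(members, key=lambda m: THREAD[m][1])
    if focus == "node":
        # Most recent ancestor that has more than one child.
        ancestor = parent(mid)
        while ancestor is not None and len(children(ancestor)) < 2:
            ancestor = parent(ancestor)
        return ancestor
    raise ValueError("unknown focus: %r" % focus)
```

Any Message-ID in the thread works as the entry point; only the focus
keyword decides which message ends up displayed.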

>> There are other problems with threading that need to be dealt with as
>> well, such as References being inconsistent across messages in the
>> same thread and people who continue a thread with a new message, etc.
>
> For these I am not sure I can do something (at least automatically, we
> could always allow an admin to edit the field).

You must do something about inconsistent References.  Suppose there is
a References loop?  It needs to be broken, somehow, or your program
will infloop.

Anyway, this is all already taken care of in Jamie's algorithm.
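
For illustration only, here is a small sketch of the loop check, in the
spirit of that algorithm but not a copy of it (the dict-based parent map
and the function names are assumptions): a link is refused whenever the
would-be child is already an ancestor of the would-be parent.

```python
parents = {}  # child Message-ID -> parent Message-ID


def is_reachable(start, target):
    """True if target is start or one of start's ancestors."""
    node = start
    while node is not None:
        if node == target:
            return True
        node = parents.get(node)
    return False


def link(child, parent):
    """Record parent -> child unless that would close a loop."""
    if child in parents:
        return False  # keep the first parent we learned about
    if is_reachable(parent, child):
        return False  # child is already above parent: refuse the link
    parents[child] = parent
    return True


def link_references(message_id, references):
    """Link each consecutive pair in References, then the message itself."""
    chain = list(references) + [message_id]
    for parent, child in zip(chain, chain[1:]):
        link(child, parent)
```

Because every accepted link keeps the map acyclic, the ancestor walk in
is_reachable always terminates, even when headers contradict each other.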

From davidj at gmail.com  Fri Apr  6 09:00:49 2012
From: davidj at gmail.com (David Jeske)
Date: Fri, 6 Apr 2012 00:00:49 -0700
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <1333633288.23207.26.camel@ambre.pingoured.fr>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
Message-ID: <CA+CP9O4AyX6RXbwHv6XT5GR5hqyjzWOkWb6GFY9yrEB-jOYpeA@mail.gmail.com>

On Apr 5, 2012 6:42 AM, "Pierre-Yves Chibon" <pingou at pingoured.fr> wrote:
> So I changed to use the Message-ID of the first email of the Thread as
> ThreadID.
> Problem is of course, if the admin removes the first email of a thread
> for x or y reasons, then when reloading the archives (for z or a
> reasons), we will loose the ThreadID and actually, the integrity of the
> Thread (each reply to the first email will be split into their own
> thread).
>
> Would anyone have an idea on how to generate a stable and delete/reload
> proof ThreadID?

I believe "deletion proof" (i.e. stable thread-ids in the case of arbitrary
deletions) may be provably not possible.

If you really want to be resilient to arbitrary deletions/reloads, I think
your solution is ultimately going to involve referencing more than one
message in thread URLs.

For example, here is a scheme where 'messages in the thread name the
thread':

1) don't publish thread-ids, but just message-ids... for example, a thread
URL could be allowed to reference the message-id of 'any' message in the
thread.... They could then include more than one message-id, making them
resilient to a lost messageid later. If some messageids are lost, hopefully
a url someone is holding onto has another messageid that was not lost.

As for how to pick the message-ids, paged display could include a messageid
for a message on the page, in addition to the 'first' messageid of the
thread.

2) create an 'internal only threadid' which you use to correlate messages
together into a thread. (don't show this to anyone) you could generate this
as a GUID, Hash, or the message-id of the message..doesn't matter, since
nobody will see it...

3) when indexing messages, search in both directions
(references/in-reply-to -> messageid, and vice-versa) to find out if the
message belongs in a thread.. if it does, then adopt the 'internal thread
id'.. if you find two different threadids in the two directions, then
rewrite/combine into a single internal-thread-id

-> urls can be somewhat resilient to deleted/missing messages within a
thread... and completely resilient to changes in other threads
-> threads can be manually edited and merged/split after the fact, with
some level of success
-> could be designed to 'break down' threads that get too big, again with
minimal damage, and some url compatibility
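
A minimal sketch of steps 2 and 3, assuming an in-memory dict in place of
the database and a random hex string as the internal id (thread_of and
index_message are made-up names). The "vice-versa" direction is handled
by also registering every referenced Message-ID, so a message that
arrives late finds its thread already waiting:

```python
import uuid

thread_of = {}  # Message-ID -> internal thread id (never published)


def _merge(keep, drop):
    """Rewrite every entry of thread `drop` to point at thread `keep`."""
    for mid, tid in list(thread_of.items()):
        if tid == drop:
            thread_of[mid] = keep


def index_message(message_id, referenced_ids=()):
    """Assign message_id to a thread, merging any threads it connects."""
    related = [message_id, *referenced_ids]
    # Internal ids already known for this message or anything it names.
    found = sorted({thread_of[m] for m in related if m in thread_of})
    tid = found[0] if found else uuid.uuid4().hex
    for other in found[1:]:
        _merge(tid, other)  # two different thread ids found: combine them
    for m in related:
        thread_of[m] = tid  # also registers IDs whose mail is not here yet
    return tid
```

A public thread URL would then carry one or more Message-IDs, each of
which is looked up in thread_of; the internal id itself never appears in
a URL, so merging or splitting threads does not break links.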

From a.badger at gmail.com  Fri Apr  6 19:49:47 2012
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 6 Apr 2012 10:49:47 -0700
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CA+CP9O4AyX6RXbwHv6XT5GR5hqyjzWOkWb6GFY9yrEB-jOYpeA@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CA+CP9O4AyX6RXbwHv6XT5GR5hqyjzWOkWb6GFY9yrEB-jOYpeA@mail.gmail.com>
Message-ID: <20120406174946.GM11151@unaka.lan>

On Fri, Apr 06, 2012 at 12:00:49AM -0700, David Jeske wrote:
> On Apr 5, 2012 6:42 AM, "Pierre-Yves Chibon" <pingou at pingoured.fr> wrote:
> > So I changed to use the Message-ID of the first email of the Thread as
> > ThreadID.
> > Problem is of course, if the admin removes the first email of a thread
> > for x or y reasons, then when reloading the archives (for z or a
> > reasons), we will loose the ThreadID and actually, the integrity of the
> > Thread (each reply to the first email will be split into their own
> > thread).
> >
> > Would anyone have an idea on how to generate a stable and delete/reload
> > proof ThreadID?
> 
> I believe "deletion proof" (i.e. stable thread-ids in the case of arbitrary
> deletions) may be provably not possible.
> 
> If you really want to be resiliant to arbitrary deletions/reloads, I think
> your solution is ultimately going to involve referencing more than one
> message in thread URLs..
> 
I don't see any way to make this 100% resilient against deletion + reload
(where reload == from the available messages without the benefit of the old
metadata) either.  I think with slight modification to your steps below, we
can get to resiliency against deletion or resiliency against total reload.

> For example, here is a scheme where 'messages in the thread name the
> thread':
> 
> 1) don't publish thread-ids, but just message-ids... for example, a thread
> URL could be allowed to reference the message-id of 'any' message in the
> thread.... They could then include more than one message-id, making them
> resiliant to a lost messageid later. if some messageid are lost, hopefully
> a url someone is holding onto has another messageid that was not lost.
> 
This sounds good.  So instead of relying on the first message-id of the thread
we internally keep a mapping of all message-ids and stableurl hashes to
either an internal message-id or a tree of messages in the thread.

When deleting messages, always retain the message-id and stableurl hashes
for that message in the mapping.  That way a url that pointed to the thread
by that message-id will continue to function even though the message itself
has been deleted.
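
Roughly, as a sketch (the two dicts stand in for database tables, and
url_key is only a hypothetical stand-in for whatever hash the stable URLs
really use):

```python
import base64
import hashlib

messages = {}      # Message-ID -> message body; entries vanish on deletion
thread_index = {}  # Message-ID or URL hash -> thread id; never pruned


def url_key(message_id):
    """Hypothetical stable-URL hash: base32 of the SHA-1 of the Message-ID."""
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


def archive(message_id, thread_id, body):
    """Store a message and index it under both its ID and its URL hash."""
    messages[message_id] = body
    thread_index[message_id] = thread_id
    thread_index[url_key(message_id)] = thread_id


def delete(message_id):
    """Drop the content but keep the index entries for old URLs."""
    messages.pop(message_id, None)


def resolve_thread(key):
    """Return (thread id, Message-IDs still present) or None if unknown."""
    tid = thread_index.get(key)
    if tid is None:
        return None
    live = [m for m, t in thread_index.items() if t == tid and m in messages]
    return tid, live
```

After delete(), a URL built from the removed message's ID or hash still
resolves to the thread and lists whatever messages remain.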

-Toshio

From a.badger at gmail.com  Fri Apr  6 19:54:00 2012
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 6 Apr 2012 10:54:00 -0700
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
Message-ID: <20120406175400.GN11151@unaka.lan>

On Fri, Apr 06, 2012 at 12:10:22AM +0900, Stephen J. Turnbull wrote:
> Some Message-IDs will not have
> corresponding messages but that's always a problem with threading (see
> http://www.jwz.org/doc/threading.html, and RFC 5256).
> 
> There are other problems with threading that need to be dealt with as
> well, such as References being inconsistent across messages in the
> same thread and people who continue a thread with a new message, etc.
>
Looks like amk coded jwz's algorithm into a Python library too:
  https://github.com/akuchling/jwzthreading

All other links to that code that I found (on amk.ca and bitbucket) were
broken so someone may want to clone that/ask andrew what's going on with it
:-)

-Toshio

From richard at NFSNet.org  Sat Apr  7 04:48:46 2012
From: richard at NFSNet.org (Richard Wackerbarth)
Date: Fri, 6 Apr 2012 21:48:46 -0500
Subject: [Mailman-Developers] [Bug 967951] The LMTP runner should reject
	messages with duplicate Message-IDs
In-Reply-To: <20120406230612.6048.2774.launchpad@soybean.canonical.com>
References: <20120329030405.10615.26178.malonedeb@soybean.canonical.com>
	<20120406230612.6048.2774.launchpad@soybean.canonical.com>
Message-ID: <3AB034DF-713F-417A-BFE1-5522217F873C@NFSNet.org>

What is the issue here?
How far back in time is the runner expected to remember?

On Apr 6, 2012, at 6:06 PM, Barry Warsaw wrote:

> ** Changed in: mailman
>       Status: New => Confirmed
> 
> ** Changed in: mailman
>   Importance: Undecided => High
> 
> ** Changed in: mailman
>    Milestone: None => 3.0.0b2


From stephen at xemacs.org  Sat Apr  7 20:48:35 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 8 Apr 2012 03:48:35 +0900
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <20120406175400.GN11151@unaka.lan>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CAL_0O19fP89g_jYH1rpkhUJZ4v+2D7FQRqi-VKeoQPmjgqvk2A@mail.gmail.com>
	<20120406175400.GN11151@unaka.lan>
Message-ID: <CAL_0O18Argy6ZBRyi1t8y5X=fX6uRHWBqStJmoQhWuoZ07Uhag@mail.gmail.com>

Bill Janssen has one too; I forget whether it's based on amk's or not,
but it's current.  See thread in email-sig:

http://mail.python.org/pipermail/email-sig/2012-January/000882.html

On Sat, Apr 7, 2012 at 2:54 AM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> On Fri, Apr 06, 2012 at 12:10:22AM +0900, Stephen J. Turnbull wrote:
>> Some Message-IDs will not have
>> corresponding messages but that's always a problem with threading (see
>> http://www.jwz.org/doc/threading.html, and RFC 5256).
>>
>> There are other problems with threading that need to be dealt with as
>> well, such as References being inconsistent across messages in the
>> same thread and people who continue a thread with a new message, etc.
>>
> Looks like amk coded jwz's algorithm into a python library too:
>  https://github.com/akuchling/jwzthreading
>
> All other links to that code that I found (on amk.ca and bitbucket) were
> broken so someone may want to clone that/ask andrew what's going on with it
> :-)
>
> -Toshio

From syst3m.w0rm at gmail.com  Sun Apr  8 03:28:14 2012
From: syst3m.w0rm at gmail.com (Aamir Khan)
Date: Sun, 8 Apr 2012 06:58:14 +0530
Subject: [Mailman-Developers] Integrating HyperKitty with Mailman3
Message-ID: <CAOb12VVP53Y1Aq8MffWbUAUkKgra5-BWuunvvQCaUSxk36qFKA@mail.gmail.com>

I believe that after integrating HyperKitty with mailman, there will be
archiver['hyperkitty'] which can be used to archive the messages. Am I
correct?

http://packages.python.org/mailman/src/mailman/archiving/docs/common.html#sending-the-message-to-the-archiver

I know that mailman3 offers a pluggable architecture, but even after going
through some documentation it is not apparent to me how exactly HyperKitty
will be integrated with mailman3. Can somebody briefly explain and point
to the relevant files in the source code?


-- 
Aamir Khan | 3rd Year  | Computer Science & Engineering | IIT Roorkee

From davidj at gmail.com  Sun Apr  8 07:53:59 2012
From: davidj at gmail.com (David Jeske)
Date: Sat, 7 Apr 2012 22:53:59 -0700
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <20120406174946.GM11151@unaka.lan>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CA+CP9O4AyX6RXbwHv6XT5GR5hqyjzWOkWb6GFY9yrEB-jOYpeA@mail.gmail.com>
	<20120406174946.GM11151@unaka.lan>
Message-ID: <CA+CP9O6C3eTMuLHHHqKoC26BhTPoGUFVYNjpjgmua6hVXLodwQ@mail.gmail.com>

On Apr 6, 2012 10:49 AM, "Toshio Kuratomi" <a.badger at gmail.com> wrote:
> > 1) don't publish thread-ids, but just message-ids... for example, a thread
> > URL could be allowed to reference the message-id of 'any' message in the
> > thread.... They could then include more than one message-id, making them
> > resiliant to a lost messageid later. if some messageid are lost, hopefully
> > a url someone is holding onto has another messageid that was not lost.
> >
> This sounds good.  So instead of relying on the first message-id of the thread
> we internally keep a mapping of all message-ids and stableurl hashes to
> either an internal message-id or a tree of messages in the thread.

I think of this as keeping a mapping from "rfc822 message-id" to "internal
thread-id". I think you are using different words to say the same thing.

> When deleting messages, always retain the message-id and stableurl hashes
> for that message in the mapping.  That way a url that pointed to the thread
> by that message-id will continue to function even though the message itself
> has been deleted.

Perhaps I misunderstood. If you are going to have a record of the deletion
(i.e. you can keep a deleted message around in some form), this problem
becomes much easier. I thought this desire was to have stable urls and
threads when you rebuild and a message is missing.

Absolutely, if there is a message 'deletion' feature, it should delete the
message contents but leave a 'stub' that links the message-id and
references/in-reply-to, so it can help hold the thread together during a
rebuild. My memory is foggy, but I think we used a technique like this in
Yahoo Groups.
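
A minimal sketch of the mapping described above, assuming an in-memory
dictionary from RFC 822 Message-ID to an internal thread id. The class and
method names (ThreadIndex, add, delete, lookup) are hypothetical and not
taken from any archiver; deletion drops the contents but keeps the
message-id link as a stub.

```python
class ThreadIndex:
    """Maps RFC 822 Message-IDs to internal thread ids (illustrative only)."""

    def __init__(self):
        self._thread_of = {}    # message-id -> internal thread id
        self._bodies = {}       # message-id -> body, or None for a stub
        self._next_thread = 1

    def add(self, message_id, references=(), body=""):
        # Join the thread of the first referenced message we already know.
        thread_id = None
        for ref in references:
            if ref in self._thread_of:
                thread_id = self._thread_of[ref]
                break
        if thread_id is None:
            thread_id = self._next_thread
            self._next_thread += 1
        self._thread_of[message_id] = thread_id
        self._bodies[message_id] = body
        return thread_id

    def delete(self, message_id):
        # Drop the contents but keep the id -> thread link as a stub.
        if message_id in self._bodies:
            self._bodies[message_id] = None

    def lookup(self, *message_ids):
        # A URL may carry several message-ids; any known one resolves it.
        for mid in message_ids:
            if mid in self._thread_of:
                return self._thread_of[mid]
        return None
```

With this shape, a URL holding two message-ids still resolves after one of
them is unknown to the index, and a deleted message keeps pointing at its
thread.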

From stephen at xemacs.org  Sun Apr  8 11:14:25 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 8 Apr 2012 18:14:25 +0900
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CA+CP9O6C3eTMuLHHHqKoC26BhTPoGUFVYNjpjgmua6hVXLodwQ@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CA+CP9O4AyX6RXbwHv6XT5GR5hqyjzWOkWb6GFY9yrEB-jOYpeA@mail.gmail.com>
	<20120406174946.GM11151@unaka.lan>
	<CA+CP9O6C3eTMuLHHHqKoC26BhTPoGUFVYNjpjgmua6hVXLodwQ@mail.gmail.com>
Message-ID: <CAL_0O1_QFz473w-D3AgEcJW95LYnZxyC03W4cKmn9y=2VR5M0g@mail.gmail.com>

On Sun, Apr 8, 2012 at 2:53 PM, David Jeske <davidj at gmail.com> wrote:

> Perhaps I misunderstood. If you are going to have a record of the deletion
> (i.e. you can keep a deleted message around in some form), this problem
> becomes much easier.

In practice, complete deletion is occasionally necessary.  The
archives should be robust to that.

> Absolutely, if there is a message 'deletion' feature, it should delete the
> message contents but leave a 'stub' that links the message-id and
> references/in-reply-to, so it can help hold the thread together during a
> rebuild.

If a message is actually a referencable member of a thread, there will
be references to it in other messages.  Only in the (increasingly
rare) case of a thread held together only by In-Reply-Tos will the
thread be cut by removing a message; otherwise the reference headers
are enough to rebuild.

I think it's reasonable to leave a stub whose only content is "This
message was administratively removed" plus the References, Message-ID,
and Date header fields.  I don't know what to do about the required From
field, but since it's not going out on the wire, in a certain technical
sense the RFC doesn't apply.  Alternatively, use 'From: "J. Redacted
User" <anonymous at example.com>'.  The only real problem I can
see with this is that third parties who see it may go searching
personal archives for a local copy of the offending message, which
goes against the spirit of deletion -- better to pretend the message
was never publicly posted.
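
For illustration, such a stub could be built with the standard library's
email package roughly as follows. This is only a sketch: make_stub is a
hypothetical helper, and the placeholder author address is the example one
from the paragraph above.

```python
from email.message import EmailMessage


def make_stub(original):
    """Return a placeholder that keeps only the threading headers."""
    stub = EmailMessage()
    for name in ("Message-ID", "References", "Date"):
        if original[name] is not None:
            stub[name] = original[name]
    # Placeholder author; the address is purely illustrative.
    stub["From"] = '"J. Redacted User" <anonymous@example.com>'
    stub.set_content("This message was administratively removed.\n")
    return stub
```

Subject, the original From, and the body are all dropped; only the fields
needed to keep the thread connected survive.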

From barry at list.org  Sun Apr  8 18:39:21 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 12:39:21 -0400
Subject: [Mailman-Developers] Integrating HyperKitty with Mailman3
In-Reply-To: <CAOb12VVP53Y1Aq8MffWbUAUkKgra5-BWuunvvQCaUSxk36qFKA@mail.gmail.com>
References: <CAOb12VVP53Y1Aq8MffWbUAUkKgra5-BWuunvvQCaUSxk36qFKA@mail.gmail.com>
Message-ID: <20120408123921.0ff6d14f@limelight.wooz.org>

On Apr 08, 2012, at 06:58 AM, Aamir Khan wrote:

>I believe that after integrating HyperKitty with mailman, there will be
>archiver['hyperkitty'] which can be used to archive the messages. Am I
>correct?

Yes, but that's mostly an implementation detail you don't need to worry
about.  config.archivers is just for internal bookkeeping and use in the
ArchiveRunner.

>http://packages.python.org/mailman/src/mailman/archiving/docs/common.html#sending-the-message-to-the-archiver

>I know that mailman3 offers pluggable architecture, but still after going
>through some documentation it is not apparent to me how exactly HyperKitty
>will be integrated with mailman3. Can somebody briefly explain and point me
>to the relevant files in the source code?

It's relatively straightforward, once you understand how the configuration
system works.

The file src/mailman/config/schema.cfg is a kind of template for the
mailman.cfg ini file.  Search down for the [archiver.master] section; this is a
template for other [archiver.foo] sections.  In here, you'll see all the
default variables and their values for configuring an archiver.

Look a little farther down and you'll see for example the [archiver.prototype]
section which provides just the relevant overrides for the `prototype`
archiver.

So, to enable hyperkitty, you would have to add something like the following
section in your mailman.cfg file:

-----snip snip-----
[archiver.hyperkitty]
class: python.path.to.hyperkitty.HyperKitty
-----snip snip-----

Of course, you'd probably want to `enable` it too.

One tricky thing here is that the `class` value names a Python dotted-module
path, so the class must be importable.  Ensuring that the hyperkitty module
(and this is just a suggestion, YMMV) is importable by the core engine may not
be fully baked.  For now, just set $PYTHONPATH.
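
To make the importability requirement concrete, here is a rough sketch of
how a dotted `class` value can be resolved.  It uses the standard library's
configparser as a stand-in for the core's own configuration machinery, and
load_archiver_class is a hypothetical helper, not a Mailman function.

```python
import configparser
import importlib


def load_archiver_class(cfg_text, section):
    """Resolve the `class` value of one [archiver.*] section to a class."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    dotted = parser[section]["class"]
    module_path, _, class_name = dotted.rpartition(".")
    # import_module raises ImportError when the module cannot be found,
    # e.g. because its directory is missing from sys.path / $PYTHONPATH.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

If the module named in the section is not on the path, the failure shows up
as an ImportError at this point rather than later when a message arrives.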

The final bit of the puzzle is that the python.path.to.hyperkitty.HyperKitty
class must implement the IArchiver interface, although if it's impossible to
implement something like permalink(), it should just raise a
NotImplementedError.  Take a look at the Prototype class for an example.

Note that all the magic of getting a message into the archiver happens through
the archive_message() method of IArchiver.  This can do anything you need to
do to inject the message into the archiver.  It can make direct Python calls,
like the prototype archiver does, or it can shell out to a command like the
MHonArc archiver does, or it can send an email like the MailArchive one does.
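
As a toy illustration of the shape of such a class (not real Mailman code,
and deliberately not importing the actual IArchiver interface), something
like the following covers the two methods mentioned above.  The (mlist, msg)
argument order and the class name are assumptions, and the real interface
may require more than this.

```python
class InMemoryArchiver:
    """Toy archiver exposing the two methods discussed above."""

    name = "inmemory"

    def __init__(self):
        self.stored = []

    def archive_message(self, mlist, msg):
        # A direct Python call, in the spirit of the prototype archiver.
        self.stored.append((mlist, msg))

    def permalink(self, mlist, msg):
        # This toy archiver has no stable URLs to offer.
        raise NotImplementedError
```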

Hope that helps.
-Barry


From barry at list.org  Sun Apr  8 19:00:06 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 13:00:06 -0400
Subject: [Mailman-Developers] [Bug 967951] The LMTP runner should reject
 messages with duplicate Message-IDs
In-Reply-To: <3AB034DF-713F-417A-BFE1-5522217F873C@NFSNet.org>
References: <20120329030405.10615.26178.malonedeb@soybean.canonical.com>
	<20120406230612.6048.2774.launchpad@soybean.canonical.com>
	<3AB034DF-713F-417A-BFE1-5522217F873C@NFSNet.org>
Message-ID: <20120408130006.7903f75d@limelight.wooz.org>

On Apr 06, 2012, at 09:48 PM, Richard Wackerbarth wrote:

>What is the issue here?
>How far back in time is the runner expected to remember?

Forever? :)

What I'm thinking is that early in the process we'll register every Message-ID
we've seen (or maybe it should be every one we've accepted) in the message
store.  Then the LMTP server can check the message store before it accepts any
new message.
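
Very roughly, and ignoring persistence and all the tricky details, the
check could look like this.  MessageIDRegistry is a hypothetical name and an
in-memory set stands in for the message store.

```python
class MessageIDRegistry:
    """Remembers every Message-ID seen so far (in-memory stand-in)."""

    def __init__(self):
        self._seen = set()

    def accept(self, message_id):
        """Record a new Message-ID; return False if it is a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

An LMTP handler would call accept() before taking delivery and reject the
message when it returns False.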

I should note though that I've tried several branches in the (distant-ish)
past to implement this, and there are lots of tricky details to work out,
especially around some assumptions in the test suite.

I marked it High because I think it should be tackled once more before 3.0
final.

Cheers,
-Barry

>On Apr 6, 2012, at 6:06 PM, Barry Warsaw wrote:
>
>> ** Changed in: mailman
>>       Status: New => Confirmed
>> 
>> ** Changed in: mailman
>>   Importance: Undecided => High
>> 
>> ** Changed in: mailman
>>    Milestone: None => 3.0.0b2

From barry at list.org  Sun Apr  8 19:05:31 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 13:05:31 -0400
Subject: [Mailman-Developers] Automating Migration from MM2 to MM3
In-Reply-To: <F0AEF025-C82A-4920-BE28-617DB0EC42F9@NFSNet.org>
References: <F0AEF025-C82A-4920-BE28-617DB0EC42F9@NFSNet.org>
Message-ID: <20120408130531.15bfdf43@limelight.wooz.org>

(Removing mailman-coders which should just be for commit messages.)

On Apr 06, 2012, at 09:41 PM, Richard Wackerbarth wrote:

>In order to provide migration from MM2 to MM3, we will need to reformat the
>list configuration, membership rosters and the user preferences.

LP: #965532

>What should we do about: pending messages and the message archives?

I think pending messages *could* be upgraded, but it would probably be a
pain.

>Is it sufficient to assume that the MM2 lists will be shut down and all
>pending messages (and digests) will be delivered/flushed before restarting
>the list on the MM3 server?

I think this is a fair requirement.  Basically, you need a clean, non-running
system in order to fully upgrade.

>Do we need to provide a legacy interface to pipermail, thus "kicking the
>issue down the road", or Will someone migrate the legacy archives, or Will
>the user be expected to maintain two archives?

I think they'll probably *have* to maintain two archives, if they want to
continue to support legacy URLs.  Pipermail just has too many problems to
ensure that you could regenerate the archive even with itself and guarantee
your urls won't change.  You wouldn't want to break the googles. :)

So I think the safest thing for a site to do would be to leave the old
Pipermail archives around, and then regenerate the new archives from the .mbox
file.
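
Regenerating from the .mbox could be as simple as the sketch below, which
uses the standard library's mailbox module.  replay_mbox is a hypothetical
helper, and archive_message stands for whatever callable the new archiver
provides for taking in a single message.

```python
import mailbox


def replay_mbox(path, archive_message):
    """Feed every message in an mbox file to a callable; return the count."""
    count = 0
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            archive_message(msg)
            count += 1
    finally:
        box.close()
    return count
```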

>Have I omitted any additional transition issues?

Probably, but I can't think of what atm. :)

Cheers,
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/mailman-developers/attachments/20120408/ef0dd177/attachment.pgp>

From barry at list.org  Sun Apr  8 19:38:16 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 13:38:16 -0400
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <4F7E0EB6.7030905@zone12.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
Message-ID: <20120408133816.29cb74b4@limelight.wooz.org>

On Apr 05, 2012, at 05:29 PM, Terri Oda wrote:

>I haven't read the whole thread so maybe someone else has mentioned this, but
>we may want to take advantage of the dynamic sublists code for this, since it
>produces "conversations" or "topics" sublists and already has to generate and
>maintain a code for each.  Rather than messageids these are meant to be a bit
>more human-readable, so they're often words with numbers suffixed.  But yeah;
>there exists code for Mailman 2.1 that might be reusable here, and there's a
>GSoC project on the table to port to 3.0 so this might be a thing that we
>could pass to the archive utility.

Don't forget too that we have the Stable URL proposal, which turns arbitrary
Message-IDs into base 32 hashes: 32 characters drawn from upper-case ASCII
letters and digits.
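
For reference, a hash of that shape can be produced like this.  The sketch
assumes SHA-1 as the digest (20 bytes encode to exactly 32 base 32
characters) and assumes the surrounding angle brackets are stripped first;
message_id_hash is a hypothetical name.

```python
import base64
import hashlib


def message_id_hash(message_id):
    """Base32-encode the SHA-1 digest of a Message-ID, brackets stripped."""
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    digest = hashlib.sha1(bare.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")
```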

-Barry

From richard at NFSNet.org  Sun Apr  8 20:11:57 2012
From: richard at NFSNet.org (Richard Wackerbarth)
Date: Sun, 8 Apr 2012 13:11:57 -0500
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <20120408133816.29cb74b4@limelight.wooz.org>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
	<20120408133816.29cb74b4@limelight.wooz.org>
Message-ID: <5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>

I would propose a slightly different scheme for converting messages to
stable URIs.

If we create our ID by concatenating some hash and a part of the date,
then the mail server need remember only those messages that fall in the
same date-sensitive part of the namespace. It can "forget" about ancient
history.

Further, if we maintain sufficient Hamming distance, we can perform
"error correction" (mapping multiple IDs to the same canonical one) and
thus compensate for minor encoding differences caused by timing skew.
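
A rough sketch of the date-partitioned part of this idea follows.  The
month-sized bucket, the SHA-1 digest, the 16-character prefix, and the
names dated_id and DatedStore are all illustrative assumptions; the
error-correction step is left out entirely.

```python
import base64
import hashlib
from datetime import date


def dated_id(message_id, sent):
    """Return '<YYYYMM>-<hash prefix>' for a message sent on date `sent`."""
    bucket = sent.strftime("%Y%m")
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    return bucket + "-" + base64.b32encode(digest).decode("ascii")[:16]


class DatedStore:
    """Remembers IDs per date bucket so old buckets can be dropped."""

    def __init__(self):
        self._buckets = {}    # 'YYYYMM' -> set of IDs

    def add(self, message_id, sent):
        """Record the message; return False if its ID was already present."""
        new_id = dated_id(message_id, sent)
        bucket = self._buckets.setdefault(new_id[:6], set())
        if new_id in bucket:
            return False
        bucket.add(new_id)
        return True

    def forget_before(self, bucket):
        """Drop every bucket older than the given 'YYYYMM' string."""
        for key in [k for k in self._buckets if k < bucket]:
            del self._buckets[key]
```

Calling forget_before("201205") discards everything from April 2012 and
earlier, which is the "forgetting ancient history" part.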


On Apr 8, 2012, at 12:38 PM, Barry Warsaw wrote:

> On Apr 05, 2012, at 05:29 PM, Terri Oda wrote:
> 
>> I haven't read the whole thread so maybe someone else has mentioned this, but
>> we may want to take advantage of the dynamic sublists code for this, since it
>> produces "conversations" or "topics" sublists and already has to generate and
>> maintain a code for each.  Rather than messageids these are meant to be a bit
>> more human-readable, so they're often words with numbers suffixed.  But yeah;
>> there exists code for Mailman 2.1 that might be reusable here, and there's a
>> GSoC project on the table to port to 3.0 so this might be a thing that we
>> could pass to the archive utility.
> 
> Don't forget too that we have the Stable URL proposal, which turns arbitrary
> Message-IDs into 32 upper-case ASCII letter and digit character base 32
> hashes.
> 
> -Barry


From barry at list.org  Sun Apr  8 23:35:46 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 17:35:46 -0400
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CA+CP9O6C3eTMuLHHHqKoC26BhTPoGUFVYNjpjgmua6hVXLodwQ@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<CA+CP9O4AyX6RXbwHv6XT5GR5hqyjzWOkWb6GFY9yrEB-jOYpeA@mail.gmail.com>
	<20120406174946.GM11151@unaka.lan>
	<CA+CP9O6C3eTMuLHHHqKoC26BhTPoGUFVYNjpjgmua6hVXLodwQ@mail.gmail.com>
Message-ID: <20120408173546.1e04728d@limelight.wooz.org>

On Apr 07, 2012, at 10:53 PM, David Jeske wrote:

>Perhaps I misunderstood. If you are going to have a record of the deletion
>(i.e. you can keep a deleted message around in some form), this problem
>becomes much easier. I thought this desire was to have stable urls and
>threads when you rebuild and a message is missing.
>
>Absolutely, if there is a message 'deletion' feature, it should delete the
>message contents but leave a 'stub' that links the message-id and
>references/in-reply-to, so it can help hold the thread together during a
>rebuild. My memory is foggy, but I think we used a technique like this in
>Yahoo Groups.

I like the scheme outlined by Toshio where (IIRC) any message-id can be used
to index into its thread.  I also agree with David that a deletion should keep
enough of a stub around to maintain consistent thread links.  I think this is
also important for the end-user.

Imagine you've found a particular taken-down message through a search engine
cache.  You then follow the url.  I think it would be better to give them an
informative message about the take-down rather than just 404'ing the url
(although the latter or similar might also be useful for spiders so that they
know the message is no longer available).

Stephen observes that complete deletion is occasionally necessary.  While
true, I still think a placeholder/stub could be inserted to keep the thread
integrity whole.

Cheers,
-Barry

From barry at list.org  Sun Apr  8 23:35:54 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 17:35:54 -0400
Subject: [Mailman-Developers] What is "GNU Mailman?" (was Re: mailman /
 archive-ui / licensing questions)
In-Reply-To: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
Message-ID: <20120408173554.2a5a64a2@limelight.wooz.org>

See what happens when you go on vacation?  So many interesting issues to
untangle!  I'll try to catch up but my responses will no doubt be somewhat
fractured.

First of all, thanks David for bringing ClearSilver to our attention and
offering it to the Mailman project.  We can all agree that Pipermail is dated,
is missing key features, and needs to be replaced.  It's been removed from the
lp:mailman branch.

What does it mean to be part of the "GNU Mailman" project?

This used to be an easy question to answer because everything was bundled.
When you download the mm2.1 tarball, you get an archiver, a web ui, and an
engine.  Everything is GPLv2+'d with copyright owned by the FSF.  Life is
simple.

Now we have at least two separate subprojects, the core and the web ui.  It's
very likely that what we'll call "GNU Mailman 3.0" will be just the core.
For various reasons, this needs to be released as soon as possible, but it's
important to understand that it won't be a full replacement for Mailman 2.1.
I think of it more like the Python 3.0 release - a critical milestone for the
project, usable on its own, but with deficiencies we both know about and don't
yet know.  We'll need to manage expectations, but it's also true that it just
will not get a full workout until it's got that "final" stamp on it.  We can't
wait for Postorius or an official archiver, but both those will probably be
included in future releases.

One of the core principles of mm3 is that site admins get more choices.  You
can use Postorius as your web ui, but it's also easy to use something else, or
no web ui at all.  Want to use your own archiver?  No problem.  Want to throw
all your data into PostgreSQL and drive the user database off your corporate
databa$e?  No problem (hopefully :).  So again, in a mm3 world, what does it
mean to be part of the GNU Mailman project?

Here are some of my own principles, and I'd like to hear yours.

No fiefdoms in the code.  I much prefer projects where everyone feels a shared
sense of stewardship for the code base.  Of course we'll have experts in one
area or another, and everyone is going to have an opinion about how things
should work.  Hopefully you won't depose me as BDFL just yet.  But I do think
that nobody should have veto power over any particular aspect of the code.  An
expert, or even myself, should be able to make a convincing technical argument
for why something should be done or not done, and that should hold sway over a
collective solution.  Now, it might be because one way is the right way to do
it, or because another way is the expedient way.  And not everyone will agree
with every decision, but I also think it's important to fight the good fight
and then work hard to make this a successful collaboration.  Of course, you're
never forced to hack on something you disagree with.  Almost above all else,
this should be *fun* :).

This relates to big code donations like the archiver.  Once we as the Mailman
project accept something under our umbrella, we all have the right and
responsibility to dig in and hack on it.  In Python, I don't think it's really
worked out very well to have some big donated module be "owned" by one person.
There are both pros and cons for getting subsumed into a project, so only do
it when everyone understands and agrees to that.

As I mentioned, bundling and releasing was easy in 2.1.  It'll be easy in 3.0
only because that will probably just include the core.  What does a release
look like in Mailman 3.1 and beyond?  How do we take all these disparate
projects (Postorius, the API client, an archiver, etc.) and release these in
an easy to download and install format?  I'm still not sure, and I'm not
holding up the 3.0 release to figure that out, but we will have to figure that
out at some point (and probably get it wrong the first few times :).

Licensing was also an easy decision.  Everything was GPLv2+.  Now the core is
GPLv3+, as is Postorius.  I'm not a licensing zealot; I'm pretty much happy to
hack on anything that has a FLOSS license.  I think a copyleft Python would
fail miserably, but copyleft has worked well for Mailman for probably 15
years, and it's too late to change the license.

I'm happy for people to make money off of Mailman, but the GPL helps ensure
and encourage that folks give back to the project.  GPLv3+ seems right for the
core, and pretty right for the web ui (AGPL might be better, but Toshio has
identified some problems in practice with it).  I would certainly prefer that
any archiver that gets bundled under the GNU Mailman moniker be copyleft, with
copyright assigned to the FSF.  The core and web-ui are already structured
this way, so I think the consistency ultimately makes our lives, and more
importantly the lives of our users, easier.

Those are my preferences anyway.  Maybe copyright assignment to the FSF isn't
right for the archiver, but I need a more convincing argument than "I don't
like it".  Same for choice of license.  From a project management perspective,
consistency is a big win, so convince us why it's better, or at least okay, to
have different licensing and ownership regimes under a single project's
banner.

Oh, one more principle I'd like to maintain: please write it in Python.  No
disrespect to other languages, but Python is just more fun and consistency
counts.  Okay, okay, you can include some JavaScript for the web bling if you
must. :)

Cheers,
-Barry

From barry at list.org  Sun Apr  8 23:48:55 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 17:48:55 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
Message-ID: <20120408174855.35a479ac@limelight.wooz.org>

On Apr 02, 2012, at 08:04 PM, David Jeske wrote:

>The question is "would you BUNDLE another archiver even if the licenses
>don't match?"

If you're donating the archiver to the GNU Mailman project, for us to
maintain, release, bundle, and develop, then I think that would be a very high
hurdle to overcome.  Sorry, but it just is.

I really don't want our developers to have to think about whether they can
copy a chunk of useful code from the archiver to the core.  Or whether they
can refactor some web-ui code, developed under the GPLv3+, and re-use it as a
library in the S-BSD licensed archiver code base.  I know with absolute
certainty that I personally don't want to have to think about stuff like
that.  Do you really want to spend your time trying to figure out all the
insane legalistic conundrums that's going to bring up?

>My archiver has been available for download (like many others) for ten
>years. All these sites are still running a limping pipermail archive,
>because it's bundled. I want to get Mailman a better bundled archive.

Which is fantastic, and which I fully encourage.

One of the reasons why Pipermail is so ubiquitous is that it was bundled with
Mailman 2.1.  But another reason is that it was so painful to replace.
Mailman 3's architecture fixes the latter, and `bzr rm` fixed the former.

>HOWEVER, I personally will not write GPL code. I might submit a tiny patch
>or bugfix, but I'm simply opposed to restrictions on how someone uses
>something that I'm trying to donate to the software community. (i.e. you're
>never going to turn me into a mailman developer, the best you'd get is me
>writing my own mailman-ish and releasing it under S-BSD.. if you want
>that, let me know)

I'm not going to spend time on this list arguing for the GPL.  The bottom line
is that the core, and by extension the web ui, are GPLv3+ and that cannot be
changed.  Having a different licensing and ownership regime for one component
of the project will make our lives more difficult, and drain resources from
developers who would rather hack than worry about legal crap.

Probably the only way I'd change my mind about that is if RMS personally told
us that we could still treat the non-copyleft donation the same way we treat
all the other code, i.e. we can use the code and freely copy between them
without any additional administrative overhead.

Cheers,
-Barry

From barry at list.org  Sun Apr  8 23:53:02 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 17:53:02 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120403185822.GI11151@unaka.lan>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
Message-ID: <20120408175302.6d0ef973@limelight.wooz.org>

On Apr 03, 2012, at 11:58 AM, Toshio Kuratomi wrote:

>Distributed/pointed to by list.org along with mailman and postorius might be
>negotiable though :-)

Absolutely.  I'm committed to making it as easy as possible for an admin to
integrate third-party FLOSS archivers with mm3.

>I don't think you're going to find the will to make this sort of decision
>right at this instant because what we want the archiver ecosystem to look
>like for mailman3 is somewhat in the air.  Do we really want an obviously
>less capable archiver to be the bundled archiver?  Do we want to have
>a single blessed archiver (probably in a separate tarball as postorius, the
>admin web ui, is separate) as an eventual goal?  Do we want (at least for
>a year or two) to let people go to town with their new ideas for archivers
>and then see if a best-of-breed archiver is raising its head?  I don't
>believe any of this is decided inside of our minds yet, so, for now, people
>are defaulting to wait and see.

A hearty +1 to all of the above.

I know for sure that 3.0 final won't be held up for lack of a robust
archiver.  Having this conversation now is important for future releases
though.

Cheers,
-Barry

From barry at list.org  Mon Apr  9 00:01:58 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 18:01:58 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O4t6GciMjQ6fg4Lv1__GypsNS7HzrKdbmJ+Q974Euu_wA@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAL_0O1-zd9C4YgcTLiB_SD5ivjVcgDTGvTfJ7n3J_Xhp83BeuA@mail.gmail.com>
	<4F7BE7F4.7070607@zone12.com>
	<CA+CP9O4t6GciMjQ6fg4Lv1__GypsNS7HzrKdbmJ+Q974Euu_wA@mail.gmail.com>
Message-ID: <20120408180158.4e205e28@limelight.wooz.org>

On Apr 03, 2012, at 11:56 PM, David Jeske wrote:

>If this GPL/S-BSD issue turns out to be a blocker, then I'll just make my
>own site and maintain (my version) there because I want to release my code
>S-BSD.
>
>Also, there will be *zero* ill-will if you folks want to wrap it up in a
>GPL license and stick it into mailman... i just won't be maintaining that,
>or assigning copyright, and any patches I make will be into my S-BSD tree.
>Perhaps not ideal, but still seems a better outcome than pipermail.

David, there's one thing that's not clear to me.  If you donated the code to
GNU Mailman and we bundled it under our banner, would you continue to
maintain, develop, and release it as a separate project?

-Barry

From barry at list.org  Mon Apr  9 00:09:47 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 18:09:47 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O4E3x+LE-H4jcBwirxEkkPJGoHxiZX1mi691DGKCdVYRQ@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CA+CP9O4E3x+LE-H4jcBwirxEkkPJGoHxiZX1mi691DGKCdVYRQ@mail.gmail.com>
Message-ID: <20120408180947.30630aca@limelight.wooz.org>

On Apr 03, 2012, at 09:33 PM, David Jeske wrote:

>I'm just going to charge down the path I was on and finish up something
>that's a great drop in for MM2/MM3. I'll even try to add some pipermail URL
>compatibility.

I think that's an excellent way to go, especially right now.

>I'm a bit scared of a world where MM3 does not include any archiver. If
>pipermail popularity is any indication of how often admins 'stick with the
>bundled defaults', we could have an unreasonable number of MM3 lists with
>no archives at all.

Eventually, the GNU Mailman project will bundle a real archiver (not just the
dumb prototype one).  It won't happen for 3.0 so there's plenty of time for
folks to charge ahead and make the case for their favorite through
awesomeness.

>Obviously the team is free to bless any archiver it wants, mine or others.

It depends on what "bless" means.  I think as a project we should make it easy
to integrate Mailman with any FLOSS archiver.  I'm also quite happy to include
the IArchiver implementation shim in the core for any FLOSS archiver (as long
as we can test it!) so that a site admin only needs to flip a few mailman.cfg
switches to turn it on.

It gets a lot more complicated if "bless" means to borg it into our project
management, legal, release, and development structure.

Cheers,
-Barry

From barry at list.org  Mon Apr  9 00:14:46 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 18:14:46 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120404031831.M75170@nleaudio.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
	<20120404031831.M75170@nleaudio.com>
Message-ID: <20120408181446.45756d7d@limelight.wooz.org>

On Apr 03, 2012, at 11:21 PM, Bob Puff wrote:

>I think the majority of MM users will be simply using the RPM that comes with
>their distro, and there is a real benefit to stuff working right "out of the
>box".  This includes the Archiving functions.  

Distros are of course free to make their own opinionated decisions about how
components work together.  Think about the rest of the email stack: a distro
makes decisions about MTA, antispam, IMAP/POP servers, etc. etc. including how
they all work together and how much effort it takes to configure and run those
services.  Heck, entire businesses are springing up over service provisioning.

So I have full confidence that distros will make things way more easy for
people than it would be if you had to download and install all the individual
upstream source packages.

I think our job as a project is to make that possible, and easy.  A secondary
job is to make our own opinionated choices where appropriate.  We're not yet
there with the archiver, IMHO.

-Barry

From barry at list.org  Mon Apr  9 00:17:12 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 18:17:12 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <CA+CP9O6u3y7iOzmpdiry2WNzotu29iRx+UXb_Ub+DKvbjhkCog@mail.gmail.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAKTje6FtA2i9F9VvbcSDPkhrOinB84-N86yTdQXdsiPTjo993A@mail.gmail.com>
	<20120404031831.M75170@nleaudio.com>
	<CA+CP9O6u3y7iOzmpdiry2WNzotu29iRx+UXb_Ub+DKvbjhkCog@mail.gmail.com>
Message-ID: <20120408181712.2e375a0d@limelight.wooz.org>

On Apr 03, 2012, at 09:16 PM, David Jeske wrote:

>I'd personally like to see a better archiver rolled into an MM2 point
>release, as well as upcoming MM3 development. (I understand pipermail URL
>compat would be nice in that case).

I'd strongly oppose any change in default archiver for Mailman 2.1.

I don't think it's possible to make that decision yet for Mailman 3.0.

Including a default archiver for Mailman 3.1 should be a top priority.  A web
ui should be a priority as well!

-Barry

From barry at list.org  Mon Apr  9 00:19:53 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 18:19:53 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <PC19520120405171809025029ee0443@MSAPIRO>
References: <CAL_0O1-WvuRi1onM_-CECSd9Gsr45KddDKNpY1pP1aDQ7ZYFvA@mail.gmail.com>
	<PC19520120405171809025029ee0443@MSAPIRO>
Message-ID: <20120408181953.0778290e@limelight.wooz.org>

On Apr 05, 2012, at 05:18 PM, Mark Sapiro wrote:

>I'd like to see a default install provide list owners with at a minimum
>a choice of public, private or no archives and the archives to be
>searchable.

See also Jeff's first paragraph in comment #1 here:

https://bugs.launchpad.net/mailman/+bug/967238

-Barry

From barry at list.org  Mon Apr  9 00:29:00 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 18:29:00 -0400
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
	<20120408133816.29cb74b4@limelight.wooz.org>
	<5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>
Message-ID: <20120408182900.7251d339@limelight.wooz.org>

On Apr 08, 2012, at 01:11 PM, Richard Wackerbarth wrote:

>I would propose a slightly different scheme for converting messages to stable
>URIs..
>
>If we create our ID by concatenation of some hash and a part of the date,
>then the mail server need remember only those messages that fall in the same
>date-sensitive part of the namespace. It can "forget" about ancient history.

Hi Richard,

We had a very lengthy discussion about the hash a year or so ago, when the
current algorithm was agreed upon.  I'm too swamped at the moment to dig up
the links, but look for input from Jeff Breidenbach and Jeff Marshall.

The conclusion was that Message-ID was both sufficient and preferable as the
sole input into the X-Message-ID-Hash value used for stable URLs.

Of course date information could certainly be used to determine expiration
from any kind of Message-ID cache for LMTP acceptance purposes.  It doesn't
have to be part of the hash input for that.

Note though that Mailman has long had a feature to "clobber" the Date header
when forwarding the message on to the archive.  In mm2.1 this was closely tied
to Pipermail, but in mm3 this can be enabled for any archiver.  The problem
was that Date headers can get skewed enough that it would cause threading
problems in Pipermail.  It's probably true that most bogus Date headers come
from spam (trying to get their message at the top or bottom of my date sorted
inbox summary).
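
To make the "clobber" idea concrete, here is a rough Python sketch.  It is not
Mailman's actual code: the function name clobber_date, the one-day default
skew, and the X-Original-Date header used to keep the old value are all
assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import format_datetime, parsedate_to_datetime


def clobber_date(msg, received, max_skew=timedelta(days=1)):
    """Replace a missing, unparseable or badly skewed Date header.

    `received` must be a timezone-aware datetime.  Returns True if the
    header was rewritten.  Hypothetical helper, not a Mailman API.
    """
    original = msg.get("Date")
    try:
        sent = parsedate_to_datetime(original) if original else None
    except (TypeError, ValueError):
        sent = None
    if sent is not None and sent.tzinfo is None:
        # A "-0000" zone parses to a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    if sent is None or abs(received - sent) > max_skew:
        if original is not None:
            msg["X-Original-Date"] = original  # assumed header name
        del msg["Date"]
        msg["Date"] = format_datetime(received)
        return True
    return False


msg = Message()
msg["Date"] = "Mon, 01 Jan 1990 00:00:00 +0000"  # badly skewed on purpose
received = datetime(2012, 4, 8, 22, 29, tzinfo=timezone.utc)
changed = clobber_date(msg, received)
```

Keeping the original value in a separate header means the archiver can thread
on the arrival time without losing what the sender claimed.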

>Further, if we maintain sufficient Hamming distance, we can perform "error
>correction" (mapping multiple IDs to the same canonical one)) and, thus
>compensate for minor encoding differences caused by timing skew.

Hmm, I'm having trouble seeing how useful this would be if the Date is not
used to calculate the stable url.

-Barry

From barry at list.org  Mon Apr  9 01:00:10 2012
From: barry at list.org (Barry Warsaw)
Date: Sun, 8 Apr 2012 19:00:10 -0400
Subject: [Mailman-Developers] triaging the remaining bugs for 3.0 final
Message-ID: <20120408190010.4116fbb6@limelight.wooz.org>

I see the light at the end of the Mailman 3.0 final release tunnel.

I spent a few hours triaging all the bugs on Launchpad tagged with
'mailman3'.  For those which I think are important to fix or investigate for
3.0 final, I marked with an importance of Critical or High.  The difference
there being Critical bugs I'd like to fix in beta2 whereas High bugs should be
fixed before the final release, and may get knocked down in priority.  There
should be no Medium or Undecided bugs (the latter are possible if they are
also Incomplete).

Low bugs are those which would be nice to fix but won't block the 3.0 final
release.

For Status, Confirmed means I really do think it's a bug.  Triaged just means
I've looked at it but haven't yet decided whether it's a bug or not.

Here's how to find all the relevant bugs for the mm3.0 final release:

http://tinyurl.com/7799oek

So, this means that if you're looking to help out, start with Critical bugs,
then High bugs.  Of course, if you find a Low bug that itches you, feel free
to take a crack at it!

Cheers,
-Barry

From bwinton at mozilla.com  Mon Apr  9 01:14:28 2012
From: bwinton at mozilla.com (Blake Winton)
Date: Sun, 08 Apr 2012 19:14:28 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120408174855.35a479ac@limelight.wooz.org>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120408174855.35a479ac@limelight.wooz.org>
Message-ID: <4F821BD4.1080807@mozilla.com>

On 08-04-12 17:48 , Barry Warsaw wrote:
> On Apr 02, 2012, at 08:04 PM, David Jeske wrote:
>> The question is "would you BUNDLE another archiver even if the licenses
>> don't match?"
> If you're donating the archiver to the GNU Mailman project, for us to
> maintain, release, bundle, and develop, then I think that would be a very high
> hurdle to overcome.  Sorry, but it just is.
Would it work for everyone if David licensed the archiver to Mailman 
under the GPLv3+?

(There could still be a question about the license for contributed 
patches over whether they could be pulled back into the main tree or 
not, but as long as it was reasonably clear one way or the other, I 
don't think it would be a problem in practice.  On the other hand, I am 
an optimist...  ;)

Later,
Blake.

-- 
Blake Winton   Thunderbird User Experience Lead
bwinton at mozilla.com


From bwinton at mozilla.com  Mon Apr  9 01:27:49 2012
From: bwinton at mozilla.com (Blake Winton)
Date: Sun, 08 Apr 2012 19:27:49 -0400
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <4F821E2F.1010203@msapiro.net>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120408174855.35a479ac@limelight.wooz.org>
	<4F821BD4.1080807@mozilla.com> <4F821E2F.1010203@msapiro.net>
Message-ID: <4F821EF5.10708@mozilla.com>

On 08-04-12 19:24 , Mark Sapiro wrote:
> On 04/08/2012 04:14 PM, Blake Winton wrote:
>> Would it work for everyone if David licensed the archiver to Mailman
>> under the GPLv3+?
> It won't work for David. See, e.g.,
> <http://mail.python.org/pipermail/mailman-developers/2012-April/021921.html>
Well, that's not exactly what David said.  ;)

(I'm not proposing he stops releasing it under S-BSD, just that he 
re-licenses the copy in Mailman as GPL.  So he can continue to work on 
the code and release it under a permissive license, but Mailman can also 
use and distribute it. )

Later,
Blake.

-- 
Blake Winton   Thunderbird User Experience Lead
bwinton at mozilla.com


From mark at msapiro.net  Mon Apr  9 01:24:31 2012
From: mark at msapiro.net (Mark Sapiro)
Date: Sun, 08 Apr 2012 16:24:31 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <4F821BD4.1080807@mozilla.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120408174855.35a479ac@limelight.wooz.org>
	<4F821BD4.1080807@mozilla.com>
Message-ID: <4F821E2F.1010203@msapiro.net>

On 04/08/2012 04:14 PM, Blake Winton wrote:

> Would it work for everyone if David licensed the archiver to Mailman
> under the GPLv3+?


It won't work for David. See, e.g.,
<http://mail.python.org/pipermail/mailman-developers/2012-April/021921.html>

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


From stephen at xemacs.org  Mon Apr  9 04:05:20 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 9 Apr 2012 11:05:20 +0900
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <4F821EF5.10708@mozilla.com>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120408174855.35a479ac@limelight.wooz.org>
	<4F821BD4.1080807@mozilla.com> <4F821E2F.1010203@msapiro.net>
	<4F821EF5.10708@mozilla.com>
Message-ID: <CAL_0O1_ejxyhprf3C0sZtBN16GXNdT0DDYU8xV-U_8DvNo+sXA@mail.gmail.com>

On Mon, Apr 9, 2012 at 8:27 AM, Blake Winton <bwinton at mozilla.com> wrote:
> On 08-04-12 19:24 , Mark Sapiro wrote:
>>
>> On 04/08/2012 04:14 PM, Blake Winton wrote:
>>>
>>> Would it work for everyone if David licensed the archiver to Mailman
>>> under the GPLv3+?
>>
>> It won't work for David.
>
> Well, that's not exactly what David said.  ;)

No, that *is* what David said, and repeatedly.  He will not license
under GPL, period.

What he has also said is that he would be happy to maintain his
original distribution in parallel to a GPLed branch bundled with
Mailman.  He would be willing to do a (very) small amount of work to
keep them in sync, I believe, but his releases will be under
simplified BSD so any contributions that he is going to maintain must
be licensed that way.

This matters because, in practice, if there are significant
contributions under GPL to the Mailman branch, it will become a real
(though friendly) fork, and we will lose the benefit of David's
maintenance because we'll have to integrate his changes into our
branch.  He won't do that for us any more.

I personally see that as win-win.  Barry doesn't, presumably because
(1) to keep David as maintainer means that contributions either need
to go through him (implicitly making them BSD), or we'll need to do
some legal dance to explicitly relicense every such contribution BSD
(since in practice our contributor agreement will make any
contribution to Mailman itself GPLv3+ only), which (2) implicitly
gives David veto power over the bundled archiver.

The reason I see it as win-win is that I don't think there will be a
lot of contribution from the current Mailman core to David's archiver.
There clearly is a lot of enthusiasm for something with social
networking features, and David's archiver doesn't look like a good
platform for that to me.  Eventually, the recommended (and bundled)
archiver will be something else.

> (I'm not proposing he stops releasing it under S-BSD, just that he
> re-licenses the copy in Mailman as GPL.

David doesn't need to do anything.  We just copy the code and release
it in Mailman under the GPLv3+ like the rest of Mailman.  That's just
a special case of the main reason for using a BSD license.

> So he can continue to work on the
> code and release it under a permissive license, but Mailman can also use and
> distribute it. )

There's nothing stopping us from doing that, not even the possibility
of offending David.  That's *why* he uses BSD in the first place, so
we can do that if we want to.

But he won't do it for us.

From stephen at xemacs.org  Mon Apr  9 04:14:51 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 9 Apr 2012 11:14:51 +0900
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120408174855.35a479ac@limelight.wooz.org>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120408174855.35a479ac@limelight.wooz.org>
Message-ID: <CAL_0O18rt=qnh113969tOrWfy=4ajjJExdcWuFkMBufvy9SU6g@mail.gmail.com>

On Mon, Apr 9, 2012 at 6:48 AM, Barry Warsaw <barry at list.org> wrote:
> On Apr 02, 2012, at 08:04 PM, David Jeske wrote:
> Probably the only way I'd change my mind about that is if RMS personally told
> us that we could still treat the non-copyleft donation the same way we treat
> all the other code, i.e. we can use the code and freely copy between them
> without any additional administrative overhead.

He won't do that, because it's not possible.  You cannot freely copy
from a copyleft code base into a non-copyleft code base; you must
indenture the latter.

What we can do is branch the code, and freely copy back-and-forth
between Mailman core and the code we got from the non-copyleft code
base.

The potential costs of that I point out in another message, so don't
reply to this one. :-)

From stephen at xemacs.org  Mon Apr  9 04:18:49 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 9 Apr 2012 11:18:49 +0900
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <20120408182900.7251d339@limelight.wooz.org>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
	<20120408133816.29cb74b4@limelight.wooz.org>
	<5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>
	<20120408182900.7251d339@limelight.wooz.org>
Message-ID: <CAL_0O1_taJKaRqLHSX3gEC+A2NsCWCAAbjWj2YriNmqht-Xxpg@mail.gmail.com>

On Mon, Apr 9, 2012 at 7:29 AM, Barry Warsaw <barry at list.org> wrote:
> On Apr 08, 2012, at 01:11 PM, Richard Wackerbarth wrote:
>
>>I would propose a slightly different scheme for converting messages to stable
>>URIs..
>>
>>If we create our ID by concatenation of some hash and a part of the date,
>>then the mail server need remember only those messages that fall in the same
>>date-sensitive part of the namespace. It can "forget" about ancient history.
>
> We had a very lengthy discussion about the hash a year or so ago, when the
> current algorithm was agreed upon.  I'm too swamped at the moment to dig up
> the links, but look for input from Jeff Breidenbach and Jeff Marshall.

I believe it's the thread including this message:

http://mail.python.org/pipermail/email-sig/2012-January/000883.html

I don't really see the point of not storing all the IDs, anyway.  A
million message IDs isn't even going to take up a gigabyte! (I think
it's reasonable to reject a 1000-byte Message-ID as an attack, don't
you?)  Anybody who's running an archive that receives unique messages
in mega-message units presumably has enough resources that they can
afford the odd gigabyte (heck, even in RAM ;-) even if not all the
messages are going to be stored in the archive due to expiration
policies or whatever.
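
For the record, the back-of-the-envelope version in Python.  This is only a
sketch: the class name SeenMessageIDs and the in-memory set are assumptions
for illustration, not anything that exists in Mailman.

```python
MAX_MESSAGE_ID_BYTES = 1000  # at or above this, treat the ID as an attack


class SeenMessageIDs:
    """Remember every Message-ID ever accepted (hypothetical helper)."""

    def __init__(self):
        self._seen = set()

    def add(self, message_id):
        """Return True if the ID is new, False if it is a duplicate."""
        raw = message_id.encode("utf-8")
        if len(raw) >= MAX_MESSAGE_ID_BYTES:
            raise ValueError("Message-ID too long")
        if raw in self._seen:
            return False
        self._seen.add(raw)
        return True


# Upper bound on raw ID storage for a million messages, ignoring the
# overhead of the container itself: 10**9 bytes, i.e. one decimal gigabyte.
worst_case_bytes = 1_000_000 * MAX_MESSAGE_ID_BYTES

seen = SeenMessageIDs()
first = seen.add("<20120409152339.16496.75486@foo.example.org>")
second = seen.add("<20120409152339.16496.75486@foo.example.org>")
```

Real Message-IDs are far shorter than the cap, so actual usage would sit well
below that bound.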

From davidj at gmail.com  Mon Apr  9 06:37:46 2012
From: davidj at gmail.com (David Jeske)
Date: Sun, 8 Apr 2012 21:37:46 -0700
Subject: [Mailman-Developers] mailman / archive-ui / licensing questions
In-Reply-To: <20120408180158.4e205e28@limelight.wooz.org>
References: <CA+CP9O48F8FXr5ya4_z=g3jPujs=c=_30=9O0Np0iCpFjibt0w@mail.gmail.com>
	<CAL_0O18Hytf3yP5xcVm+Xp4yPzPM69akPf=fSMZUT0vn9u-vZw@mail.gmail.com>
	<CA+CP9O4LZuwPHATTUQdDo68c1h3MvtZoHjFJowmHfCdizcGmqQ@mail.gmail.com>
	<4F7A2304.5060408@zone12.com>
	<CA+CP9O5VTm-g1N_MK=XSoL_NqFoqCUpu2gfL-_76B9hBuEAJxw@mail.gmail.com>
	<20120403185822.GI11151@unaka.lan>
	<CAL_0O1-zd9C4YgcTLiB_SD5ivjVcgDTGvTfJ7n3J_Xhp83BeuA@mail.gmail.com>
	<4F7BE7F4.7070607@zone12.com>
	<CA+CP9O4t6GciMjQ6fg4Lv1__GypsNS7HzrKdbmJ+Q974Euu_wA@mail.gmail.com>
	<20120408180158.4e205e28@limelight.wooz.org>
Message-ID: <CA+CP9O6=xkcHZrdjppmDjRv=jgYpsccxnPw-F5iJtRC5F+rRaA@mail.gmail.com>

I think the last several messages covered whats-what pretty well.
Summarizing what already seems to have been reclarified a few times
excellently by others... ClearSilver List Archive is S-BSD, and will remain
so. That license allows you folks to wrap it in GPLv3 if you wish, but I
won't be doing so myself or assigning copyright as I don't wish those
restrictions to be enforced on my code.

I apologize that this license discussion has lasted as long as it has, as
I'm sure we'd all rather be talking about cool archiver UI code and
features. :)

The only remaining question I saw was Barry's here...

On Apr 8, 2012 3:02 PM, "Barry Warsaw" <barry at list.org> wrote:
> David, there's one thing that's not clear to me.  If
> you donated the code to GNU Mailman and
> we bundled it under our banner, would you continue
> to maintain, develop, and release it as a separate
> project?

If MM bundled (some version of) the code, wrapped it in GPLv3, and
maintained it, I don't anticipate I'd maintain, develop, and continue to
release a separate project. I'd merely keep my webpage up distributing the
S-BSD code-release.

If I did make changes, I'd distribute them as S-BSD patches to my S-BSD
code. However, seeing as CSLA hasn't changed in a decade, after I'm done
updating it, my contributions probably wouldn't change for a decade more.

By my view of this entire license and bundling discussion it seems like the
most practical possibilities are:

1) If MM really likes how CSLA ends up, you folks can adopt and GPLv3 the
code, effectively becoming the official maintainers of the project..
(accepting that the GPLv3 restrictions couldn't be enforced on the original
code, as it's also released S-BSD)

2) If MM likes how CSLA ends up, but would rather have me maintain it... I
can maintain it as a separate S-BSD project, and MM can point-to or
reference it as one of the external (yet easy to install) archiver options.

3) If MM doesn't like how CSLA ends up, then we can all have a good laugh
at how much time we spent in theoretical license discussions over something
that didn't matter.

Let's hope not #3. I'm going to have to work extra hard now to be sure that
doesn't happen. :)

I learned more about license nuances and general MM dev thoughts from this
thread than I expected, so thanks to everyone that replied and contributed!

From davidj at gmail.com  Mon Apr  9 06:52:34 2012
From: davidj at gmail.com (David Jeske)
Date: Sun, 8 Apr 2012 21:52:34 -0700
Subject: [Mailman-Developers] Integrating HyperKitty with Mailman3
In-Reply-To: <20120408123921.0ff6d14f@limelight.wooz.org>
References: <CAOb12VVP53Y1Aq8MffWbUAUkKgra5-BWuunvvQCaUSxk36qFKA@mail.gmail.com>
	<20120408123921.0ff6d14f@limelight.wooz.org>
Message-ID: <CA+CP9O7hKzn-aMJV5P3oHG=XQNkbrK-_-3qNpL-2ANbeDs8MBw@mail.gmail.com>

Are you expecting this direct python configuration import to actually "be"
an archiver, or simply to be a configuration shim to get data to an
archiver?

Python imports are not version-dependent (like C-shlibs are), so it seems
dubious to expect an external archiver to necessarily be compatible with
the same version of python that MM3 is. I know I've run into this problem
in the past, especially because of how much the python MIME message classes
changed over each python release (though hopefully they are more stable now)

On Apr 8, 2012 9:39 AM, "Barry Warsaw" <barry at list.org> wrote:
> -----snip snip-----
> [archiver.hyperkitty]
> class: python.path.to.hyperkitty.HyperKitty
> -----snip snip-----
>
> Of course, you'd probably want to `enable` it too.
>
> One tricky thing here is that the `class` value names a Python dotted-module
> path, so the class must be importable.  Ensuring that the hyperkitty module
> (and this is just a suggestion, YMMV) is importable by the core engine may not
> be fully baked.  For now, just set $PYTHONPATH.
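
For anyone wondering what "importable" means in practice here, a minimal
sketch of how a dotted path such as the `class` value above could be resolved.
The helper name resolve_class is hypothetical and this is not necessarily how
the Mailman core does it; it only shows why the module has to be on
$PYTHONPATH.

```python
import importlib
import email.message


def resolve_class(dotted_path):
    """Turn 'package.module.ClassName' into the class object.

    Raises ImportError if the module cannot be found on sys.path, or
    AttributeError if the module has no attribute with that name.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class stands in for the archiver class here.
resolved = resolve_class("email.message.Message")
```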

From richard at nfsnet.org  Mon Apr  9 14:10:45 2012
From: richard at nfsnet.org (Richard Wackerbarth)
Date: Mon, 9 Apr 2012 07:10:45 -0500
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CAL_0O1_taJKaRqLHSX3gEC+A2NsCWCAAbjWj2YriNmqht-Xxpg@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
	<20120408133816.29cb74b4@limelight.wooz.org>
	<5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>
	<20120408182900.7251d339@limelight.wooz.org>
	<CAL_0O1_taJKaRqLHSX3gEC+A2NsCWCAAbjWj2YriNmqht-Xxpg@mail.gmail.com>
Message-ID: <E3EF2A35-E29F-45B3-A2AA-C8E6B7DACDEB@nfsnet.org>

On Apr 8, 2012, at 9:18 PM, Stephen J. Turnbull wrote:

> I don't really see the point of not storing all the IDs, anyway.

Not only does this require excessive resources, but it requires significant infrastructure for failure recovery.
(Think backups, journaling, etc.) That requirement may not be an issue for Google, but it is a significant additional burden for small operations, migrations, etc.

I support the concept of Stable URI. The concept of using a hash into a large namespace is probably adequate.
However, at a minimum, the URI SHOULD include an easily identifiable schema-revision indicator.
That way, if the present scheme is found lacking, we can, compatibly, switch to a new schema and a new namespace.

Further, by intentionally changing the namespace, based on time, it becomes reasonable to assure uniqueness in all but exceptional situations without requiring infinite perfect memory. Further, by switching namespaces, past faults in that memory become self-healing.

I think that migrations, alone, justify the use of a scheme that does not require infinite preservation of all past message IDs.

I would hope that the historical experience in the crypto world would convince you of the need to make provision for an unknown future. There, schemes that were thought to be "unbreakable" have been adopted and widely used. Only, well after that time, was it discovered that there was a flaw and a new scheme needed to be utilized.

The use of long-lived stable URIs needs to be prepared for that eventuality. Therefore, the URI must self-identify its namespace and the namespace must not be based solely on something that can outlive the use of the namespace.
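
One possible shape for such a self-identifying ID, as a hedged Python sketch.
Everything specific here is an assumption for illustration: the "1" version
tag, the monthly namespace, the slash-separated layout, and the function name
dated_stable_id.  The hash part reuses the Base 32 SHA-1 of the Message-ID
discussed elsewhere in this thread.

```python
import base64
import hashlib
from datetime import date

SCHEME_VERSION = "1"  # assumed schema-revision indicator


def dated_stable_id(message_id, received):
    """Build '<version>/<YYYY-MM>/<hash>' for a message (illustrative only)."""
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]  # angle brackets are not part of the ID
    digest = hashlib.sha1(bare.encode("utf-8")).digest()
    id_hash = base64.b32encode(digest).decode("ascii")
    return f"{SCHEME_VERSION}/{received:%Y-%m}/{id_hash}"


example = dated_stable_id(
    "<20120409152339.16496.75486@foo.example.org>", date(2012, 4, 9))
```

With a layout like this, a server only has to check for collisions within one
month's namespace, and a later scheme can coexist under a different version
prefix.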


From barry at list.org  Mon Apr  9 16:59:26 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 9 Apr 2012 10:59:26 -0400
Subject: [Mailman-Developers] Integrating HyperKitty with Mailman3
In-Reply-To: <CA+CP9O7hKzn-aMJV5P3oHG=XQNkbrK-_-3qNpL-2ANbeDs8MBw@mail.gmail.com>
References: <CAOb12VVP53Y1Aq8MffWbUAUkKgra5-BWuunvvQCaUSxk36qFKA@mail.gmail.com>
	<20120408123921.0ff6d14f@limelight.wooz.org>
	<CA+CP9O7hKzn-aMJV5P3oHG=XQNkbrK-_-3qNpL-2ANbeDs8MBw@mail.gmail.com>
Message-ID: <20120409105926.782f1e51@limelight.wooz.org>

On Apr 08, 2012, at 09:52 PM, David Jeske wrote:

>Are you expecting this direct python configuration import to actually "be"
>an archiver, or simply to be a configuration shim to get data to an
>archiver?

Whatever makes the most sense for that particular archiver.

The prototype archiver is so (purposely) dumb that it's implemented right in
process.

The Mail Archive shim just drops a copy of the message into the outgoing
queue, after it calculates the appropriate recipient address.  IOW, it sets
the message up to be forwarded to their service over SMTP.

The MHonArc shim just shells out to the appropriate command, piping the
message bytes to that command's stdin.
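
As a rough illustration of the "shell out and pipe" style of shim (not the
actual MHonArc shim code), something like the following would do.  The
function name pipe_to_archiver is hypothetical, and a child Python process
stands in for the real archiver command so the sketch runs anywhere.

```python
import subprocess
import sys
from email.message import Message


def pipe_to_archiver(command, msg):
    """Run `command`, writing the flattened message to its stdin."""
    return subprocess.run(
        command,
        input=msg.as_bytes(),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the command fails
    )


# Stand-in archiver: reports how many bytes it read from stdin.
fake_archiver = [
    sys.executable, "-c",
    "import sys; print(len(sys.stdin.buffer.read()))",
]

msg = Message()
msg["Message-ID"] = "<20120409152339.16496.75486@foo.example.org>"
msg["Subject"] = "test"
msg.set_payload("Hello, archive.\n")
result = pipe_to_archiver(fake_archiver, msg)
```

The appeal of this approach is that the archiver can be written in anything
and run under any interpreter version, since only bytes cross the boundary.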

So I can turn this question around and ask, what's the best way to get
messages into ClearSilver?

>Python imports are not version-dependent (like C-shlibs are), so it seems
>dubious to expect an external archiver to necessarily be compatible with
>the same version of python that MM3 is. I know I've run into this problem
>in the past, especially because of how much the python MIME message classes
>changed over each python release (though hopefully they are more stable now)

Well, until email6 is released, Mailman 3 is ported to Python 3, and we can
all (finally) do email the right way in Python. :)

Cheers,
-Barry

From barry at list.org  Mon Apr  9 17:16:23 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 9 Apr 2012 11:16:23 -0400
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <CAL_0O1_taJKaRqLHSX3gEC+A2NsCWCAAbjWj2YriNmqht-Xxpg@mail.gmail.com>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
	<20120408133816.29cb74b4@limelight.wooz.org>
	<5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>
	<20120408182900.7251d339@limelight.wooz.org>
	<CAL_0O1_taJKaRqLHSX3gEC+A2NsCWCAAbjWj2YriNmqht-Xxpg@mail.gmail.com>
Message-ID: <20120409111623.330b3ef9@limelight.wooz.org>

On Apr 09, 2012, at 11:18 AM, Stephen J. Turnbull wrote:

>On Mon, Apr 9, 2012 at 7:29 AM, Barry Warsaw <barry at list.org> wrote:
>> We had a very lengthy discussion about the hash a year or so ago, when the
>> current algorithm was agreed upon.  I'm too swamped at the moment to dig up
>> the links, but look for input from Jeff Breidenbach and Jeff Marshall.
>
>I believe it's the thread including this message:
>
>http://mail.python.org/pipermail/email-sig/2012-January/000883.html

Shockingly, it's even older than that.  I just did a quick perusal of the page
in the wiki which defines this.  Revision 18 dated 2008-07-03 is the first one
that contains the current description of the algorithm:

"X-Message-ID-Hash is calculated from the Base 32 encoded SHA 1 hash of the
Message-ID header. As with RFC 2822, the angle bracket delimiters are not
considered part of the Message-ID and MUST NOT contribute to the hash."

http://wiki.list.org/display/DEV/Stable+URLs

So yeah, it was a little more than "a year or so ago" :).
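
In Python the quoted description comes out to just a few lines.  This is a
minimal sketch rather than the code in the tree; the function name
message_id_hash and the choice of UTF-8 for encoding the header value are
assumptions.

```python
import base64
import hashlib


def message_id_hash(message_id):
    """Base 32 encoded SHA-1 of the Message-ID, without angle brackets."""
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]  # the delimiters MUST NOT contribute to the hash
    digest = hashlib.sha1(bare.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")


with_brackets = message_id_hash("<20120409152339.16496.75486@foo.example.org>")
without_brackets = message_id_hash("20120409152339.16496.75486@foo.example.org")
```

A SHA-1 digest is 160 bits, which Base 32 encodes as exactly 32 characters
with no padding.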

>I don't really see the point of not storing all the IDs, anyway.  A
>million message IDs isn't even going to take up a gigabyte! (I think
>it's reasonable to reject a 1000-byte Message-ID as an attack, don't
>you?)  Anybody who's running an archive that receives unique messages
>in mega-message units presumably has enough resources that they can
>afford the odd gigabyte (heck, even in RAM ;-) even if not all the
>messages are going to be stored in the archive due to expiration
>policies or whatever.

Agreed!  As I mentioned to Richard, it's not necessary that X-Message-ID-Hash
be used as the thread id, in part or in whole.  That's not its original
purpose.

The important thing is that any message in the archiver must be discoverable
given its Message-ID or X-Message-ID-Hash.

-Barry

From barry at list.org  Mon Apr  9 17:28:53 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 9 Apr 2012 11:28:53 -0400
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <E3EF2A35-E29F-45B3-A2AA-C8E6B7DACDEB@nfsnet.org>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
	<20120408133816.29cb74b4@limelight.wooz.org>
	<5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>
	<20120408182900.7251d339@limelight.wooz.org>
	<CAL_0O1_taJKaRqLHSX3gEC+A2NsCWCAAbjWj2YriNmqht-Xxpg@mail.gmail.com>
	<E3EF2A35-E29F-45B3-A2AA-C8E6B7DACDEB@nfsnet.org>
Message-ID: <20120409112853.04997e02@limelight.wooz.org>

On Apr 09, 2012, at 07:10 AM, Richard Wackerbarth wrote:

>I support the concept of Stable URI. The concept of using a hash into a large
>namespace is probably adequate.  However, at a minimum, the URI SHOULD
>include an easily identifiable schema-revision indicator.  That way, if the
>present scheme is found lacking, we can, compatibly, switch to a new schema
>and a new namespace.

Should we attempt to push the stable URI concept as an RFC?  Does anybody
(Murray perhaps) have the interest and time to do that?  I think the RFC would
be pretty simple.

Having an RFC would also be nice for getting rid of the X- prefix.

In any event, we can declare the algorithm on our current wiki page to be
version 1.0 of our stable URI definition.  Archiver search algorithms can
expose this version number in their URLs if they're so inclined.  E.g.:

http://mail.example.com/1.0/7GC2V6BEDVME27VQ34W7AXMFPA3H2YWW

I should probably also be able to find the message this way:

http://mail.example.com/search?message-id=%3C20120409152339.16496.75486%40foo.example.org%3E

and probably

http://mail.example.com/search?strict=1&message-id=20120409152339.16496.75486%40foo.example.org

and maybe others.
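
Building those three URL shapes is straightforward; here is a small sketch.
The helper names versioned_url and search_url are made up for illustration,
and mail.example.com is just the placeholder host used above.

```python
from urllib.parse import quote, urlencode

BASE = "http://mail.example.com"  # placeholder archive host


def versioned_url(id_hash, version="1.0"):
    """Stable URL that exposes the algorithm version in the path."""
    return f"{BASE}/{version}/{id_hash}"


def search_url(message_id, strict=False):
    """Search URL with the Message-ID percent-encoded as a query value."""
    params = []
    if strict:
        params.append(("strict", "1"))
    params.append(("message-id", message_id))
    return f"{BASE}/search?" + urlencode(params, quote_via=quote, safe="")


by_hash = versioned_url("7GC2V6BEDVME27VQ34W7AXMFPA3H2YWW")
by_id = search_url("<20120409152339.16496.75486@foo.example.org>")
by_bare_id = search_url("20120409152339.16496.75486@foo.example.org",
                        strict=True)
```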

-Barry

From stephen at xemacs.org  Mon Apr  9 19:43:16 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 10 Apr 2012 02:43:16 +0900
Subject: [Mailman-Developers] From the creation of a ThreadID
In-Reply-To: <20120409112853.04997e02@limelight.wooz.org>
References: <1333633288.23207.26.camel@ambre.pingoured.fr>
	<4F7E0EB6.7030905@zone12.com>
	<20120408133816.29cb74b4@limelight.wooz.org>
	<5C7FAC2C-67D1-4C23-B104-F2963750E6E5@NFSNet.org>
	<20120408182900.7251d339@limelight.wooz.org>
	<CAL_0O1_taJKaRqLHSX3gEC+A2NsCWCAAbjWj2YriNmqht-Xxpg@mail.gmail.com>
	<E3EF2A35-E29F-45B3-A2AA-C8E6B7DACDEB@nfsnet.org>
	<20120409112853.04997e02@limelight.wooz.org>
Message-ID: <CAL_0O1-uELXWoOFoA3x28=FxWD3=z3gWv5JRV=4kLY09M3mtqQ@mail.gmail.com>

On Tue, Apr 10, 2012 at 12:28 AM, Barry Warsaw <barry at list.org> wrote:
> On Apr 09, 2012, at 07:10 AM, Richard Wackerbarth wrote:
>
>>I support the concept of Stable URI.
>
> Should we attempt to push the stable URI concept as an RFC?  Does anybody
> (Murray perhaps) have the interest and time to do that?  I think the RFC would
> be pretty simple.

I don't think we have sufficient agreement on how to implement yet.

> Having an RFC would also be nice for getting rid of the X- prefix.

AIUI, the X- prefix is now considered a bad idea for public protocols
in any case.  I don't think we need an RFC for it until we're pretty
sure we have it right.

> In any event, we can declare the algorithm on our current wiki page to be
> version 1.0 of our stable URI definition.  Archiver search algorithms can
> expose this version number in their URLs if they're so inclined.

IMHO, our stable URIs should work on any of the servers we might
connect to to retrieve the message.  In terms of best current
practice, Gmane has offered stable URLs for about a decade now:

    http://msgid.gmane.org/20120409152339.16496.75486@foo.example.org

To put it on the wire to Gmane, just URL-encode the message-id and be
done with it.  IMO, the ideal would be just like netnews:

    list-archive://mailman-developers.python.org/20120409152339.16496.75486@foo.example.org

The List-ID is not entirely redundant due to cross-posting.

In this scheme, it's up to the MUA to decide which archive(s) to query
for this, just as with netnews looking for a newsgroup.  I really
don't see why the stable URI would want to be anything else.

So the scheme on the wiki seems overengineered to me, with the
possible exception of the "industrial-strength message IDs are too
long for the footer" problem.  But

http://mail.example.com/1.0/7GC2V6BEDVME27VQ34W7AXMFPA3H2YWW

is really too long for a footer too; what we want are tinyurls.  So I
think that footer URLs should be considered a different problem from
the stable URI problem.
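
To make the two URI forms above concrete, here is a minimal Python sketch.
The helper names are made up for illustration, and the list-archive:// scheme
is only the proposal from this message, not a registered scheme. It assumes
the Message-ID is percent-encoded as a single path segment.

```python
from urllib.parse import quote


def gmane_style_url(message_id, base="http://msgid.gmane.org/"):
    """Percent-encode a Message-ID (angle brackets removed) onto a base URL."""
    return base + quote(message_id.strip("<>"), safe="")


def list_archive_uri(list_id, message_id):
    """Build the proposed (hypothetical) list-archive:// form."""
    return "list-archive://{}/{}".format(
        list_id, quote(message_id.strip("<>"), safe=""))


msgid = "<20120409152339.16496.75486@foo.example.org>"
print(gmane_style_url(msgid))
print(list_archive_uri("mailman-developers.python.org", msgid))
```

Which archive actually resolves the second form is left to the MUA, as
described above.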

From richard at nfsnet.org  Mon Apr  9 22:35:08 2012
From: richard at nfsnet.org (Richard Wackerbarth)
Date: Mon, 9 Apr 2012 15:35:08 -0500
Subject: [Mailman-Developers] [Bug 965532] [NEW] Need a script to
	upgrade from MM2 to MM3
In-Reply-To: <20120408174323.2395.52774.launchpad@gac.canonical.com>
References: <20120326174611.5464.44688.malonedeb@chaenomeles.canonical.com>
	<20120408174323.2395.52774.launchpad@gac.canonical.com>
Message-ID: <D55936E9-2282-4281-84CE-CDFDA0352315@nfsnet.org>

On Apr 8, 2012, at 12:43 PM, Launchpad Bug Tracker wrote:

> Barry Warsaw (barry) has assigned this bug to you for GNU Mailman:

> Need a script to upgrade from MM2 to MM3
> https://bugs.launchpad.net/bugs/965532

Here are some thoughts on a possible migration technique.
I would request discussion and suggestions.

In particular, what about the idea of converting the configuration file to HTML as an intermediate file format?
Selectable css could easily render it as a viewable report. It could still be edited by hand without too much difficulty.

Richard "Wacky" Wackerbarth
- - - - - - 

Steps to migrate from MM2 to MM3

1) Manually install MM3. Hook it up to the MTA, UI, and Archiver. This should include testing to assure that things are ready to create new lists.

2) Translate list configurations

  a) Use TOOL1 to extract the set of list configurations from MM2. Pipe this to TOOL2 which generates a tree of MM2 configurations. That tree hierarchy would be Root-->World-->Site-->Domain-->List-->Subscriber. TOOL2 would populate configurations at the List level. It might also reformat selected parameters. In particular, various <option type="radio" > entries might be transformed into enumerations such as "Yes"/"No" or "Hidden"/"Private"/"Public" rather than numerical values. This would enhance readability.

  b) TOOL3 would populate the World level with the MM2 defaults and recursively promote common values up the tree, leaving only those entries which would need to override their parent to derive the current value. Values which match the parent would be flagged. (The inheritance flag should be tri-state. "Differs from parent", "Same as parent", "Inherited from parent")

  c) At this point, the user might edit some of the configurations and rerun TOOL3 adjusting the inheritance flag as appropriate.

  d) Now, we begin translation to MM3 configuration options. For each MM3 option, TOOL4 computes the equivalent value from the MM2 values. Each computed value also gets the corresponding inheritance flag. Values that cannot be computed from the available information get the "Inherited from parent" flag. MM2 values used in computations are marked as "translated".
  e) After a chance to edit the MM3 configurations, TOOL5 would recompute inheritance flags, report any MM2 values that have not been translated and produce a copy of the configuration file simplified by removing all inherited entries.

  f) After a final inspection TOOL7 would actually import the configurations, committing entries to the MM3 database.

3) For the migration of rosters, we should be able to do it one subscription at a time through a pipeline that permits pre- and post- hooks.  A --dry-run option would be appropriate.

  a) We can assume that each email address is a distinct person.  The subscribers can utilize the UI to merge email addresses into a common persona.

  b) We can also assume that each subscription overrides its parent in the Persona-->EMailAddress-->Subscription hierarchy. The individual users can use the UI to consolidate their selections.

Some additional thoughts:

	All of the tools should be written in Python, hopefully in a dialect that is common to all of the versions supported by MM3.
	TOOL1 already exists (`bin/export.py`). TOOL2 can discard the roster nodes as they come in. Similarly, in step 3, we can use TOOL1 and discard the list configuration information.
	TOOL2 can reformat the XML as HTML, thus making the input data into a viewable report. The inheritance flag would become a class attribute on the option. Would it make sense to go a step further and generate html forms and run a trivial http server on localhost? It might be easier to do this in django, but I think that requiring that level of installation is probably too much for the current situation.
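
The promotion and flagging in steps 2b and 2e could be sketched roughly as
follows. This is only an illustration under simplifying assumptions: a single
parent level, plain dicts for configurations, and option names that are made
up (they are not real MM2 or MM3 settings).

```python
DIFFERS = "Differs from parent"
SAME = "Same as parent"
INHERITED = "Inherited from parent"


def promote_common(children):
    """Return the options whose value is identical in every child node."""
    if not children:
        return {}
    first, rest = children[0], children[1:]
    return {key: value for key, value in first.items()
            if all(key in c and c[key] == value for c in rest)}


def flag_options(config, parent):
    """Assign the tri-state inheritance flag to each option of one node."""
    flags = {}
    for key in sorted(set(config) | set(parent)):
        if key not in config:
            flags[key] = INHERITED
        elif key in parent and config[key] == parent[key]:
            flags[key] = SAME
        else:
            flags[key] = DIFFERS
    return flags


def simplify(config, parent):
    """Keep only the entries that must override the parent."""
    flags = flag_options(config, parent)
    return {k: v for k, v in config.items() if flags[k] == DIFFERS}


# Hypothetical option names and values, for illustration only.
lists = {
    "dev": {"archive": "Public", "moderated": "No"},
    "announce": {"archive": "Public", "moderated": "Yes"},
}
domain = promote_common(list(lists.values()))
for name, config in lists.items():
    print(name, simplify(config, domain))
```

A real tool would repeat the promotion at each level of the tree and would
have to decide what to do when only most, rather than all, children agree.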


From barry at list.org  Tue Apr 10 22:46:15 2012
From: barry at list.org (Barry Warsaw)
Date: Tue, 10 Apr 2012 16:46:15 -0400
Subject: [Mailman-Developers] [Bug 965532] [NEW] Need a script to
 upgrade from MM2 to MM3
In-Reply-To: <D55936E9-2282-4281-84CE-CDFDA0352315@nfsnet.org>
References: <20120326174611.5464.44688.malonedeb@chaenomeles.canonical.com>
	<20120408174323.2395.52774.launchpad@gac.canonical.com>
	<D55936E9-2282-4281-84CE-CDFDA0352315@nfsnet.org>
Message-ID: <20120410164615.4cb839cc@rivendell>

On Apr 09, 2012, at 03:35 PM, Richard Wackerbarth wrote:

>On Apr 8, 2012, at 12:43 PM, Launchpad Bug Tracker wrote:
>
>> Barry Warsaw (barry) has assigned this bug to you for GNU Mailman:
>
>> Need a script to upgrade from MM2 to MM3
>> https://bugs.launchpad.net/bugs/965532
>
>Here are some thoughts on a possible migration technique.
>I would request discussion and suggestions.
>
>In particular, what about the idea of converting the configuration file to
>HTML as an intermediate file format?  Selectable css could easily render it
>as a viewable report. It could still be edited by hand without too much
>difficulty.

It's an interesting idea.  As you observed, mm2 can export to XML, so it's not
such a big stretch.

>Steps to migrate from MM2 to MM3
>
>1) Manually install MM3. Hook it up to the MTA, UI, and Archiver. This should
>include testing to assure that things are ready to create new lists.

Right, and it should be doable even while mm2 is still functional.

>2) Translate list configurations
>
>  a) Use TOOL1 to extract the set of list configurations from MM2. Pipe this
>  to TOOL2 which generates a tree of MM2 configurations. That tree hierarchy
>  would be Root-->World-->Site-->Domain-->List-->Subscriber. TOOL2 would
>  populate configurations at the List level. It might also reformat selected
>  parameters. In particular, various <option type="radio" > entries might be
>  transformed into enumerations such as "Yes"/"No" or
>  "Hidden"/"Private"/"Public" rather than numerical values. This would
>  enhance readability.
>
>  b) TOOL3 would populate the World level with the MM2 defaults and
>  recursively promote common values up the tree, leaving only those entries
>  which would need to override their parent to derive the current
>  value. Values which match the parent would be flagged. (The inheritance
>  flag should be tri-state. "Differs from parent", "Same as parent",
>  "Inherited from parent")
>
>  c) At this point, the user might edit some of the configurations and rerun
>  TOOL3 adjusting the inheritance flag as appropriate.

I think the trickiest part will be what to do about subscriber information.
In mm2, this is always list-centric, but in mm3, you need to collate and
globalize all the membership information into the user database.  You can
probably do the same kind of up-promotion there, but it would be from
member->address->user.  IOW, if you see an address subscribed to a mailing
list with the same values across all those lists, put the preferences in the
user.  What happens if you see anne at example.com subscribed to three different
lists with three different passwords?  That's a tough one because there's no
way to express that in mm3 (nor probably should there be).

So I think you will occasionally have to just resolve some conflicts by
flipping a coin.  In the case of passwords, perhaps you'd always make the user
do a password reset.
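
A rough sketch of that member->address up-promotion, with conflicts set aside
for later resolution. The tuple layout, the function name, and the preference
names are assumptions made for the example, not anything in mm3.

```python
from collections import defaultdict


def collate_preferences(memberships):
    """Group per-list preferences by address and promote agreed values.

    `memberships` is an iterable of (address, list_name, prefs) tuples.
    Returns (promoted, conflicts): `promoted` maps an address to the
    preferences shared by all of its subscriptions, `conflicts` maps an
    address to {option: set of differing values}.
    """
    by_address = defaultdict(list)
    for address, _list_name, prefs in memberships:
        by_address[address].append(prefs)

    promoted, conflicts = {}, {}
    for address, pref_sets in by_address.items():
        for key in set().union(*pref_sets):
            values = {prefs.get(key) for prefs in pref_sets}
            if len(values) == 1:
                promoted.setdefault(address, {})[key] = values.pop()
            else:
                conflicts.setdefault(address, {})[key] = values
    return promoted, conflicts


# Hypothetical sample data.
memberships = [
    ("anne@example.com", "one", {"digest": False, "password": "abc"}),
    ("anne@example.com", "two", {"digest": False, "password": "xyz"}),
]
promoted, conflicts = collate_preferences(memberships)
print(promoted)
print(conflicts)
```

Anything that lands in `conflicts` (passwords in this sample) would then be
resolved by policy, for example by forcing a password reset.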

>  d) Now, we begin translation to MM3 configuration options. For each MM3
>  option, TOOL4 computes the equivalent value from the MM2 values. Each
>  computed value also gets the corresponding inheritance flag. Values that
>  cannot be computed from the available information get the "Inherited from
>  parent" flag. MM2 values used in computations are marked as "translated".

The user herself could probably write a script for this pipeline you're
proposing, that would allow her to do bulk transformations of configuration
variables.

>  e) After a chance to edit the MM3 configurations, TOOL5 would recompute
>  inheritance flags, report any MM2 values that have not been translated and
>  produce a copy of the configuration file simplified by removing all
>  inherited entries.
>
>  f) After a final inspection TOOL7 would actually import the configurations,
>  committing entries to the MM3 database.
>
>3) For the migration of rosters, we should be able to do it one subscription
>at a time through a pipeline that permits pre- and post- hooks.  A --dry-run
>option would be appropriate.

Almost definitely.  The --dry-run step would produce a report of those
conflicts and impossible situations.  The user would then have a chance to
re-edit the intermediate file so that the values can be better mapped to mm3.
It's probably worth doing for both the rosters and list configurations.
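
One possible shape for such a pipeline, as a minimal sketch. The function
names, the dict-based subscription records, and the use of ValueError to
signal a problem are all assumptions for illustration.

```python
def migrate_roster(subscriptions, import_one, pre_hooks=(), post_hooks=(),
                   dry_run=False):
    """Run each subscription through pre-hooks, the importer, and post-hooks.

    With dry_run=True nothing is imported; problems raised by the hooks are
    collected and returned so the intermediate file can be edited.
    """
    imported, problems = 0, []
    for original in subscriptions:
        sub = dict(original)
        try:
            for hook in pre_hooks:
                sub = hook(sub)
            if dry_run:
                continue
            import_one(sub)
            imported += 1
            for hook in post_hooks:
                hook(sub)
        except ValueError as error:
            problems.append((original, str(error)))
    return imported, problems


def require_address(sub):
    """Example pre-hook: reject records that cannot be mapped at all."""
    if not sub.get("address"):
        raise ValueError("subscription has no address")
    return sub


rows = [{"address": "anne@example.com", "list": "dev"}, {"list": "dev"}]
committed = []
print(migrate_roster(rows, committed.append, [require_address], dry_run=True))
```

The same driver could be pointed at list configurations instead of roster
entries by swapping the hooks and the importer.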

>  a) We can assume that each email address is a distinct person.  The
>  subscribers can utilize the UI to merge email addresses into a common
>  persona.

We'll probably need a "claim and merge" operation in the system.  And a way to
purge unclaimed addresses after a while.

>  b) We can also assume that each subscription overrides its parent in the
>  Persona-->EMailAddress-->Subscription hierarchy. The individual users can
>  use the UI to consolidate their selections.

A good challenge for the Postoriusians :)

> All of the tools should be written in Python, hopefully in a dialect that is
> common to all of the versions supported by MM3.

+1.  Today that would be 2.6 and 2.7.

> TOOL1 already exists (`bin/export.py`). TOOL2 can discard the roster nodes
> as they come in. Similarly, in step 3, we can use TOOL1 and discard the list
> configuration information.

> TOOL2 can reformat the XML as HTML, thus making the input data into a
> viewable report. The inheritance flag would become a class attribute on the
> option. Would it make sense to go a step further and generate html forms and
> run a trivial http server on localhost? It might be easier to do this in
> django, but I think that requiring that level of installation is probably
> too much for the current situation.

It wouldn't be hard to do in standard Python.  OTOH, I'm not sure we want to
maintain a stack of html templates, forms, and form processing in the
conversion tool.

It's sounding like this suite of conversion tools may need to be a separate
sub-project.

-Barry


From davidj at gmail.com  Wed Apr 11 04:32:59 2012
From: davidj at gmail.com (David Jeske)
Date: Tue, 10 Apr 2012 19:32:59 -0700
Subject: [Mailman-Developers] Integrating HyperKitty with Mailman3
In-Reply-To: <20120409105926.782f1e51@limelight.wooz.org>
References: <CAOb12VVP53Y1Aq8MffWbUAUkKgra5-BWuunvvQCaUSxk36qFKA@mail.gmail.com>
	<20120408123921.0ff6d14f@limelight.wooz.org>
	<CA+CP9O7hKzn-aMJV5P3oHG=XQNkbrK-_-3qNpL-2ANbeDs8MBw@mail.gmail.com>
	<20120409105926.782f1e51@limelight.wooz.org>
Message-ID: <CA+CP9O4WScy_mmsjRCaj8W=gdz3NjVCmr7EaNuHKQbRgRwmOgQ@mail.gmail.com>

On Monday, April 9, 2012, Barry Warsaw wrote:

> So I can turn this question around and ask, what's the best way to get
> messages into ClearSilver?
>

Drop it into a Maildir, so that the version of Python we use for CSLA isn't
locked to the version of Python for MM3.

Perhaps the MM3 CSLA "handler" can do things like manage the maildir,
start/stop CSLA, etc. I'll take a look.
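
For what it's worth, the "drop it into a Maildir" part needs very little code
with the standard library. This is only a sketch; the function name and the
paths are made up, and it says nothing about how the real handler would be
wired into MM3.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def archive_to_maildir(maildir_path, msg):
    """Append one message to a Maildir, creating it if it does not exist."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(msg)
    finally:
        box.close()


# Demonstration against a throwaway directory.
demo_path = os.path.join(tempfile.mkdtemp(), "csla-inbox")
demo_msg = EmailMessage()
demo_msg["Message-ID"] = "<demo@example.org>"
demo_msg["Subject"] = "test"
demo_msg.set_content("hello")
demo_key = archive_to_maildir(demo_path, demo_msg)
print(len(mailbox.Maildir(demo_path, create=False)))
```

Since the archiver only reads files from the directory, it can run under
whatever Python version (or language) it likes.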


> >Python imports are not version-dependent (like C-shlibs are), so it seems
> >dubious to expect an external archiver to necessarily be compatible with
> >the same version of python that MM3 is. I know I've run into this problem
> >in the past, especially because of how much the python MIME message classes
> >changed over each python release (though hopefully they are more stable now)
>
> Well, until email6 is released, Mailman 3 is ported to Python 3, and we can
> all (finally) do email the right way in Python. :)


I'll believe Python will stabilize the email APIs when I see it. :) A
decade ago we resorted to copying all the email handling classes out of the
python dist and into our own code so we could keep using the ones that we
depended on while upgrading python.

From richard at nfsnet.org  Thu Apr 12 17:19:27 2012
From: richard at nfsnet.org (Richard Wackerbarth)
Date: Thu, 12 Apr 2012 10:19:27 -0500
Subject: [Mailman-Developers] Python style
In-Reply-To: <20120411120310.01ab9144@rivendell>
References: <FAEB900A-FAB0-4858-9577-BCC93506F30A@nfsnet.org>
	<20120411120310.01ab9144@rivendell>
Message-ID: <47394D9A-418E-45A0-84F9-AEC57FE33B7C@nfsnet.org>

Just an update on my parsing.

* Your introspection idea works very well as a substitute for a case statement in start/end Element.
It should allow me to produce readable code for the handling.

An added benefit is that, for each tag, I will be able to physically group the code for the startElement operation with the code for the endElement operation. Doing so makes it easier to see the pairing of beginning and ending side effects.
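
For readers who missed the earlier exchange, the introspection approach looks
roughly like this. It is a sketch only: the tag names (`list`, `option`) and
the sample XML are invented and are not the actual export format.

```python
import xml.sax


class DispatchingHandler(xml.sax.ContentHandler):
    """Route each tag to start_<tag>/end_<tag> methods found via getattr."""

    def __init__(self):
        super().__init__()
        self.lists = []
        self._current = None

    def startElement(self, name, attrs):
        method = getattr(self, "start_" + name, None)
        if method is not None:
            method(attrs)

    def endElement(self, name):
        method = getattr(self, "end_" + name, None)
        if method is not None:
            method()

    # The start and end code for one tag sit side by side.
    def start_list(self, attrs):
        self._current = {"name": attrs.get("name"), "options": []}

    def end_list(self):
        self.lists.append(self._current)
        self._current = None

    def start_option(self, attrs):
        if self._current is not None:
            self._current["options"].append(attrs.get("name"))


handler = DispatchingHandler()
xml.sax.parseString(
    b'<mailman><list name="dev"><option name="archive"/></list></mailman>',
    handler)
print(handler.lists)
```

Tags without a matching method are silently ignored, which is what replaces
the default branch of a case statement. Tag names that are not valid Python
identifiers (containing a hyphen, say) would need a small name mapping.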

* Having looked further into the new "print" and ".format", I think that I can use it once I get over my historical bias.
However, I still have the mental image of pipelining characters down a stream. :) Perhaps I can get over that.

On Apr 11, 2012, at 11:03 AM, Barry Warsaw wrote:
> whitespace: Use 4 space indents.  Never use tabs.

One of my biggest complaints about python -- using the amount/kind of whitespace to define the block structure.

I can say the same thing about "make" ---

>> MM3.UseEnglish = MM2.EnglishRequested or MM2.DefaultsToEnglish

> There can definitely be a value in writing these kinds of DSLs (domain
> specific languages) but I think in this case, writing idiomatic Python is
> going to be the best route. 

>> but it would be less complex if I can accomplish the goal without doing so.
> 
> Agreed.

* I'm still looking for suggested syntax for "readable" transform rules.

* After some thought about locating common definitions, etc., I have some thoughts. I'll put them in a new thread (or two or three) on -developers

Richard

From richard at NFSNet.org  Wed Apr 18 21:22:25 2012
From: richard at NFSNet.org (Richard Wackerbarth)
Date: Wed, 18 Apr 2012 14:22:25 -0500
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
	permalink hash input
In-Reply-To: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<20120418185331.8553.99441.malonedeb@soybean.canonical.com>
Message-ID: <1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>

Barry,

I definitely agree that "Now's the time".

I don't understand the proposal. By "added to this hash", do you mean "included in the set of elements that get hashed" or do you mean "appended to the hash value"?

Presumably, the sole purpose in publishing an algorithm to create the hash is to make it possible for two handlers to independently develop the same hash given only the message. Otherwise, a "secret" method could be used to assign a unique identifier to the message.

In either case, this suggested change renews my argument that the resulting hash should be tagged, visibly, with a "protocol revision designator". Omitting that designation transforms the chosen calculation method into a "secret".

Richard

On Apr 18, 2012, at 1:53 PM, Barry Warsaw wrote:

> Public bug reported:
> 
> Currently, we define the X-Message-ID-Hash as the base32 encoding of the
> sha1 hash of the Message-ID content (sans angle brackets as defined in
> RFC 5322).  The suggestion is made that List-Post value should be added
> to this hash so as to be able to distinguish cross-posted messages.
> 
> This should be fine, and pretty easy.  My only concern is that the
> header name is now a misnomer.
> 
> I wonder, is it worth coming up with a better header?  Now's the time to
> do it since it's likely that there are almost no consumers of this
> standard.
> 
> What about `Permalink-Hash` ?
> 
> ** Affects: mailman
>     Importance: High
>         Status: Confirmed
> 
> 
> ** Tags: mailman3


From barry at python.org  Wed Apr 18 21:58:37 2012
From: barry at python.org (Barry Warsaw)
Date: Wed, 18 Apr 2012 15:58:37 -0400
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
	permalink hash input
In-Reply-To: <1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
Message-ID: <20120418155837.6cff2e2e@resist.wooz.org>

On Apr 18, 2012, at 07:22 PM, Richard Wackerbarth wrote:

>I don't understand the proposal. By "added to this hash", do you mean
>"included in the set of elements that get hashed" or do you mean
>"appended to the hash value"?

I mean "append (or prepend, we have to decide ;)" to the hash input.

Specifically, let's say you have this message snippet:

    List-Post: foo.example.com
    Message-ID: <bar>

The hash under the current algorithm is:

    >>> from base64 import b32encode
    >>> from hashlib import sha1
    >>> s = sha1('bar')
    >>> b32encode(s.digest())
    'MLG3OAQP7EQOLKTEFQ6UAZUVBXI7AH2N'
    
but after the elaboration suggested in this bug it would be:

    >>> s = sha1('bar')
    >>> s.update('foo.example.com')
    >>> b32encode(s.digest())
    'P67IMDMX6CRPP3TXX26OMJEOX2DDK6WN'

>Presumably, the sole purpose in publishing an algorithm to create the
>hash is to make it possible for two handlers to independently develop
>the same hash given only the message. Otherwise, a "secret" method could
>be used to assign a unique identifier to the message.

Exactly.

>In either case, this suggested change renews my argument that the
>resulting hash should be tagged, visibly, with a "protocol revision
>designator". Omitting that designation transforms the chosen calculation
>method into a "secret".

The way to do that is probably to use a parameter on the header, e.g.

    Permalink-Hash: P67IMDMX6CRPP3TXX26OMJEOX2DDK6WN; version=1
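
Put together as a small function, the calculation might look like the sketch
below. It is written for Python 3, where the hash input has to be bytes; the
UTF-8 encoding, the function names, and the exact header layout are
assumptions here, since none of this has been agreed yet.

```python
from base64 import b32encode
from hashlib import sha1


def permalink_hash(message_id, list_value=None):
    """Base32-encoded SHA-1 of the Message-ID (angle brackets removed),
    optionally followed by a list value appended to the hash input."""
    digest = sha1(message_id.strip("<>").encode("utf-8"))
    if list_value is not None:
        digest.update(list_value.encode("utf-8"))
    return b32encode(digest.digest()).decode("ascii")


def permalink_header(message_id, list_value, version=1):
    """Format the header with a version parameter, as suggested above."""
    return "Permalink-Hash: {}; version={}".format(
        permalink_hash(message_id, list_value), version)


print(permalink_hash("<bar>"))
print(permalink_header("<bar>", "foo.example.com"))
```

Calling update() with the list value is equivalent to hashing the
concatenation of the two strings, so "append" versus "prepend" changes the
result and the order has to be part of the published definition.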

From barry at list.org  Wed Apr 18 22:03:32 2012
From: barry at list.org (Barry Warsaw)
Date: Wed, 18 Apr 2012 16:03:32 -0400
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
Message-ID: <20120418160332.47a0ff90@resist.wooz.org>

On Apr 18, 2012, at 02:22 PM, Richard Wackerbarth wrote:

>I definitely agree that "Now's the time".

Full response in the bug, but tl;dr:

 - Proposal is to append the List-Post value as input to the hash, after
   the Message-ID value (sans angle brackets).

 - Add version=1 as a parameter to the header value, whatever we decide that
   will be (assuming we all agree that with this elaboration X-Message-ID-Hash
   is a misnomer).

https://bugs.launchpad.net/mailman/+bug/985149

Cheers,
-Barry


From stephen at xemacs.org  Thu Apr 19 03:30:28 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 19 Apr 2012 10:30:28 +0900
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <20120418160332.47a0ff90@resist.wooz.org>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
	<20120418160332.47a0ff90@resist.wooz.org>
Message-ID: <CAL_0O19X8Tz8kLs7nbb_fEK1pAJ8GfCPYMfPymYyWkEc0UHhTw@mail.gmail.com>

On Thu, Apr 19, 2012 at 5:03 AM, Barry Warsaw <barry at list.org> wrote:

>  - Proposal is to append the List-Post value as input to the hash, after
>    the Message-ID value (sans angle brackets).

First, List-POST, not List-ID?  List-Post is not permanent!

Second, that order is wrong IMHO; the idea of the hash is to identify
the message in a fixed-length format.  If you want to qualify it with
list information, why not add the list identifier to the *output* of
the hash?  Now you have a well-defined[1] format that (1) allows you
to distinguish cross-posted instances of the same message *and* (2)
identify cross-posted instances of the same message, depending on your
application.

Yoroshiku,
Steve

[1] I haven't read the List-ID RFC recently, but I think its format is
quite restricted and likely to be of reasonable length.  I don't see
why Mailman can't require a List-ID for every list.

From richard at NFSNet.org  Thu Apr 19 17:32:42 2012
From: richard at NFSNet.org (Richard Wackerbarth)
Date: Thu, 19 Apr 2012 10:32:42 -0500
Subject: [Mailman-Developers] Technical Discussion extracted from private
	messages
References: <70978107-79C1-4083-9507-57D4F98B4E23@NFSNet.org>
Message-ID: <990A18D9-7584-4D59-9B97-274F92B18468@NFSNet.org>

Here's a summary of a discussion that unfortunately got sparked offline, and belatedly is being moved to Mailman Developers.  The basic point is that we need to assure that the structure of MM3 is such that it provides appropriate APIs through which proposed extensions could be added.

I am aware that the issue of API design bears on several of the proposals being considered for GSoC inclusion, so active contributions to collective wisdom would be very much appreciated!

N.B.: In reposting these comments, in the interest of brevity, where possible, I have deleted quotations contained in the replies.

On Apr 17, 2012, at 7:47 AM, Pierre-Yves Chibon wrote:

> I have some questions about NNTP: how will this co-exist with the archiver?

> If I read this correctly, it will give access to the archives so that it can be used within a mail client. To do so, I will need to access the emails stored somewhere, but it is also my understanding that the emails are stored in the archiver. 

> Do we have duplication of information there? If so, shouldn't we build the email part of the archiver on top of the NNTP bits? This would avoid having to store the complete email archives on another system or in another way. We could then store only tags, categories, and that sort of thing in the archiver, leaving the emails/threads part to be retrieved via NNTP.

On Wed, Apr 18, 2012 at 6:24 AM, Richard Wackerbarth replied:
> As I read the proposal, there are two possible uses for NNTP.
> 
> One would be as a list distribution mechanism providing delivery using the NNTP protocol in place of SMTP.
> In a similar manner, we might have RSS feeds, etc.
> 
> The other use would be, as Pierre-Yves indicated, as a mechanism to retrieve messages from an archive.


And, on Apr 18, 2012, at 1:12 AM, Stephen J. Turnbull responded:
> This is somewhat incoherent, as NNTP is inherently a pull mechanism
> ("store and serve" vs. SMTP's "store and forward").  Any "delivery"
> that is done will be to an archive (by whatever name) that waits for
> users to pull from it; it may as well provide interfaces for
> conventional web access (ie as HTML pages), RSS/Atom feeds, NNTP, and
> anything anybody dreams up (for example, a request-by-email
> mechanism).
> 
> So I don't see how the implementation would end up being different from:

<retrieval from an archive>

Richard's thread continued:
> If we are going down this path as a part of MM


Stephen interjected:
> -1, except to the extent that we define APIs to the message store to be used by such modules.


> rather than the archiver, then it might be appropriate to split the archiver into a "message store" and a UI to that store. In that context, an NNTP interface might provide an alternate UI.


Stephen responded:
> That's precisely how I see this, and maildir will do for a start on
> the message store.  However, we need to hide the fact that we have a
> maildir from the NNTP module and Hyperkitty etc.  They just need to
> know how to retrieve messages and summary results for list summary
> screens (and maybe a search interface - NNTP doesn't need that, but
> Hyperkitty will, and I can imagine IMAP etc wanting it).
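
As an illustration of hiding the maildir behind an interface, a message store
API might be sketched like this. The class and method names are hypothetical
and nothing like this exists in MM3 yet; the linear scan in get() is only
there to keep the example short.

```python
import mailbox
import os
import tempfile
from abc import ABC, abstractmethod
from email.message import EmailMessage


class MessageStore(ABC):
    """What an NNTP or web front end would see; the backend stays hidden."""

    @abstractmethod
    def add(self, msg):
        """Store one message."""

    @abstractmethod
    def get(self, message_id):
        """Return the message with this Message-ID, or None."""

    @abstractmethod
    def summaries(self):
        """Return (Message-ID, Subject) pairs for a list summary screen."""


class MaildirStore(MessageStore):
    """One possible backend; callers never touch the Maildir directly."""

    def __init__(self, path):
        self._box = mailbox.Maildir(path, create=True)

    def add(self, msg):
        self._box.add(msg)

    def get(self, message_id):
        for msg in self._box:
            if msg["Message-ID"] == message_id:
                return msg
        return None

    def summaries(self):
        return [(msg["Message-ID"], msg["Subject"]) for msg in self._box]


store = MaildirStore(os.path.join(tempfile.mkdtemp(), "store"))
sample = EmailMessage()
sample["Message-ID"] = "<a@example.org>"
sample["Subject"] = "hello"
sample.set_content("body")
store.add(sample)
print(store.summaries())
```

A search method could be added to the abstract class later without the NNTP
module having to care.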


An additional part of the thread contained:

On Apr 17, 2012, at 7:10 AM, Richard Wackerbarth wrote:
> I do see some blurring between the MM core and other components.
> 
> For example, <...> seems to relate to accessing documents stored in the system. I view that as a function related to the archiver and not the core.
> 
> Similarly, going down the "user profile" road <...>, perhaps we should split out a "persona" component just as we have split out the archiver and the UI. This component would handle login/authentication, etc. as a service for both the core and for the UI. Personal profiles, could then be a plugin that "subscribes" to message events using a mechanism that could also drive archivers, NNTP feeds, etc.


In an analysis of how various components might be organized and interact within a defined structure, on Apr 17, 2012, at 4:24 PM, Richard Wackerbarth wrote:
> In any case, we should assume that something of this nature <an alternate UI> will want to be "plugged in" in the (near?) future. As such, we need to provide a framework as a standard interface.
> 
> I view a plug-in as having four components:
> 1) Some kind of information storage.
> 2) An interface to the MM core. This would receive notifications of events, and might include the retrieval and/or injection of messages.
> 3) A configuration interface. This should modify the Postorius interface in much the same way as django models get registered and included in the django-admin.
> 4) Custom user displays. These should integrate back into the django template structure so that a consistent, site-customized website presentation is available.
> 
> I think that we should be developing this framework NOW so that various GSoC projects, and others do not access information in an ad hoc manner.
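
To give the discussion something concrete to react to, components 1 and 2 of
such a plug-in could be sketched as below. Everything here (the EventBus
class, the event names, the payload shape) is hypothetical; it is not an
existing MM3 API, and the configuration and display components are left out.

```python
from collections import defaultdict


class EventBus:
    """Stand-in for the core's notification mechanism."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, payload):
        for callback in self._subscribers.get(event_name, ()):
            callback(payload)


class ArchiverPlugin:
    """Owns its storage (component 1) and listens to the core (component 2)."""

    def __init__(self, bus):
        self.stored = []
        bus.subscribe("message-posted", self.on_message)

    def on_message(self, msg):
        self.stored.append(msg)


bus = EventBus()
archiver = ArchiverPlugin(bus)
bus.publish("message-posted", {"message-id": "<bar>"})
bus.publish("list-created", {"list-id": "dev.example.org"})
print(archiver.stored)
```

An NNTP feed or a "persona" component would subscribe in the same way, which
is the point of agreeing on the interface before the individual projects
start.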


To which, in his Apr 18, 2012, at 1:12 AM reply, Stephen J. Turnbull added:
> +1 I like your analysis.
> 
> Agreed; I had already begun to feel that NNTP will have to coordinate with hyperkitty and any other archive UIs.


We now solicit additional observations, suggestions, and comments.

Richard & Stephen


From sm at resistor.net  Fri Apr 20 04:02:18 2012
From: sm at resistor.net (SM)
Date: Thu, 19 Apr 2012 19:02:18 -0700
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <20120418160332.47a0ff90@resist.wooz.org>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
	<20120418160332.47a0ff90@resist.wooz.org>
Message-ID: <6.2.5.6.2.20120419185425.0b437680@resistor.net>

At 13:03 18-04-2012, Barry Warsaw wrote:
>https://bugs.launchpad.net/mailman/+bug/985149

The List-ID: can be assumed to be unique across different mailing 
lists.  There's a corner case though.

Regards,
-sm 


From barry at list.org  Fri Apr 20 18:22:14 2012
From: barry at list.org (Barry Warsaw)
Date: Fri, 20 Apr 2012 12:22:14 -0400
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <CAL_0O19X8Tz8kLs7nbb_fEK1pAJ8GfCPYMfPymYyWkEc0UHhTw@mail.gmail.com>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
	<20120418160332.47a0ff90@resist.wooz.org>
	<CAL_0O19X8Tz8kLs7nbb_fEK1pAJ8GfCPYMfPymYyWkEc0UHhTw@mail.gmail.com>
Message-ID: <20120420122214.77e56e06@resist.wooz.org>

On Apr 19, 2012, at 10:30 AM, Stephen J. Turnbull wrote:

>On Thu, Apr 19, 2012 at 5:03 AM, Barry Warsaw <barry at list.org> wrote:
>
>>  - Proposal is to append the List-Post value as input to the hash, after
>>    the Message-ID value (sans angle brackets).
>
>First, List-POST, not List-ID?  List-Post is not permanent!

Sorry, yes I definitely meant List-ID.

>Second, that order is wrong IMHO; the idea of the hash is to identify
>the message in a fixed-length format.  If you want to qualify it with
>list information, why not add the list identifier to the *output* of
>the hash?  Now you have a well-defined[1] format that (1) allows you
>to distinguish cross-posted instances of the same message *and* (2)
>identify cross-posted instances of the same message, depending on your
>application.

I think the hash value should be opaque.  Jeff can perhaps elaborate his
use-case but I don't think the List-ID needs to be (or frankly *should* be)
extractable from the hash, but instead just needs to inform the hash value.
IOW, if you cross-post a message with Message-ID: <foo> to one at example.org and
two at example.com, you'd get two different messages forwarded to the archives,
and they would have different Permalink: hash values.  Before this proposal,
they'd have the same value.

Of course, the List-ID itself should be preserved in the message that the
archiver gets, so an archiver could still discriminate on that.
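
The difference between the two approaches can be shown in a few lines. This
is a sketch only: the function names are invented, and the "/" separator in
the qualified form is an assumption, since no format has been proposed for
it.

```python
from base64 import b32encode
from hashlib import sha1


def _b32_sha1(*parts):
    """Base32-encoded SHA-1 over the parts, fed to the hash in order."""
    h = sha1()
    for part in parts:
        h.update(part.encode("utf-8"))
    return b32encode(h.digest()).decode("ascii")


def opaque_id(message_id, list_id):
    """The List-ID informs the hash and cannot be read back out of it."""
    return _b32_sha1(message_id, list_id)


def qualified_id(message_id, list_id):
    """The List-ID is attached to the output of a Message-ID-only hash."""
    return "{}/{}".format(list_id, _b32_sha1(message_id))


# A message with Message-ID <foo> cross-posted to two lists.
for list_id in ("one.example.org", "two.example.com"):
    print(opaque_id("foo", list_id), qualified_id("foo", list_id))
```

With the opaque form the two copies get unrelated identifiers; with the
qualified form they share the hash part, so a consumer can either tell them
apart or recognize them as the same message.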

>[1] I haven't read the List-ID RFC recently, but I think its format is
>quite restricted and likely to be of reasonable length.  I don't see
>why Mailman can't require a List-ID for every list.

Mailman always adds a List-ID header.  RFC 2919 describes it.  TL;DR:

List-ID: <listname.dom.ain>

Cheers,
-Barry

From barry at list.org  Fri Apr 20 18:22:55 2012
From: barry at list.org (Barry Warsaw)
Date: Fri, 20 Apr 2012 12:22:55 -0400
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <6.2.5.6.2.20120419185425.0b437680@resistor.net>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
	<20120418160332.47a0ff90@resist.wooz.org>
	<6.2.5.6.2.20120419185425.0b437680@resistor.net>
Message-ID: <20120420122255.6d5415aa@resist.wooz.org>

On Apr 19, 2012, at 07:02 PM, SM wrote:

>The List-ID: can be assumed to be unique across different mailing lists.
>There's a corner case though.

What's the corner case?

Cheers,
-Barry

From richard at NFSNet.org  Fri Apr 20 19:09:32 2012
From: richard at NFSNet.org (Richard Wackerbarth)
Date: Fri, 20 Apr 2012 12:09:32 -0500
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
	permalink hash input
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
Message-ID: <3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>

On Apr 20, 2012, at 11:22 AM, Barry Warsaw wrote:

> On Apr 19, 2012, at 10:30 AM, Stephen J. Turnbull wrote:
> 
>> Second, that order is wrong IMHO; the idea of the hash is to identify
>> the message in a fixed-length format.  If you want to qualify it with
>> list information, why not add the list identifier to the *output* of
>> the hash?  Now you have a well-defined[1] format that (1) allows you
>> to distinguish cross-posted instances of the same message *and* (2)
>> identify cross-posted instances of the same message, depending on your
>> application.
> 
> I think the hash value should be opaque.  Jeff can perhaps elaborate his
> use-case but I don't think the List-ID needs to be (or frankly *should* be)
> extractable from the hash, but instead just needs to inform the hash value.
> IOW, if you cross-post a message with Message-ID: <foo> to one at example.org and
> two at example.com, you'd get two different messages forwarded to the archives,
> and they would have different Permalink: hash values.  Before this proposal,
> they'd have the same value.

I can see use cases for both.

As for the "opaque" hash, the order is not important. The order of the inputs is arbitrary. It needs to be fixed and published so that multiple encoders will derive the same hash as that generated by another encoder.

If the List ID is made a visible part of the message identifier, then it is creating a separate namespace for each list. Here the order may have implications when viewed in the context of other uses.

Here, we might wish to be able to have only one copy of the message in the archive and/or the distribution channels even when that message gets cross-posted to multiple lists.

The one thing that does need to be visible is the designation of the revision of the hashing algorithm. Otherwise, without that visible indicator, there is no way to recreate a "stable" value if a rehashing needs to be performed.

Richard


From sm at resistor.net  Fri Apr 20 19:30:16 2012
From: sm at resistor.net (SM)
Date: Fri, 20 Apr 2012 10:30:16 -0700
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <20120420122255.6d5415aa@resist.wooz.org>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
	<20120418160332.47a0ff90@resist.wooz.org>
	<6.2.5.6.2.20120419185425.0b437680@resistor.net>
	<20120420122255.6d5415aa@resist.wooz.org>
Message-ID: <6.2.5.6.2.20120420102645.098221b8@resistor.net>

Hi Barry,
At 09:22 20-04-2012, Barry Warsaw wrote:
>What's the corner case?

The corner case is Nested lists.

Regards,
-sm 


From terri at zone12.com  Fri Apr 20 19:33:45 2012
From: terri at zone12.com (Terri Oda)
Date: Fri, 20 Apr 2012 11:33:45 -0600
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
Message-ID: <4F919DF9.401@zone12.com>


On 12-04-20 11:09 AM, Richard Wackerbarth wrote:
> On Apr 20, 2012, at 11:22 AM, Barry Warsaw wrote:
>
>> I think the hash value should be opaque.  Jeff can perhaps elaborate his
>> use-case but I don't think the List-ID needs to be (or frankly *should* be)
>> extractable from the hash, but instead just needs to inform the hash value.
>> IOW, if you cross-post a message with Message-ID:<foo>  to one at example.org and
>> two at example.com, you'd get two different messages forwarded to the archives,
>> and they would have different Permalink: hash values.  Before this proposal,
>> they'd have the same value.
> Here, we might wish to be able to have only one copy of the message in the archive and/or the distribution channels even when that message gets cross-posted to multiple lists.

I may be mis-remembering, but I believe one reason to put List-ID in the 
hash is in part to shorten URLs so that you can just have

http://example.com/archiver/$hash

Instead of the longer

http://example.com/archiver/listname.example.com/$hash

And still have the message appear in the appropriate list context (with 
next/prev links, etc.) when using the shorter URL because it will be a 
unique ID even if the message has been cross-posted.

A question, though: what if the list gets migrated to a new server and 
the list id changes (e.g. because the domain or hostname changes)?  I'm 
guessing we can handle it, but we should make sure there's a path for that.

  Terri

From barry at list.org  Fri Apr 20 19:43:45 2012
From: barry at list.org (Barry Warsaw)
Date: Fri, 20 Apr 2012 13:43:45 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
Message-ID: <20120420134345.360b522f@resist.wooz.org>

On Apr 20, 2012, at 12:09 PM, Richard Wackerbarth wrote:

>As for the "opaque" hash, the order is not important. The order of the inputs
>is arbitrary. It needs to be fixed and published so that multiple encoders
>will derive the same hash as that generated by another encoder.

Right.  I've updated the description of bug 985149 to be explicit about the
proposal.  I like Permalink-Hash as the header name.

https://bugs.launchpad.net/mailman/+bug/985149

>If the List ID is made a visible part of the message identifier, then it is
>creating a separate namespace for each list. Here the order may have
>implications when viewed in the context of other uses.
>
>Here, we might wish to be able to have only one copy of the message in the
>archive and/or the distribution channels even when that message gets
>cross-posted to multiple lists.

Note that RFC 5064 defines the Archived-At header.  IMO, this would be the
appropriate place to add any list-specific namespace discriminator.  Also, RFC
2369 defines the List-Archive header, which could contain the base URL to the
archiver, including the List-ID information.

>The one thing that does need to be visible is the designation of the revision
>of the hashing algorithm. Otherwise, without that visible indicator, there is
>no way to recreate a "stable" value if a rehashing needs to be performed.

Yep, see the bug for details.  Below is an example in Python code.

Cheers,
-Barry

>>> from email import message_from_string as mfs
>>> msg = mfs("""\
... To: mylist at example.com
... Message-ID: <foo>
... 
... """)
>>> from hashlib import sha1
>>> from base64 import b32encode
>>> bare_msgid = msg['message-id'][1:-1]
>>> bare_msgid
'foo'
>>> msg['List-ID'] = '<mylist.example.com>'
>>> bare_listid = msg['list-id'][1:-1]
>>> bare_listid
'mylist.example.com'
>>> h = sha1(bare_msgid)
>>> h.update(bare_listid)
>>> permalink_hash = b32encode(h.digest())
>>> permalink_hash
'FW7VLQIZV3P6O64PL7OGLM5Y3RUBQZ4F'
>>> msg.add_header('Permalink-Hash', permalink_hash, version='1')
>>> msg['permalink-hash']
'FW7VLQIZV3P6O64PL7OGLM5Y3RUBQZ4F; version="1"'
>>> msg['List-Archive'] = 'http://list.example.com/{}'.format(bare_listid)
>>> msg['list-archive']
'http://list.example.com/mylist.example.com'
>>> msg['Archived-At'] = '{}/{}'.format(msg['list-archive'], permalink_hash)
>>> msg['archived-at']
'http://list.example.com/mylist.example.com/FW7VLQIZV3P6O64PL7OGLM5Y3RUBQZ4F'
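
The session above is Python 2, where `sha1()` accepts a `str` directly. A minimal Python 3 sketch of the same calculation follows, assuming UTF-8 encoding of the header values (the thread does not specify an encoding) and a hypothetical helper name `permalink_hash`. For ASCII inputs the bytes being hashed are the same as in the session above.

```python
from base64 import b32encode
from email import message_from_string
from hashlib import sha1


def permalink_hash(msg):
    """Base32 SHA-1 of the bare Message-ID followed by the bare List-ID."""
    # Strip the surrounding angle brackets, as in the session above.
    bare_msgid = msg['message-id'].strip()[1:-1]
    bare_listid = msg['list-id'].strip()[1:-1]
    # Assumption: header values are encoded as UTF-8 before hashing.
    h = sha1(bare_msgid.encode('utf-8'))
    h.update(bare_listid.encode('utf-8'))
    return b32encode(h.digest()).decode('ascii')


msg = message_from_string("""\
To: mylist@example.com
Message-ID: <foo>
List-ID: <mylist.example.com>

""")
digest = permalink_hash(msg)
# The version parameter keeps the hashing revision visible in the header.
msg.add_header('Permalink-Hash', digest, version='1')
```

A SHA-1 digest is 20 bytes, so the base32 form is always 32 characters with no padding.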

From barry at list.org  Fri Apr 20 19:48:46 2012
From: barry at list.org (Barry Warsaw)
Date: Fri, 20 Apr 2012 13:48:46 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <4F919DF9.401@zone12.com>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
	<4F919DF9.401@zone12.com>
Message-ID: <20120420134846.56b31f2d@resist.wooz.org>

On Apr 20, 2012, at 11:33 AM, Terri Oda wrote:

>A question, though: what if the list gets migrated to a new server and the
>list id changes (e.g. because the domain or hostname changes)?  I'm guessing
>we can handle it, but we should make sure there's a path for that.

I have to read RFC 2919 more carefully, but it does have this to say:

4. Persistence of List Identifiers

   Although the list identifier MAY be changed by the mailing list
   administrator this is not desirable.  (Note that there is no disadvantage
   to changing the description portion of the List-Id header.)  A MUA may not
   recognize the change to the list identifier because the MUA SHOULD treat a
   different list identifier as a different list.  As such the mailing list
   administrator SHOULD avoid changing the list identifier even when the host
   serving the list changes.  On the other hand, transitioning from an
   informal unmanaged-list-id-namespace to a domain namespace is an acceptable
   reason to change the list identifier.  Also if the focus of the list
   changes sufficiently the administrator may wish to retire the previous list
   and its associated identifier to start a new list reflecting the new focus.

So a migration would typically not change the List-ID.  If it does, then it's
considered a different mailing list after the migration.  Of course, this
wouldn't change any headers in messages already sent through the pre-migrated
list, nor the archiving of any such messages.  It would however change for any
subsequent messages sent through the migrated list.

Cheers,
-Barry


From barry at list.org  Fri Apr 20 19:54:26 2012
From: barry at list.org (Barry Warsaw)
Date: Fri, 20 Apr 2012 13:54:26 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <20120420134345.360b522f@resist.wooz.org>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
	<20120420134345.360b522f@resist.wooz.org>
Message-ID: <20120420135426.191298a3@resist.wooz.org>

On Apr 20, 2012, at 01:43 PM, Barry Warsaw wrote:

>Note that RFC 5064 defines the Archived-At header.  IMO, this would be the
>appropriate place to add any list-specific namespace discriminator.  Also, RFC
>2369 defines the List-Archive header, which could contain the base URL to the
>archiver, including the List-ID information.

Note that one problem with including the List-ID value in the hash is that if
you receive an off-list copy of the message, you may not be able to calculate
the hash to that message in the archive, because you will not have the List-ID
header in your copy.  You will still have the Message-ID.

It will *usually* be possible to calculate this, given a reasonable assumption
of the mapping from the list posting address (in the To field, remember you
also won't have the List-Post header!).  E.g. if you see:

    To: test at example.com

you can reasonably guess that List-ID will be <test.example.com>.  It may not
be though, or the list may have gotten migrated and given a different List-ID.

That was the beauty of the original algorithm; all you needed was the
Message-ID.

I don't think that's a flaw fatal enough to rule out including the List-ID in
the hash, since I think it will be rare in practice for List-ID to be
incalculable from the To header, but it's something to be aware of.
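
The guess described above can be sketched as follows. This is only an illustration of the "posting address with a dot instead of the @" heuristic; the function name `guess_list_id` is hypothetical, and as noted the real List-ID may differ.

```python
from email.utils import parseaddr


def guess_list_id(to_header):
    """Guess a List-ID of the form <local.domain> from a posting address.

    Returns None when no usable address is found. This is a heuristic:
    a list may have been migrated or given an unrelated List-ID.
    """
    _name, addr = parseaddr(to_header)
    local, sep, domain = addr.partition('@')
    if not sep or not local or not domain:
        return None
    return '<{}.{}>'.format(local, domain)


guessed = guess_list_id('test@example.com')
```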

-Barry

From sm at resistor.net  Fri Apr 20 20:12:40 2012
From: sm at resistor.net (SM)
Date: Fri, 20 Apr 2012 11:12:40 -0700
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <4F919DF9.401@zone12.com>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
	<4F919DF9.401@zone12.com>
Message-ID: <6.2.5.6.2.20120420110144.09c08eb0@resistor.net>

At 10:33 20-04-2012, Terri Oda wrote:
>A question, though: what if the list gets migrated to a new server 
>and the list id changes (e.g. because the domain or hostname 
>changes)?  I'm guessing we can handle it, but we should make sure 
>there's a path for that.

   'While it is perfectly acceptable for a List Identifier to be
    completely independent of the domain name of the host machine
    servicing the mailing list, the owner of a mailing list MUST NOT
    generate List Identifiers in any domain name space for which they do
    not have authority.  For example, a mailing list hosting service may
    choose to assign List Identifiers in their own domain-based name
    space, or they may allow their clients (the list owners) to provide
    List Identifiers in a namespace for which the owner has authority.'

The List-ID: is not tied to the server host name.  To avoid migration 
pain, pick a domain name carefully.

Regards,
-sm 


From barry at list.org  Fri Apr 20 20:16:12 2012
From: barry at list.org (Barry Warsaw)
Date: Fri, 20 Apr 2012 14:16:12 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <6.2.5.6.2.20120420110144.09c08eb0@resistor.net>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
	<4F919DF9.401@zone12.com>
	<6.2.5.6.2.20120420110144.09c08eb0@resistor.net>
Message-ID: <20120420141612.1431bf93@resist.wooz.org>

On Apr 20, 2012, at 11:12 AM, SM wrote:

>   'While it is perfectly acceptable for a List Identifier to be
>    completely independent of the domain name of the host machine
>    servicing the mailing list, the owner of a mailing list MUST NOT
>    generate List Identifiers in any domain name space for which they do
>    not have authority.  For example, a mailing list hosting service may
>    choose to assign List Identifiers in their own domain-based name
>    space, or they may allow their clients (the list owners) to provide
>    List Identifiers in a namespace for which the owner has authority.'
>
>The List-ID: is not tied to the server host name.  To avoid migration pain,
>pick a domain name carefully.

mm3 currently does not provide a knob to customize the List-ID.  It wouldn't
be hard to do but not until after 3.0.

-Barry

From jeff at jab.org  Fri Apr 20 22:19:44 2012
From: jeff at jab.org (Jeff Breidenbach)
Date: Fri, 20 Apr 2012 13:19:44 -0700
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <20120420141612.1431bf93@resist.wooz.org>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
	<4F919DF9.401@zone12.com>
	<6.2.5.6.2.20120420110144.09c08eb0@resistor.net>
	<20120420141612.1431bf93@resist.wooz.org>
Message-ID: <CAHjiUbrMN=vG7r_C8Ax938kMnx3Rd+Bvq_YQBXgLvTL8jhO-aA@mail.gmail.com>

A couple quick practical notes:

1) Terri is exactly right. The reason for including list identity as
part of the hash calculation is for cross-posted messages. An
archiving service shows context. Here's the message AND the thread it
fits into, AND information about the list it travelled over AND the
ability to search that list further. Archives need to know the list to
provide context.

2) The reason mail-archive.com uses List-Post and not List-Id in the
calculation is because every list, RFC2369 compliant or not, has a
concept of a posting address. It is a natural idea, easy to think of and
understand. Hence all mail-archive.com archives are keyed off of
posting address. It would be technically possible (but an architectural
pain) for mail-archive.com to calculate using List-Id. We'd probably
not bother and instead store whatever was calculated by mailman and
placed in the Archived-At: header. Okay, I'll admit my prejudice. I've
always found List-Id annoying, and wish that it didn't exist.

3) As long as things are changing, I want to mention that these URLs
feel too long. SHA-1 is a 160 bit hash consuming 32 URL characters. I
think trimming to a 64 bit (13 character) hash is plenty. According to
wikipedia collision tables, with the shorter hash we'd expect to get
our first collision after archiving 5 billion messages. That's 50X the
current corpus size of public archival services like GMane. And it
isn't like an occasional hash collision is a big deal or a security
problem. http://en.wikipedia.org/wiki/Birthday_attack

3b) For that matter, a sequence number would also do the trick, but I
can understand that this is much more dangerous; it is easy for a
sequence number to get reset and cause all hell to break loose.

4) I'm really not that picky. Our archival service could deal with all
sorts of URLs, including the ones Terri was trying to avoid, such as
http://example.com/archiver/listname.example.com/$hash
In fact, we've found that lots of small, per-list databases have speed
and reliability advantages over big global databases. But I also like
short URLs. Bottom line, please don't let these comments delay or
derail forward progress.
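
The numbers in point 3 can be checked with a short sketch. It assumes the truncation simply keeps the first 8 bytes of the SHA-1 digest (the thread does not say which bits to keep) and uses the standard birthday approximation for the number of items at which a collision has about even odds. The function names are hypothetical.

```python
import math
from base64 import b32encode
from hashlib import sha1


def short_hash(bare_msgid, bare_listid, bits=64):
    """Base32 of the first `bits` bits of SHA-1(msgid + listid), unpadded."""
    digest = sha1((bare_msgid + bare_listid).encode('utf-8')).digest()
    truncated = digest[:bits // 8]
    # b32encode pads with '='; drop the padding for use in a URL.
    return b32encode(truncated).decode('ascii').rstrip('=')


def messages_for_even_odds(bits):
    """Approximate item count at which a collision is about 50% likely."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)


example = short_hash('foo', 'mylist.example.com')
threshold = messages_for_even_odds(64)
```

With 64 bits the unpadded base32 string is 13 characters long, and the even-odds threshold comes out at roughly 5 billion messages, in line with the figure quoted from the Wikipedia table.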

-Jeff

From stephen at xemacs.org  Sat Apr 21 03:19:24 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 21 Apr 2012 10:19:24 +0900
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <20120420122214.77e56e06@resist.wooz.org>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
	<20120418160332.47a0ff90@resist.wooz.org>
	<CAL_0O19X8Tz8kLs7nbb_fEK1pAJ8GfCPYMfPymYyWkEc0UHhTw@mail.gmail.com>
	<20120420122214.77e56e06@resist.wooz.org>
Message-ID: <CAL_0O1_8aZyWgr+8QQ_yKha9gP6Ry-v1Pr1=Ob0k7q0Y82uy5Q@mail.gmail.com>

On Sat, Apr 21, 2012 at 1:22 AM, Barry Warsaw <barry at list.org> wrote:

> I think the hash value should be opaque.  Jeff can perhaps elaborate his
> use-case but I don't think the List-ID needs to be (or frankly *should* be)
> extractable from the hash, but instead just needs to inform the hash value.
> IOW, if you cross-post a message with Message-ID: <foo> to one at example.org and
> two at example.com, you'd get two different messages forwarded to the archives,
> and they would have different Permalink: hash values.  Before this proposal,
> they'd have the same value.

Which is a FAQ: how do I avoid getting two copies of the same message
from multiple lists I subscribe to?  If Mailman is maintaining a list
of messages received, with full personalization this FAQ now has an
acceptable answer.  If Mailman distinguishes the same message posted
to different lists in an opaque way, the answer is "we're sorry,
Mailman cannot do that by design."

Or do you see a way to do this that I don't?

> Of course, the List-ID itself should be preserved in the message that the
> archiver gets, so an archiver could still discriminate on that.

Not good enough, because the de-dupe db will store hashes AIUI.  If
the de-dupe db stores Message-IDs, then you have enough information.
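
A minimal sketch of the distinction, assuming a hypothetical per-recipient delivery log (nothing like this is specified in the thread). Keyed on the bare Message-ID, the second cross-posted copy is recognized; a list-qualified hash alone gives two unrelated values for the same message.

```python
from hashlib import sha1


def list_qualified_hash(bare_msgid, bare_listid):
    """Hash that differs per list, as in the List-ID-in-the-hash proposal."""
    return sha1((bare_msgid + bare_listid).encode('utf-8')).hexdigest()


class DeliveryLog:
    """Remember which Message-IDs each recipient has already been sent."""

    def __init__(self):
        self._seen = {}  # recipient address -> set of bare Message-IDs

    def should_deliver(self, recipient, bare_msgid):
        seen = self._seen.setdefault(recipient, set())
        if bare_msgid in seen:
            return False
        seen.add(bare_msgid)
        return True


log = DeliveryLog()
first = log.should_deliver('anne@example.com', 'foo')   # copy via list one
second = log.should_deliver('anne@example.com', 'foo')  # copy via list two
hashes_differ = (list_qualified_hash('foo', 'one.example.org')
                 != list_qualified_hash('foo', 'two.example.com'))
```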

From stephen at xemacs.org  Sat Apr 21 11:52:57 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 21 Apr 2012 18:52:57 +0900
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAHjiUbrMN=vG7r_C8Ax938kMnx3Rd+Bvq_YQBXgLvTL8jhO-aA@mail.gmail.com>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
	<4F919DF9.401@zone12.com>
	<6.2.5.6.2.20120420110144.09c08eb0@resistor.net>
	<20120420141612.1431bf93@resist.wooz.org>
	<CAHjiUbrMN=vG7r_C8Ax938kMnx3Rd+Bvq_YQBXgLvTL8jhO-aA@mail.gmail.com>
Message-ID: <CAL_0O19esrSQZheYnzYxxFWY4niastK5-XT6UWQoY-d1viB1ZA@mail.gmail.com>

On Sat, Apr 21, 2012 at 5:19 AM, Jeff Breidenbach <jeff at jab.org> wrote:

> 2) The reason mail-archive.com uses List-Post and not List-Id in the
> calculation is because every list, RFC2369 compliant or not, has a
> concept of a posting address.

That would be fine, except that in my personal practice for some lists
List-Post changes predictably (once a year, but it is quite regular),
and the original List-Post is reused every year.  Pretty
idiosyncratic, yes, but if we have a List-Id, I think we should use it
in preference to List-Post.

I think it's true that you can use "whatever was calculated by mailman
and placed in the Archived-At: header."

> I've always found List-Id annoying,

You have a practical reason you can share?  (N.B. I have no quarrel if
you just say it's an obnoxious YAGNI or the like, I'm just curious.)

> 3) As long as things are changing, I want to mention that these URLs
> feel too long. SHA-1 is a 160 bit hash consuming 32 URL characters.

Agreed, these are long.  However, I don't really see why lists
shouldn't provide both a canonical URL in Archived-At and a tinyurl
for the message footer and user use.  (There might not be a permanence
guarantee for the tinyurl, though.)  More design and programming work
for us, true, but that's a one-off.

From jeff at jab.org  Mon Apr 23 00:31:47 2012
From: jeff at jab.org (Jeff Breidenbach)
Date: Sun, 22 Apr 2012 15:31:47 -0700
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
Message-ID: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>

I find List-Id annoying because I like the world to be simple and easy to
understand. People who know nothing about RFCs naturally consider the
posting address to be the canonical name of a mailing list. We should be
embracing that. Instead, RFC2919 introduces this entire alternate namespace
with List-Id, competing for attention, with its own weird rules like the
domain-control one quoted earlier in this thread. All this confusion, and
the main problem it tried to address isn't very important. Is it really
such a disaster for a list to be considered different if it hops to a new
domain? I don't think so, or there would be a lot more clamoring for
editable List-Id in mailman. Archival services certainly don't need it. It
smells like design by committee where everyone's pet feature for a rare use
case gets added in, without appreciating the benefits of small and simple
and less-stuff-is-better.

Regarding hashes, the whole point of an archival hash is to make a shorter,
human friendly URL. This is not very hard to implement; one can take the SHA-1
and truncate it. If we aren't worried about length then Message-Id is a
perfectly usable identifier. Certainly no need for a triumvirate of short
hash, long hash, and message-id. Less is better.

On Apr 21, 2012 2:53 AM, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

From stephen at xemacs.org  Mon Apr 23 07:33:52 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 23 Apr 2012 14:33:52 +0900
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
Message-ID: <CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>

On Mon, Apr 23, 2012 at 7:31 AM, Jeff Breidenbach <jeff at jab.org> wrote:

> I find List-Id annoying because I like the world to be simple and easy to
> understand.

What are you doing hanging out in e-mail circles, then? ;-)  It has to
be the most prominent example of a computing field where smtplicity
has led to a disaster, and it has never been easy to understand!

> People who know nothing about RFCs naturally consider the
> posting address to be the canonical name of a mailing list.

Sure, and people who know nothing about modern physics naturally
consider space to be flat.  They're quite happy to use GPS devices
whose accuracy to within kilometers (let alone the few meters actually
achieved) depends on general relativity and calculations involving the
curvature of space, though.

> We should be embracing that.

Why?  Just because people who will never actually have to deal with
the problems created by lists that change their identities mid-stream
find it natural?

> Instead, RFC2919 introduces this entire alternate namespace
> with List-Id, competing for attention, with its own weird rules like the
> domain-control one quoted earlier in this thread.

I don't see any competition, to be honest.  List-Id is nice for
mechanically managing continuity, and the list's mailbox (*not* the
List-Post URL, which is less stable than the mailbox IME) is what most
humans use to name it.  Different use cases, different methods.

And there's nothing weird about that rule.  It's actually the same
rule that we use to identify users in many situations: control over a
name (for individuals, implemented as the ability to read mail at a
specific address).  Both give the user a public, Internet-wide unique
ID, which can be used as a component of UUIDs for other resources the
user controls.

> Is it really such a disaster for a list to be considered different if it hops to a new
> domain?

Disaster, no.  Pain in the ass for the administrator and users?  Yes,
in my experience.  The lists I administer have archives going back to
1996, at least, and two years ago we scoured the whole history for
hints to where certain former members might be found.  There were a
couple of historical cases where lists changed names, and since the
archives were organized by posting address, the people who were trying
to follow threads had problems picking them up (since many of them
weren't around to know the history).  That took up my time which could
have been put to better use.

> I don't think so, or there would be a lot more clamoring for
> editable List-Id in mailman.

Why would anyone want to edit List-Id?  It's not really for human
consumption!  I find it hard to believe that many people would want
anything but the original posting address with a dot substituted for
the @-sign as a List-Id.

> Archival services certainly don't need it.

Maybe yours doesn't.  I have several lists whose purposes have evolved
over time, and they haven't changed names and posting address to match
only because that would break threads in the archives.

I also subscribe to one list whose posting address has changed a
couple of times because of domain changes, but my own mail filters
just kept working because they were based on List-Id, not List-Post.
Since the archive host's domain also changed, everybody had to change
their bookmarks anyway, but if it's not necessary it would be nice to
be able to keep them.

> It smells like design by committee where everyone's pet feature
> for a rare use case gets added in, without appreciating the benefits
> of small and simple and less-stuff-is-better.

I dunno.  It seems to me that a lot of the lists where the users and
admins would not care about flexibility, machine-friendliness, and
continuity are also good candidates for moving to web forums in any
case.  (Yeah, I know that Barry wants to kill web forums; but if so,
those users are going to have to coexist with mine!)

In sum, while I only know my small corner of the world, I've had
several experiences where List-Id has been (or would have been) quite
useful, and I really don't understand yet why it's problematic for you
(except that for you it's a YAGNI, so you could have a somewhat
simpler life if it would go away).

> Regarding hashes, the whole point of an archival hash is to make a shorter,
> human friendly URL.

I don't find hashes (or most message IDs) to be human-friendly[1] at
all, and (if it's not going to be a tinyurl) I really only want the
line containing the URL to be less than 78 characters so I can be
pretty sure nothing is going to try to insert a linebreak.  I guess
we're just going to have to agree to disagree on a lot of things. ;-)

Bottom line: I don't have a problem with you having your preferences,
and if you "win" I can work around it, but I do have multiple use
cases for List-Id != List-Post that I raise for consideration by the
group.

[1] FVO "human" that includes a larger population than "geeks like
me"!  My MUA is set to display Message-Id and References by default!

From pingou at pingoured.fr  Mon Apr 23 20:17:16 2012
From: pingou at pingoured.fr (Pierre-Yves Chibon)
Date: Mon, 23 Apr 2012 20:17:16 +0200
Subject: [Mailman-Developers] Speaking about kitties (or archivers)
Message-ID: <1335205036.2188.29.camel@ambre.pingoured.fr>

Meeow miaou*

We spoke on IRC about the archiver the other day and I said that I
should present my thoughts about it here. So here they are (beware, this
might be long).

First I think we should think about the structure/architecture of
things. We have a number of components which need to be archive-aware;
without being exhaustive, I'm thinking about:
- the archiver itself (which presents the archive, i.e. mails and threads)
- the NNTP bits which should be able to return emails and/or threads
- the stats module which wants to give information to the user about the
health of the list itself (emails/month, last threads, biggest
threads...)
- archives retrieval (we probably want to give the user a way to
download the archives since the creation of the list/the last
year/month)

All of these components need to be aware of the archives. We agreed
that the core does not want to know about it.

So we have several solutions:
- each module becomes an "archiver" wrt core, meaning each module has
its own way of storing the archives (and possibly its own system to do
so)
- we create an archive-core module which manages the archives and provides
an API to access, modify, extend them.

Of course, we prefer the second solution :)
So we would have the following architecture:

  mm-core (handles the lists themselves) --send emails to archivers-->
archive-core (store the emails and expose them through an API) -->
archivers/stats/NNTP

The questions are then:
- how do we store the emails ?
- how do we expose the API ?
- how to make it such that it becomes easy to extend ? (ie: the stats
module wants to read the db, but probably also to store information on
it)

Having played with mongodb (HK relies on it atm), I quite like the
possibilities it gives us. We can easily store the emails in it, query
them and since it is a NoSQL database system, extending it is also
easy.
On the other hand, having the archiver-core relying on the same system
as the core itself would be nicer from a sysadmin pov. I have not tried
to upload archives to a RDBMS and test its speed, but for mongodb the
results of the tests are presented at [1].

The challenge will be speed and designing an API which allows each
component to do its work.
I think it would be nice if we could reach some kind of agreement before
the GSoC starts (even if we change our mind later on) to be sure that if
we get students their work doesn't overlap too much.


The second point I want to present is with respect to the archiver
itself.
At the moment we have HyperKitty (HK), the current version:
- exposes single emails
- exposes single threads
- presents the archives for one month or day
- allows searching the archives by sender, subject, content or
subject and content
- presents a summary of the recent activities on the list (including the
evolution of the number of posts sent over the last month)

I think these are the basic functions that we would like to see in
an archiver.
But HK aims at much more: the ultimate goal of HK is to provide a
"forum-like" interface to the mailing lists. With it, HK would provide a
number of options (social-web like) allowing users to "like" or "dislike" a
post or a thread, to "+1" someone, and to tag the mails or
assign them categories.
These are all nice features but, imho, they go beyond what one would want
from a basic archiver.

So what I would like to propose is to split off from HK a sub-project
(MiniKitty?) which would provide this basic functionality.

We would keep HyperKitty as a more extensive archiver and try to bring
HK to its ultimate goal. This will need some more work and time as we
will have to make HK speak with core for authentication, find a way to
send emails to core/the lists and of course add all the other features
(tags, categories...)


Comments welcome :)

Thanks,
Pierre


[1]
http://blog.pingoured.fr/index.php?post/2012/03/16/Mailman-archives-and-mongodb
* Hi everyone

From barry at list.org  Tue Apr 24 00:20:18 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 23 Apr 2012 18:20:18 -0400
Subject: [Mailman-Developers] Speaking about kitties (or archivers)
In-Reply-To: <1335205036.2188.29.camel@ambre.pingoured.fr>
References: <1335205036.2188.29.camel@ambre.pingoured.fr>
Message-ID: <20120423182018.50067886@resist.wooz.org>

Thanks for posting this Pierre-Yves!

On Apr 23, 2012, at 08:17 PM, Pierre-Yves Chibon wrote:

>  mm-core (handles the lists themselves) --send emails to archivers-->

Note that the core doesn't *have* to send an email to the archiver.  From the
core's perspective, the `IArchiver` interface has three functions:

 - add a message to the archive
 - get a 'permalink' to the message in the archive
 - get the url to the "top" of the list's archive

The important things are 1) calculating the 'permalink' should not require a
round-trip with the archiver; 2) the details of adding a message to the
archiver are irrelevant to the core.

For external archivers, such as M-A or Gmane, the implementation of IArchiver
may indeed send an email.  For a local archiver like MHonArch, the
implementation just shells out to a command.  For HK or anything else, it
could be anything.  Every archiver needs a way to get messages sent to it, and
the core can adapt to any of those.
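
The three operations above can be sketched as a small Python class.  This is
only an illustration: the class name `PrototypeArchiver`, the method names, the
in-memory dict, and the URL layout are assumptions made for the example, not
the actual interface definition.  The permalink here is the URL-encoded
Message-ID, but any scheme that needs no round-trip would do.

```python
from email.message import Message
from urllib.parse import quote


class PrototypeArchiver:
    """Hypothetical in-memory archiver showing the three operations."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip('/')
        self.store = {}  # permalink -> archived message (stand-in for real storage)

    def list_url(self, list_name):
        # URL of the "top" of one list's archive.
        return '%s/%s' % (self.base_url, quote(list_name, safe=''))

    def permalink(self, list_name, msg):
        # Uses only local information (list name and the message itself),
        # so the core never has to ask the archiver for it.
        message_id = msg['Message-ID'].strip().strip('<>')
        return '%s/%s' % (self.list_url(list_name), quote(message_id, safe=''))

    def archive_message(self, list_name, msg):
        # How the message is stored is the archiver's business, not the core's.
        url = self.permalink(list_name, msg)
        self.store[url] = msg
        return url
```

An external archiver would send an email inside `archive_message`, and a local
one might shell out to a command; the core only sees these three calls.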

>archive-core (store the emails and expose them through an API) -->
>archivers/stats/NNTP
>
>The questions are then:
>- how do we store the emails ?
>- how do we expose the API ?
>- how to make it such that it becomes easy to extend ? (ie: the stats
>module wants to read the db, but probably also to store information on
>it)

Sharing is good, but it's also important to remember that any specific system
may or may not have a local archiver.  I could certainly imagine a site that
only archives on M-A or Gmane and doesn't waste the space to archive locally.

I think we've pretty much come to agreement that the core itself doesn't need
a full copy of all the messages after it's sent them, but of course, the
"prototype" archiver could be used to keep a local copy of everything in a
maildir.  That could be shared at the lower level (maildir) or through some
kind of API in minikitty.

>Having played with mongodb (HK relies on it atm), I quite like the
>possibilities it gives us. We can easily store the emails in it, query
>them and since it is a NoSQL database system extending it becomes also
>easy.
>On the other hand, having the archiver-core relying on the same system
>as the core itself would be nicer from a sysadmin pov. I have not tried
>to upload archives to a RDBMS and test its speed, but for mongodb the
>results of the tests are presented at [1].
>
>The challenge will be speed and designing an API which allow each
>component to do its work.

I think the archiver should *definitely* have a REST API for programmatic
access to its messages and data.

>I think it would be nice if we could reach some kind of agreement before
>the GSoC starts (even if we change our mind later on) to be sure that if
>we get students their work doesn't overlap too much.
>
>
>The second point I want to present is with respect to the archiver
>itself.
>At the moment we have HyperKitty (HK), the current version:
>- exposes single emails
>- exposes single threads
>- presents the archives for one month or day
>- allows searching the archives by sender, subject, content, or
>subject and content
>- presents a summary of recent activity on the list (including the
>evolution of the number of posts sent over the last month)
>
>I think this is the basic functionality that we would like to see in
>an archiver.
>But HK aims at much more: its ultimate goal is to provide a
>"forum-like" interface to the mailing lists. To that end HK would provide a
>number of social-web-like options, such as "liking" or "disliking" a
>post or a thread, giving someone a "+1", and tagging the mails or
>assigning them categories.
>These are all nice features but, imho, they go beyond what one would want
>from a basic archiver.

I think it would be fine for a basic archiver to be essentially
feature-equivalent to Pipermail, with two caveats:

 * Truly stable URLs, so that when you regenerate the archive from the raw
   maildir, none of your links break.
 * Search.

Other than that, it's all gravy (as we say :).  Nice-to-have features like CSS
for customizing the look and feel, dynamic rendering of raw messages,
etc. would be cool, but IMHO of secondary importance.

Cheers,
-Barry

From barry at list.org  Tue Apr 24 01:31:15 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 23 Apr 2012 19:31:15 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
	<CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
Message-ID: <20120423193115.71c5e2ad@resist.wooz.org>

I want to step back for a moment and look at some fundamentals.  I'll reply to
other messages in this thread later, but in the context of my own thoughts
expressed here.

TL;DR: I'm going to propose we keep the hash algorithm to include only the
Message-ID as input.

0) Mailman core doesn't care what the algorithm is for calculating a permalink
   to the message in the archiver.  All it cares about is that this can be
   calculated using local information only, with no round-trip to the
   archiver.  Local information includes system configuration values, mailing
   list settings, and information available in the message being posted.

1) Multiple archivers can be enabled in any running Mailman core system, and
   these do not have to agree on the permalink calculation.  Every archiver is
   free to use whatever algorithm it wants.

2) RFC 2369 and RFC 5064 rule us.  This means that Mailman core will add a
   List-Archive header, which RFC 2369 defines as the "field describ[ing] how
   to access archives for the list".  This header does not point to a specific
   message in the archive, but instead to the list's archive as a whole.

   RFC 5064 defines the Archived-At header which "refer[s] to the archived
   form of a single message."  If you get a message from Mailman which
   contains an Archived-At header, you should be able to click on that to view
   the message in the archive.  It's this that I'm calling the 'permalink' to
   the message, and which must be calculated without round-tripping to the
   archiver.

So what is this hash thing and why do we need it?  Well, strictly speaking, we
don't.  If UltraKitty wants to define the permalink as the URL-encoded
Message-ID concatenated with the List-ID and Date, that's fine by Mailman.  As
long as item 0 above is satisfied, the core is happy.

Where I believe the hash is useful is by providing a more human-friendly
string for *a* permalink to the message.  It doesn't have to be the only one;
it's just a convenience that you could imagine me reading to you over the
phone or typing into an SMS with my stubby old bass player fingers.

If you think of an archiver in REST terms, the RFC 5064 header value is just
one location for the resource (i.e. the message you care about) in the
archiver.  That same resource can have many different addresses; maybe you can
look it up by raw Message-ID, URL-encoded Message-ID, permalink hash, or
whatever.  All roads lead to the same resource in the archiver.  The permalink
hash isn't required for any of this to work, it's purely a convenience.

It doesn't even matter if the permalink points to multiple resources.

Let's say a message gets cross-posted so that multiple copies of it show up at
the archiver with the same Message-ID.  The archiver can certainly treat these
as separate resources, living at different canonical locations in its resource
tree.  But it could *also* honor a permalink that is identical for both
messages.  If you think of this as a tiny url for a search query, that could
return multiple hits, each of which would be the different versions of the
cross-posted message.

I could imagine a better UI though.  Let's say this message got cross-posted:

    From: Anne Person <aperson at example.com>
    To: ant at example.org, bee at example.org
    Subject: Ants and Bees are best friends!
    Message-ID: <alpha>

    Why should we fight?  The mosquitoes are our common enemy!

Now, if all we use is the Message-ID to calculate the permalink, you might see
both messages delivered to both mailing lists with the following RFC 5064
header:

    Archived-At: http://lists.example.org/XZ3DGG4V37BZTTLXNUX4NABB4DNQHTCP

You click on that url and you're taken to a page which contains the archived
message for one of the mailing lists (it doesn't matter which one), but you
see a little extra link on the page:

    View cross-posted thread in [ant at example.org] or [bee at example.org]

and those two links take you to the separate messages, at their canonical
locations, in the thread appropriate for one or the other mailing list.
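
A minimal sketch of the archiver-side lookup this implies: one permalink hash
can resolve to several resources, each with its own canonical location.  The
class name `PermalinkIndex`, its methods, and the example paths are all
hypothetical.

```python
from collections import defaultdict


class PermalinkIndex:
    """Hypothetical index from a permalink hash to every matching message."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def add(self, permalink_hash, list_name, canonical_path):
        # The same hash may be registered once per list the message reached.
        self._by_hash[permalink_hash].append((list_name, canonical_path))

    def resolve(self, permalink_hash):
        # Zero hits: unknown link.  One hit: show the message.
        # Several hits: show one copy plus links to the others.
        return list(self._by_hash.get(permalink_hash, []))


index = PermalinkIndex()
index.add('XZ3DGG4V37BZTTLXNUX4NABB4DNQHTCP', 'ant', '/ant/2012-April/000001')
index.add('XZ3DGG4V37BZTTLXNUX4NABB4DNQHTCP', 'bee', '/bee/2012-April/000007')
hits = index.resolve('XZ3DGG4V37BZTTLXNUX4NABB4DNQHTCP')
```

The same structure would serve a "Did you mean" page when unrelated messages
happen to share a shortened hash.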

I'll note that in another message, Jeff advocates for using something shorter
than 32 bytes for the hash, and letting collisions just work themselves out.
Which frankly would be fine by me, but I see that as a very similar problem to
the cross-posting problem.  If the header said this instead:

    Archived-At: http://lists.example.org/4DNQHTCP

clicking on it might bring you to a page like this:

    Did you mean:
      * [Ants and Bees are best friends] in [ant at example.org]
      * [Ants and Bees are best friends] in [bee at example.org]
      * [Mosquitoes unite!] in [bloodsuckers at example.org]
      * [What about us dung beetles?] in [pests at example.org]

Notice I haven't advocated for a particular hash algorithm, because for my
purposes right here, it doesn't matter.  What I do feel strongly about is that
the input to that hash should only include information directly available in
the originally posted message, e.g. Message-ID.  That way, if I get a copy
from the mailing list, but you get a copy directly, we both have (almost[*])
all the information we need to calculate the same RFC 5064 URL.
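
To make the point concrete, here is a sketch of a hash that takes only the
Message-ID as input.  The choice of SHA-1 and base32 is borrowed from elsewhere
in this thread and is an assumption for the example, as are the function names;
nothing here fixes the algorithm.

```python
import hashlib
from base64 import b32encode


def permalink_hash(message_id):
    # Drop surrounding whitespace and angle brackets so that "<alpha>" and
    # "alpha" hash to the same value.
    mid = message_id.strip()
    if mid.startswith('<') and mid.endswith('>'):
        mid = mid[1:-1]
    digest = hashlib.sha1(mid.encode('utf-8')).digest()  # 20 bytes
    return b32encode(digest).decode('ascii')             # 32 characters


def archived_at(list_archive_url, message_id):
    # The List-Archive value is the one piece not found in an off-list copy.
    return list_archive_url.rstrip('/') + '/' + permalink_hash(message_id)
```

Anyone holding a copy of the message and the List-Archive value can run this
and arrive at the same URL, with no state shared with the archiver.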

Cheers,
-Barry

[*] The one missing piece for the off-list copy is the value of the
List-Archive header.  If you can't find that out any other way, you're
screwed, but my guess is that in practice, it will be easy to find that out.
Or you can just Google the permalink hash or Message-ID to find the message.

From barry at list.org  Tue Apr 24 01:38:41 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 23 Apr 2012 19:38:41 -0400
Subject: [Mailman-Developers] [Bug 985149] [NEW] Add List-Post value to
 permalink hash input
In-Reply-To: <CAL_0O1_8aZyWgr+8QQ_yKha9gP6Ry-v1Pr1=Ob0k7q0Y82uy5Q@mail.gmail.com>
References: <20120418185331.8553.99441.malonedeb@soybean.canonical.com>
	<1F9FB131-C86E-4551-B162-0345A0A07AD5@NFSNet.org>
	<20120418160332.47a0ff90@resist.wooz.org>
	<CAL_0O19X8Tz8kLs7nbb_fEK1pAJ8GfCPYMfPymYyWkEc0UHhTw@mail.gmail.com>
	<20120420122214.77e56e06@resist.wooz.org>
	<CAL_0O1_8aZyWgr+8QQ_yKha9gP6Ry-v1Pr1=Ob0k7q0Y82uy5Q@mail.gmail.com>
Message-ID: <20120423193841.3d0517e0@resist.wooz.org>

On Apr 21, 2012, at 10:19 AM, Stephen J. Turnbull wrote:

>On Sat, Apr 21, 2012 at 1:22 AM, Barry Warsaw <barry at list.org> wrote:
>
>> I think the hash value should be opaque.  Jeff can perhaps elaborate his
>> use-case but I don't think the List-ID needs to be (or frankly *should* be)
>> extractable from the hash, but instead just needs to inform the hash value.
>> IOW, if you cross-post a message with Message-ID: <foo> to one at example.org and
>> two at example.com, you'd get two different messages forwarded to the archives,
>> and they would have different Permalink: hash values.  Before this proposal,
>> they'd have the same value.
>
>Which is a FAQ: how do I avoid getting two copies of the same message
>from multiple lists I subscribe to?  If Mailman is maintaining a list
>of messages received, with full personalization this FAQ now has an
>acceptable answer.  If Mailman distinguishes the same message posted
>to different lists in an opaque way, the answer is "we're sorry,
>Mailman cannot do that by design."
>
>Or do you see a way to do this that I don't?

That's actually a separate question from what gets transmitted to the
archiver.

Mailman *could* de-dupe the rosters for any cross-posted messages to mailing
lists that it manages, but it would have to know how to prefer one mailing
list copy over another.  E.g. do you get the footers from one at example.org or
two at example.org?  mm3 currently does not do this de-duping.

Regardless of what it delivers to actual list recipients, what would it do
when transmitting the message to the archiver?  There are a number of things
it could do, but right now, the archiver would get two messages with identical
Message-IDs.  In the implementation of IArchiver for any particular archiver,
some persistent state could be managed and de-duping could happen there.  I
think it's not worth doing it there, but it wouldn't be infeasible.
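
For illustration only, de-duping inside an archiver implementation might look
like the sketch below.  The class `DedupingArchiver` is hypothetical, and the
plain dict stands in for whatever persistent state a real implementation would
keep.

```python
class DedupingArchiver:
    """Hypothetical archiver that stores one copy per Message-ID."""

    def __init__(self):
        self.seen = {}     # Message-ID -> names of the lists it arrived on
        self.stored = []   # (Message-ID, body) pairs actually kept

    def archive_message(self, list_name, message_id, body):
        lists = self.seen.setdefault(message_id, [])
        first_copy = not lists
        if list_name not in lists:
            lists.append(list_name)
        if first_copy:
            # Only the first copy is stored; later copies just record
            # which other list the message was cross-posted to.
            self.stored.append((message_id, body))
        return first_copy
```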

>> Of course, the List-ID itself should be preserved in the message that the
>> archiver gets, so an archiver could still discriminate on that.
>
>Not good enough, because the de-dupe db will store hashes AIUI.  If
>the de-dupe db stores Message-IDs, then you have enough information.

I think the core's db will have to store Message-IDs.  It may also store the
hashes, or other information, but as we've determined, it won't need to store
the whole message contents.

-Barry

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/mailman-developers/attachments/20120423/f51104be/attachment-0001.pgp>

From barry at list.org  Tue Apr 24 01:53:22 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 23 Apr 2012 19:53:22 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAHjiUbrMN=vG7r_C8Ax938kMnx3Rd+Bvq_YQBXgLvTL8jhO-aA@mail.gmail.com>
References: <98DBF82D-37D6-4E9D-A70E-712F3DF3D963@NFSNet.org>
	<3586C042-91F4-4E1C-B214-5B5589F27C8F@NFSNet.org>
	<4F919DF9.401@zone12.com>
	<6.2.5.6.2.20120420110144.09c08eb0@resistor.net>
	<20120420141612.1431bf93@resist.wooz.org>
	<CAHjiUbrMN=vG7r_C8Ax938kMnx3Rd+Bvq_YQBXgLvTL8jhO-aA@mail.gmail.com>
Message-ID: <20120423195322.17c5bad9@resist.wooz.org>

On Apr 20, 2012, at 01:19 PM, Jeff Breidenbach wrote:

>1) Terri is exactly right. The reason for including list identity as
>part of the hash calculation is for cross-posted messages. An
>archiving service shows context. Here's the message AND the thread it
>fits into, AND information about the list it travelled over AND the
>ability to search that list further. Archives need to know the list to
>provide context.

Agreed, but I think you'll get all that information anyway, without it being
expressed in the hash.  You'll get a full copy of the posted message, so
you'll get the Message-ID, To header (i.e. the posting address), List-Post (if
there is one), List-ID, etc.

>2) The reason mail-archive.com uses List-Post and not List-Id in the
>calculation is because every list, RFC2369 compliant or not, has a
>concept of a posting address. It is natural idea, easy to think of and
>understand. Hence all mail-archive.com archives are keyed off of
>posting address. It would be technically possible (but an architectural
>pain) for mail-archive.com to calculate using List-Id. We'd probably
>not bother and instead store whatever was calculated by mailman and
>placed in the Archived-At: header. Okay, I'll admit my prejudice. I've
>always found List-Id annoying, and wish that it didn't exist.

Note that the message you receive may not have a useful List-Post header at
all!  From RFC 2369:

3.4. List-Post

   The List-Post field describes the method for posting to the list.
   This is typically the address of the list, but MAY be a moderator, or
   potentially some other form of submission. For the special case of a
   list that does not allow posting (e.g., an announcements list), the
   List-Post field may contain the special value "NO".

(I think neither mm2 nor mm3 does this right.  See LP: #987563)
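
A small sketch of why an archiver cannot rely on List-Post alone: the header
may be absent, may be "NO" with a trailing comment, or may hold a mailto URL.
The helper name `posting_address` and the regular expression are assumptions
made for this example.

```python
import re
from email import message_from_string

# Matches the address inside "<mailto:address>" and stops before any "?subject=..." part.
_MAILTO = re.compile(r'<mailto:([^>?]+)', re.IGNORECASE)


def posting_address(msg):
    """Return the posting address from List-Post, or None if there is none."""
    value = msg.get('List-Post')
    if value is None:
        return None
    value = value.strip()
    if value.upper().startswith('NO'):
        # Special value for lists that do not allow posting.
        return None
    match = _MAILTO.search(value)
    return match.group(1) if match else None


open_list = message_from_string('List-Post: <mailto:ant@example.org>\n\nbody\n')
announce = message_from_string('List-Post: NO (posting not allowed)\n\nbody\n')
```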

>3) As long as things are changing, I want to mention that these URLs
>feel too long. SHA-1 is a 160 bit hash consuming 32 URL characters. I
>think trimming to a 64 bit (13 character) hash is plenty. According to
>wikipedia collision tables, with the shorter hash we'd expect to get
>our first collision after archiving 5 billion messages. That's 50X the
>current corpus size of public archival services like GMane. And it
>isn't like an occasional hash collision is a big deal or a security
>problem. http://en.wikipedia.org/wiki/Birthday_attack

Let's say we take the lower 80 bits of the SHA1.  After base32 encoding, that
leaves us with 16 bytes.  Of course, we could also use the full 160 bit SHA1
hash, and take only the lower X number of bytes after the base32 encoding.
I'm all in favor of a shorter URL, but someone with better Maths-Fu will have
to propose a specific algorithm that adequately trades off collisions for
human-friendliness.  Also, note the implications of increased collisions on
the whole argument, which I brought up in my previous message.
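
Both shortening options can be sketched in a few lines.  Reading "lower 80
bits" as the last 10 bytes of the digest is an assumption, and the function
names are made up for the example.

```python
import hashlib
from base64 import b32encode


def _digest(message_id):
    mid = message_id.strip().strip('<>')
    return hashlib.sha1(mid.encode('utf-8')).digest()  # 160 bits


def low_bits_hash(message_id, nbytes=10):
    # Option 1: keep the last nbytes bytes, then base32-encode them.
    # 10 bytes = 80 bits = 16 characters; multiples of 5 bytes avoid '=' padding.
    return b32encode(_digest(message_id)[-nbytes:]).decode('ascii')


def truncated_hash(message_id, length):
    # Option 2: encode the full digest (32 characters) and keep the tail.
    full = b32encode(_digest(message_id)).decode('ascii')
    return full[-length:]
```

Because 80 is a multiple of 5, the two options give the same 16 characters;
the second option also allows lengths such as 13 that do not fall on a byte
boundary.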

>3b) For that matter, a sequence number would also do the trick, but I
>can understand that this is much more dangerous; it is easy for a
>sequence number to get reset and cause all hell to break loose.

It would also be nearly impossible to preserve the zeroth principle, that
Mailman and the archiver can agree on the permalink for a message with no
communication between them.

>4) I'm really not that picky. Our archival service could deal with all
>sorts of URLs, including the ones Terri was trying to avoid, such as
>http://example.com/archiver/listname.example.com/$hash
>In fact, we've found that lots of small, per-list databases have speed
>and reliability advantages over big global databases. But I also like
>short URLs. Bottom line, please don't let these comments delay or
>derail forward progress.

No worries!  We'll hash (pun intended ;) this out in plenty of time before 3.0
final.  With Richard's suggestion of a version number, we could even roll out
updates in future versions, although it would probably be more of a PITA for
you by then, than us. :)

Cheers,
-Barry

From barry at list.org  Tue Apr 24 02:02:22 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 23 Apr 2012 20:02:22 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
Message-ID: <20120423200222.1a157d25@resist.wooz.org>

On Apr 22, 2012, at 03:31 PM, Jeff Breidenbach wrote:

>I find List-Id annoying because I like the world to be simple and easy to
>understand. People who know nothing about RFCs naturally consider the
>posting address to be the canonical name of a mailing list. We should be
>embracing that.

In theory, I agree, which is why you'll see the posting address used as the
primary key into the mailinglist table in the database.  You'll almost always
see that in the To: header, although some situations might cause mismatches,
e.g. acceptable aliases.  According to the language in RFC 2369, this also
(only!) may show up in the List-Post header for the from-list copy of the
message, but you might find the List-Post has nothing useful, e.g. "NO" (with
some random goo of trailing comment).  List-ID will always be included in the
from-list copy.

>Instead, RFC2369 introduces this entire alternate namespace with List-Id,
>competing for attention, with its own weird rules like the domain-control one
>quoted earlier in its thread. All this confusion, and the main problem it
>tried to address isn't very important. Is it really such a disaster for a
>list to be considered different if it hops to a new domain? I don't think so,
>or there would be a lot more clamoring for editable List-Id in
>mailman. Archival services certainly don't need it. It smells like design by
>committee where everyone's pet feature for a rare use case gets added in,
>without appreciating the benefits of small and simple and
>less-stuff-is-better.
>
>Regarding hashes, the whole point of a archival hash is to make a shorter,
>human friendly URL. This is not very hard to implement; one can take the SHA-1
>and truncate it. If we aren't worried about length then Message-Id is a
>perfectly usable identifier.

Well, Message-ID also fails the "human friendly" part, e.g. from your message:

Message-ID:
 <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ at mail.gmail.com>

My fingers already hurt typing that into my spell-correcting smartphone. :)

FWIW, Base 32 was deliberately chosen because it's case insensitive, and
allows for optional mapping of commonly mistaken substitutions (e.g. zero for
oh, and one for eye or el).  IOW, it's much more forgiving of the human part
of the process.
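
As a sketch of that forgiveness, a lookup could normalize whatever a human
typed before comparing it with stored hashes.  The function name and the
default of reading "1" as "I" are assumptions; Python's `base64.b32decode`
offers `casefold` and `map01` arguments for the same purpose when decoding.

```python
def normalize_hash(text, one_as='I'):
    """Fold a hand-typed base32 string onto the canonical alphabet."""
    # The base32 alphabet is A-Z plus 2-7, so 0 and 1 never appear in a
    # valid hash and can safely be read as the letters they resemble.
    cleaned = text.strip().upper()
    return cleaned.replace('0', 'O').replace('1', one_as)
```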

>Certainly no need for a triumvirate of short hash, long hash, and
>message-id. Less is better.

Agreed.  Is 32 bytes too long?  Is 4 bytes too short?  What's an acceptable
trade-off between collision likelihood and human convenience?

-Barry

From barry at list.org  Tue Apr 24 02:08:57 2012
From: barry at list.org (Barry Warsaw)
Date: Mon, 23 Apr 2012 20:08:57 -0400
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
	<CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
Message-ID: <20120423200857.74dea68b@resist.wooz.org>

On Apr 23, 2012, at 02:33 PM, Stephen J. Turnbull wrote:

>What are you doing hanging out in e-mail circles, then? ;-)  It has to
>be the most prominent example of a computing field where smtplicity
>has led to a disaster, and it has never been easy to understand!

For some reason, I have Ren and Smtpy on the brain today. :)

>I dunno.  It seems to me that a lot of the lists where the users and
>admins would not care about flexibility, machine-friendliness, and
>continuity are also good candidates for moving to web forums in any
>case.  (Yeah, I know that Barry wants to kill web forums; but if so,
>those users are going to have to coexist with mine!)

Well, Paul Graham made me want to kill email, forums, archives, newsgroups,
and IMAP.  I don't know how to do it, but be on notice you old <wink> media,
we're gunnin' for ya.

>I don't find hashes (or most message IDs) to be human-friendly[1] at
>all, and (if it's not going to be a tinyurl) I really only want the
>line containing the URL to be less than 78 characters so I can be
>pretty sure nothing is going to try to insert a linebreak.  I guess
>we're just going to have to agree to disagree on a lot of things. ;-)

FWIW, a tinyurl would be very cool, but I don't know how to calculate those
without state, and state probably requires communication between the archiver
and Mailman.

>Bottom line: I don't have a problem with you having your preferences,
>and if you "win" I can work around it, but I do have multiple use
>cases for List-Id != List-Post that I raise for consideration by the
>group.
>
>[1] FVO "human" that includes a larger population than "geeks like
>me"!  My MUA is set to display Message-Id and References by default!

Mine too. :)

-Barry

From jeff at jab.org  Tue Apr 24 06:40:35 2012
From: jeff at jab.org (Jeff Breidenbach)
Date: Mon, 23 Apr 2012 21:40:35 -0700
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <20120423193115.71c5e2ad@resist.wooz.org>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
	<CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
	<20120423193115.71c5e2ad@resist.wooz.org>
Message-ID: <CAHjiUbq47ERsR2My9UoN1aZ0vY1JJ9EU=+UszU0kT6Z16=BORg@mail.gmail.com>

I didn't mean to be presmtpuous. I think you are right that user
interfaces can do a good job with crossposts. Here's a great example
from GMane.

   http://mid.gmane.org/20120323220013.0b1c88a8 at resist.wooz.org

>32 bytes too long?

Thirty-two characters means 50% likely to have a single collision once
the archival database hits approximately 1.4 septillion messages.

>Is 4 bytes too short?

Four characters is only about a million combinations. First collision
is 50% likely at 1200 messages, and multi-million message databases
are completely screwed.

Bottom line: how big a database do we expect to have, and amongst
those messages, how many collisions are considered acceptable?

-Jeff

PS. These numbers assume a well balanced hash. This paper suggests
SHA-1 is pretty good in non-adversarial situations, but I'm not an
expert.  http://cseweb.ucsd.edu/~mihir/papers/balance.html
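
The figures above follow from the usual birthday-bound approximation: with N
equally likely hash values, a collision becomes likely with probability p
after roughly sqrt(2 * N * ln(1 / (1 - p))) messages.  The sketch below
assumes a uniform hash and 5 bits per base32 character; the function name is
made up for the example.

```python
import math


def messages_for_collision(chars, probability=0.5, bits_per_char=5):
    """Approximate message count at which a first collision is this likely."""
    space = 2.0 ** (bits_per_char * chars)
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - probability)))


for chars in (4, 13, 16, 32):
    print("%2d characters: about %.3g messages" % (chars, messages_for_collision(chars)))
```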

From stephen at xemacs.org  Tue Apr 24 06:41:19 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 24 Apr 2012 13:41:19 +0900
Subject: [Mailman-Developers] Speaking about kitties (or archivers)
In-Reply-To: <20120423182018.50067886@resist.wooz.org>
References: <1335205036.2188.29.camel@ambre.pingoured.fr>
	<20120423182018.50067886@resist.wooz.org>
Message-ID: <CAL_0O1_npaKy5QRmhX8hhJqKw_Cnvkvty=Gh_zvDcUyFHdZvBw@mail.gmail.com>

On Tue, Apr 24, 2012 at 7:20 AM, Barry Warsaw <barry at list.org> wrote:
> Thanks for posting this Pierre-Yves!
>
> On Apr 23, 2012, at 08:17 PM, Pierre-Yves Chibon wrote:
>
>>  mm-core (handles the lists themselves) --send emails to archivers-->
>
> Note that the core doesn't *have* to send an email to the archiver.  From the
> core's perspective, the `IArchiver` interface has three functions:
>
>  - add a message to the archive
>  - get a 'permalink' to the message in the archive
>  - get the url to the "top" of the list's archive

Maybe it would be better to call that the archive's "index",
"directory", or "table of contents".  The archive may not be
hierarchically organized.

> The important things are 1) calculating the 'permalink' should not require a
> round-trip with the archiver; 2) the details of adding a message to the
> archiver are irrelevant to the core.

Yes, yes, Yes, YES, yesyesyesyesyes!  I mean, FTW. ;-)

>>The questions are then:
>>- how do we store the emails ?

That's not an appropriate question.  The archive backend will decide
that, and will provide an IArchiver function that can be registered
with the core and with front ends.

It would be nice if an IArchiver-compatible archive provided a way for
new frontends to discover it, but I guess that's kinda bootstrappy --
if we have that, then why don't we just serve the results over that
channel?

>>- how do we expose the API ?
>>- how to make it such that it becomes easy to extend ? (ie: the stats
>>module wants to read the db, but probably also to store information on
>>it)

No storing, please.  The stats module can keep its own db if it wants
to, and should be using on-line algorithms in any case so the expense
of hitting the archive should be minimal.

> I think we've pretty much come to agreement that the core itself doesn't need
> a full copy of all the messages after it's sent them, but of course, the
> "prototype" archiver could be used to keep a local copy of everything in a
> maildir.  That could be shared at the lower level (maildir) or through some
> kind of API in minikitty.

I don't like the idea of having a "minikitty".  As is probably
apparent (and I apologize for that, my opinion there is really
irrelevant) I am not a fan of turning ML archives into a social
network.  However, I think Pingou and other HyperKitty workers should
just do whatever it is they want to do, and do it right.  If you
really want a solid base set of functionality and only then
extensions, maybe a plugin architecture would be the way to go.  Or
you can specify and implement that base first, then add the
extensions.  (But the mockup already sports UI for the extensions!)

But if that's not really what you want to do, Clearsilver provides a
perfectly good base set for us, and I'll be happy to maintain the
GPL3-ed distro-in-the-Mailman-distro if that's how it needs to be.
*You* do what makes *you* happy.

>>On the other hand, having the archiver-core relying on the same system
>>as the core itself would be nicer from a sysadmin pov.

IMHO, premature optimization.  Among other things, there isn't going
to be a "the" archiver-core.  Mailman should provide "a"
archiver-core, and I think it should be based on maildir (which is
apparently Barry's intuition, too).  Theory and implementation of
maildir are simple and robust, and that allows us to concentrate on
the archiver interface.
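
A minimal sketch of such a maildir-backed store, using Python's standard
`mailbox` module.  The one-maildir-per-list layout and the function names are
assumptions made for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def archive_to_maildir(root, list_name, msg):
    """Store msg in the maildir for list_name under root; return its key."""
    # root must already exist; the per-list maildir is created on first use.
    box = mailbox.Maildir(os.path.join(root, list_name), create=True)
    try:
        key = box.add(msg)   # written under tmp/ first, then moved into new/
        box.flush()
    finally:
        box.close()
    return key


def demo():
    root = tempfile.mkdtemp()
    msg = EmailMessage()
    msg['Message-ID'] = '<alpha>'
    msg['Subject'] = 'Ants and Bees are best friends!'
    msg.set_content('Why should we fight?\n')
    return root, archive_to_maildir(root, 'ant', msg)
```

Any frontend that can read a maildir can then work from the same files, which
is the "shared at the lower level" option.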

>>The challenge will be speed

IMHO, Mailman should not take responsibility for speed of any archiver
backend distributed with Mailman.  It just needs to provide a robust
storage, and the two points Barry mentions above.

From stephen at xemacs.org  Tue Apr 24 09:41:41 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 24 Apr 2012 16:41:41 +0900
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAHjiUbq47ERsR2My9UoN1aZ0vY1JJ9EU=+UszU0kT6Z16=BORg@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
	<CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
	<20120423193115.71c5e2ad@resist.wooz.org>
	<CAHjiUbq47ERsR2My9UoN1aZ0vY1JJ9EU=+UszU0kT6Z16=BORg@mail.gmail.com>
Message-ID: <CAL_0O1-A3w0m9uX=-w-D=hi2YG9b9DGGVqDeFH1xKoA1i8RsWg@mail.gmail.com>

On Tue, Apr 24, 2012 at 1:40 PM, Jeff Breidenbach <jeff at jab.org> wrote:
>>Is 4 bytes too short?
>
> Four characters is only about a million combinations. First collision
> is 50% likely at 1200 messages, and multi-million message databases
> are completely screwed.

If we're willing to impose disambiguation on the user (and ability to
find and report all matching messages on the UI), then the questions
to me would be

0. Assume a 10 million message archive.
1. What percentage of permalinks need another click?
2. What percentage of permalinks will result in a list of more than 10 matches?

Rationale for 0: 10 related lists X 20 years X 365 days X 100
messages/day.  I can imagine people wanting to index into such a
corpus.
Rationale for 1: Obvious, I hope.
Rationale for 2: Maybe I'm just getting old, but that's the number of
lines I can comfortably scan in a glance.  FVO of "10" that suit you,
I guess.

Note that, like Barry, I'm assuming disambiguation will be needed for
x-posts in any case.  WDOT?


>
> Bottom line: how big a database do we expect to have, and amongst
> those messages, how many collisions are considered acceptable?
>
> -Jeff
>
> PS. These numbers assume a well balanced hash. This paper suggests
> SHA-1 is pretty good in non-adversarial situations, but I'm not an
> expert.  http://cseweb.ucsd.edu/~mihir/papers/balance.html
> _______________________________________________
> Mailman-Developers mailing list
> Mailman-Developers at python.org
> http://mail.python.org/mailman/listinfo/mailman-developers
> Mailman FAQ: http://wiki.list.org/x/AgA3
> Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
> Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/stephen%40xemacs.org
>
> Security Policy: http://wiki.list.org/x/QIA9

From jeff at jab.org  Tue Apr 24 19:50:39 2012
From: jeff at jab.org (Jeff Breidenbach)
Date: Tue, 24 Apr 2012 10:50:39 -0700
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAL_0O1-A3w0m9uX=-w-D=hi2YG9b9DGGVqDeFH1xKoA1i8RsWg@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
	<CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
	<20120423193115.71c5e2ad@resist.wooz.org>
	<CAHjiUbq47ERsR2My9UoN1aZ0vY1JJ9EU=+UszU0kT6Z16=BORg@mail.gmail.com>
	<CAL_0O1-A3w0m9uX=-w-D=hi2YG9b9DGGVqDeFH1xKoA1i8RsWg@mail.gmail.com>
Message-ID: <CAHjiUboX01=BXeza=rSeE9nNNpDQUdvrTSNDnYViSnd9WTbvAg@mail.gmail.com>

> 0. Assume a 10 million message archive.
> 1. What percentage of permalinks need another click?
> 2. What percentage of permalinks will result in a list of more than 10 matches?

Ignoring cross posts, for a 4 character hash:

1. Approximately 90%
2. Approximately 50%

Ignoring cross posts, for a 13 character hash:

1. Effectively 0%
2. Effectively 0%

Pick message count and collision tolerance, and hash size will follow.

-Jeff


========== simulation code

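#!/usr/bin/python
# Simulate permalink hash collisions using uniformly random hash values.
import random
hashlength = 4            # characters in the hash, 5 bits per character
message_count = 10000000
database = {}             # hash value -> number of messages with that hash
collisions = 0
for i in range(message_count):
  # randrange excludes its upper bound: exactly 2^(5*hashlength) values
  n = random.randrange(pow(2, 5 * hashlength))
  if n in database:
    collisions += 1
    database[n] += 1
  else:
    database[n] = 1
# Count the messages whose hash is shared by more than ten messages.
over_ten_collisions = 0
for i in database:
  if database[i] > 10:
    over_ten_collisions += database[i]
p1 = (100.0 * collisions) / float(message_count)
p2 = (100.0 * over_ten_collisions) / float(message_count)
print("Percent collisions %f" % p1)
print("Percent over ten collisions %f" % p2)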
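
The "more than 10 matches" figure can also be estimated without running
the simulation.  The sketch below is an approximation, not part of the
program above: it assumes uniformly distributed hashes and treats the
number of other messages sharing a given hash as Poisson distributed.
The function name is hypothetical.

```python
import math

def percent_over(message_count, hashlength, matches=10):
    """Approximate share (in percent) of messages whose hash is held by
    more than `matches` messages in total."""
    lam = message_count / float(2 ** (5 * hashlength))
    # A given message sees roughly Poisson(lam) other messages under its
    # hash, so its bucket exceeds `matches` when that count is >= matches.
    term = math.exp(-lam)  # P(0 other messages)
    below = 0.0            # P(fewer than `matches` other messages)
    for j in range(matches):
        below += term
        term *= lam / (j + 1)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 100.0 * (1.0 - below))

for length in (4, 5, 6, 13):
    print("hash length %d, over ten matches %f%%" %
          (length, percent_over(10000000, length)))
```

For a 4 character hash this comes out close to half of all messages, and
for longer hashes it drops to effectively zero.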

From a.badger at gmail.com  Tue Apr 24 20:12:21 2012
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 24 Apr 2012 11:12:21 -0700
Subject: [Mailman-Developers] Speaking about kitties (or archivers)
In-Reply-To: <20120423182018.50067886@resist.wooz.org>
References: <1335205036.2188.29.camel@ambre.pingoured.fr>
	<20120423182018.50067886@resist.wooz.org>
Message-ID: <20120424181221.GM28774@unaka.lan>

On Mon, Apr 23, 2012 at 06:20:18PM -0400, Barry Warsaw wrote:
> Thanks for posting this Pierre-Yves!
> 
> On Apr 23, 2012, at 08:17 PM, Pierre-Yves Chibon wrote:
> 
> >archive-core (store the emails and expose them through an API) -->
> >archivers/stats/NNTP
> >
> >The questions are then:
> >- how do we store the emails ?
> >- how do we expose the API ?
> >- how to make it such that it becomes easy to extend ? (ie: the stats
> >module wants to read the db, but probably also to store information on
> >it)
> 
> Sharing is good, but it's also important to remember that any specific system
> may or may not have a local archiver.  I could certainly imagine a site that
> only archives on M-A or Gmane and doesn't waste the space to archive locally.
> 
> I think we've pretty much come to agreement that the core itself doesn't need
> a full copy of all the messages after it's sent them, but of course, the
> "prototype" archiver could be used to keep a local copy of everything in a
> maildir.  That could be shared at the lower level (maildir) or through some
> kind of API in minikitty.
> 
I've been thinking about this and I'm in mild disagreement.  I think that
a mailing list system should give people an archive-store which is accessible
behind a generalized API.  That may be a non-local archiver if it's still
possible to implement the API.  That archive-store should be pluggable (the
storage could be SQL, mongodb, or remote) but having the store be accessible
is important.

The store may be accessible via a REST API but I'm not certain that it's the
correct level to deal with when talking about it in this context.  The
current mailman3 doesn't have an API for plugging in archivers via REST...
it has an API for plugging in archivers via python.  That may be the correct
level to be looking at this.

Now the important part -- why an archive store is more integral than the
current architecture makes it out to be...

One way to look at this is conceptually.  Mailman2 is what I've come to
think of as a complete mailing list system.  By contrast mailman3-core is
only a mailing list manager.  Mailman3 contains the information necessary to
send messages to an address and have those messages disseminated to a wider
audience.  By itself, this is just fancy management of email aliases.
Mailing lists seem to be something more than this.  In addition to being
management of where email is sent, they're also repositories of knowledge on
a particular subject.  This is the role filled by archives.

One could also look at it from a sysadmin standpoint.  If a sysadmin wants
to deploy mailman3 with archives, and wants to have a forum-like interface,
an nntp interface, a standard archives interface, and a REST interface to
the archives, are they going to want to set up four different storage
technologies for those, import the generic archives into all four of those,
and then maintain and update the storage technologies to keep them safe and
secure?  Will they want to buy warrantied storage for all of them?  I think
that they'll be happier if the design of our system could consolidate those.

A different way to look at this is from a programmer's standpoint.  Many of
the interfaces to archives that we're talking about are going to share common
needs.  They need access to the email messages.  They need to know how the
email messages thread together.  They're going to want to search the
messages.  Under the current scheme, programmers will be creating very
similar code to access the email messages in their particular store even if
they all choose to use the same underlying storage technology.

At the beginning I said that I was only in mild disagreement... where's the
qualifier come in?  I think that what we have with mailman3 right now is
something like this:


[mailman3 core] -- maintenance of the list metadata, sending and receiving
                   provides a REST API
    [Web UIs] -- web ui to the Core functions
    [Archivers] -- mailing list storage and user interface to those stored
                   messages.

I think we should look into something a little more symmetrical:


[mailman3 core] -- maintenance of list metadata, sending and receiving,
                   provides a REST API
    [Web UIs] -- web ui to Core functions
    [Archive-stores] -- stores the messages sent to the mailing lists.
                        Provides a (REST?) API to apps built on top of it
    [Archiver UIs] -- web ui, nntp interface, REST API (if not implemented
                      at the storage layer), etc to the archive-store

By splitting the archive storage from the archive UI similar to how
mailman3-core splits with the web ui, we can allow a sysadmin to choose one
archive-storage for all of the archive front-ends that they run on their
systems.
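
To make the split concrete, here is a minimal sketch of what a python-level
archive-store API could look like.  Everything in it is hypothetical: the
class names, the method names, and the naive threading rule are assumptions
for illustration, not an existing Mailman 3 interface.  It covers the three
common needs mentioned above (message access, threading, search).

```python
from abc import ABC, abstractmethod
from email.message import EmailMessage

class ArchiveStore(ABC):
    """Hypothetical API shared by all archive front-ends."""
    @abstractmethod
    def add_message(self, list_name, msg): ...
    @abstractmethod
    def get_message(self, list_name, message_id): ...
    @abstractmethod
    def get_thread(self, list_name, message_id): ...
    @abstractmethod
    def search(self, list_name, text): ...

class InMemoryStore(ArchiveStore):
    """Toy backend; a real one might use SQL, mongodb, or a remote service."""
    def __init__(self):
        self._lists = {}  # list name -> {Message-ID: message}

    def add_message(self, list_name, msg):
        self._lists.setdefault(list_name, {})[msg["Message-ID"]] = msg

    def get_message(self, list_name, message_id):
        return self._lists.get(list_name, {}).get(message_id)

    def get_thread(self, list_name, message_id):
        # Naive threading: the message itself plus every stored message
        # whose References header mentions it.
        messages = self._lists.get(list_name, {})
        return [m for mid, m in messages.items()
                if mid == message_id or message_id in (m["References"] or "")]

    def search(self, list_name, text):
        # Case-insensitive substring match on the Subject header only.
        needle = text.lower()
        return [m for m in self._lists.get(list_name, {}).values()
                if needle in (m["Subject"] or "").lower()]

def make_message(message_id, subject, references=None):
    """Build a small test message (helper for this example only)."""
    msg = EmailMessage()
    msg["Message-ID"] = message_id
    msg["Subject"] = subject
    if references:
        msg["References"] = references
    msg.set_content("body")
    return msg

store = InMemoryStore()
store.add_message("dev", make_message("<1@example.org>", "Archivers"))
store.add_message("dev", make_message("<2@example.org>", "Re: Archivers",
                                      references="<1@example.org>"))
print(len(store.get_thread("dev", "<1@example.org>")))
```

A web ui and an nntp front-end would both be written against ArchiveStore,
so the sysadmin picks one backend for all of them.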

Question: Why have multiple stores?  The big reason is that archives are
being much more rapidly developed right now.  So I anticipate that people
are going to be working on different storage technology with different
tradeoffs.  One storage might be faster.  Another might be more generally
available.  We'll have to reexamine this in the future.  It's possible that
we'll find one storage system that is perfect for all cases.  It's also
possible that we'll find all storage solutions have tradeoffs in which case
we'll likely want to support third-party stores forever.

Question: This is all dangling off of the archiver interface for mailman3
anyway so how can we affect the outcome?  Well, in some ways people can
create anything they want in there so we can't enforce a solution.  However,
if we think that it's desirable, we can certainly document this (maybe with
an interface if we go the python route for that layer of API or with
a specification of what the REST API should look like for that.)  We can
also enhance our current archivers to provide the API that we come up with.
I have a feeling that the prototype archiver with maildir will be a little
slow but if it provides the API and comments about separation between
core, storage, and archive UI it gives people a starting point for
creating their own.
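
As a rough illustration of such a starting point, a maildir-backed store
can be sketched with the standard library's mailbox module.  The class and
method names are hypothetical, and the lookup is a deliberate linear scan,
which is where the "a little slow" comes from.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

class MaildirStore:
    """Hypothetical archive-store that keeps every message in one maildir."""
    def __init__(self, path):
        # The path must not exist yet so that Maildir creates the
        # tmp/new/cur subdirectories itself.
        self._maildir = mailbox.Maildir(path, create=True)

    def add_message(self, msg):
        """Store the message and return the maildir key it was saved under."""
        return self._maildir.add(msg)

    def get_message(self, message_id):
        """Return the message with this Message-ID, or None (linear scan)."""
        for msg in self._maildir:
            if msg["Message-ID"] == message_id:
                return msg
        return None

def example_message(message_id, subject):
    """Build a small test message (helper for this example only)."""
    msg = EmailMessage()
    msg["Message-ID"] = message_id
    msg["Subject"] = subject
    msg.set_content("body")
    return msg

store = MaildirStore(os.path.join(tempfile.mkdtemp(), "archive"))
store.add_message(example_message("<1@example.org>", "First post"))
print(store.get_message("<1@example.org>")["Subject"])
```

An index from Message-ID to maildir key would remove the scan, at the cost
of keeping that index in sync with the maildir.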

Question: Where do we start?  I think that we'll either succeed or fail very
quickly by trying to define what the API between archive-store and
archiver-ui should look like.  We'll either be able to agree on a common set
of features there (from which we'll be able to go forth and create our own
archive-storage plugins) or we'll decide that we all need/want to do
different things that no common API can address.  If there's no common API
definition then we won't be able to do any of the rest of this so there
won't be any sense continuing down that path.

-Toshio

From stephen at xemacs.org  Wed Apr 25 05:16:59 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 25 Apr 2012 12:16:59 +0900
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAHjiUboX01=BXeza=rSeE9nNNpDQUdvrTSNDnYViSnd9WTbvAg@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
	<CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
	<20120423193115.71c5e2ad@resist.wooz.org>
	<CAHjiUbq47ERsR2My9UoN1aZ0vY1JJ9EU=+UszU0kT6Z16=BORg@mail.gmail.com>
	<CAL_0O1-A3w0m9uX=-w-D=hi2YG9b9DGGVqDeFH1xKoA1i8RsWg@mail.gmail.com>
	<CAHjiUboX01=BXeza=rSeE9nNNpDQUdvrTSNDnYViSnd9WTbvAg@mail.gmail.com>
Message-ID: <CAL_0O1-Nkh4phXB0F+5fcVLRqRZDeODb_XkbqHYpJtfLeV0wpQ@mail.gmail.com>

Thanks!

On Wed, Apr 25, 2012 at 2:50 AM, Jeff Breidenbach <jeff at jab.org> wrote:
>> 0. Assume a 10 million message archive.
>> 1. What percentage of permalinks need another click?
>> 2. What percentage of permalinks will result in a list of more than 10 matches?
>
> Ignoring cross posts, for a 4 character hash:
>
> 1. Approximately 90%
> 2. Approximately 50%

Too high.

> Ignoring cross posts, for a 13 character hash:
>
> 1. Effectively 0%
> 2. Effectively 0%

Way too low because of "7 plus/minus 2".  As a strawman suggestion,
I'd say that once you get down to 5% and 0.1% respectively, you're
done because in many contexts cross-posts will bring you up by a
similar order of magnitude anyway.

For five-character hash, I get 13% and 0% and for six, 0.4% and 0%.  I
think 1 in 8 messages is probably more than I want to ask users for an
extra click, especially if cross-posts are kicking in substantially
too.  So six looks to be the sweet spot.

So I'm convinced that a shorter hash makes a lot of sense.

Steve

From jeff at jab.org  Wed Apr 25 06:09:47 2012
From: jeff at jab.org (Jeff Breidenbach)
Date: Tue, 24 Apr 2012 21:09:47 -0700
Subject: [Mailman-Developers] [Bug 985149] Add List-Post value to
 permalink hash input
In-Reply-To: <CAL_0O1-Nkh4phXB0F+5fcVLRqRZDeODb_XkbqHYpJtfLeV0wpQ@mail.gmail.com>
References: <CAHjiUboZzYAE1dgApzfFNMqv_X+X6xA9E1bOc3F+mQ3qcTeeeQ@mail.gmail.com>
	<CAL_0O19pK022vAd7yNsF1LjiZK9W0W64P2gQUnEMDsu=1qN=Qg@mail.gmail.com>
	<20120423193115.71c5e2ad@resist.wooz.org>
	<CAHjiUbq47ERsR2My9UoN1aZ0vY1JJ9EU=+UszU0kT6Z16=BORg@mail.gmail.com>
	<CAL_0O1-A3w0m9uX=-w-D=hi2YG9b9DGGVqDeFH1xKoA1i8RsWg@mail.gmail.com>
	<CAHjiUboX01=BXeza=rSeE9nNNpDQUdvrTSNDnYViSnd9WTbvAg@mail.gmail.com>
	<CAL_0O1-Nkh4phXB0F+5fcVLRqRZDeODb_XkbqHYpJtfLeV0wpQ@mail.gmail.com>
Message-ID: <CAHjiUbpWHdGQk1XVyRv2Y=DeQj2fP8GvzVORuthojM2-kaM3xw@mail.gmail.com>

I apologize, the simulation code had a flaw.  I'm embarrassed that I
didn't recognize this immediately from intuition.  We
could get even more accurate results by computing actual SHA-1 of
actual message-ids, but I'm not sure it is worth the effort. Here is a
revised program and the results from one run. For the record, I'm
interested in billion+ message counts, for which we'd need a few more
hash characters. But not that many.

message count 10000000, hash length 4, collisions 99.992930%
message count 10000000, hash length 5, collisions 25.755240%
message count 10000000, hash length 6, collisions 0.928050%
message count 10000000, hash length 7, collisions 0.029860%
message count 10000000, hash length 8, collisions 0.000800%
message count 10000000, hash length 9, collisions 0.000060%
message count 10000000, hash length 10, collisions 0.000000%
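
These figures can be cross-checked analytically.  The sketch below assumes
uniformly distributed hashes and computes the chance that at least one of
the other messages picks the same value as a given message; the function
names are hypothetical.  It also inverts the relation to pick a hash
length for a given message count and tolerance.

```python
import math

def percent_shared(message_count, hashlength):
    """Expected share (in percent) of messages whose hash value is also
    used by at least one other message."""
    buckets = 2 ** (5 * hashlength)
    # log of the chance that none of the other messages picks this value
    log_alone = (message_count - 1) * math.log1p(-1.0 / buckets)
    return -100.0 * math.expm1(log_alone)

def min_hash_length(message_count, tolerance_percent):
    """Shortest hash length keeping the shared share within the tolerance."""
    length = 1
    while percent_shared(message_count, length) > tolerance_percent:
        length += 1
    return length

for length in range(4, 11):
    print("message count %d, hash length %d, expected collisions %f%%" %
          (10000000, length, percent_shared(10000000, length)))
print("hash length for a billion messages at 1%%: %d" %
      min_hash_length(1000000000, 1.0))
```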

===

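#!/usr/bin/python
# Simulate permalink hash collisions using uniformly random hash values.
import random
def compute(message_count, hashlength):
  database = {}  # hash value -> number of messages with that hash
  for i in range(message_count):
    # randrange excludes its upper bound: exactly 2^(5*hashlength) values
    n = random.randrange(pow(2, 5 * hashlength))
    if n in database:
      database[n] += 1
    else:
      database[n] = 1
  # Count every message whose hash is shared with at least one other.
  collisions = 0
  for i in database:
    if database[i] > 1:
      collisions += database[i]
  p1 = (100.0 * collisions) / float(message_count)
  print("message count %d, hash length %d, collisions %f%%" %
        (message_count, hashlength, p1))

for i in range(4, 11):
  compute(10000000, i)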
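
The "actual SHA-1 of actual message-ids" variant mentioned above could look
roughly like this.  It is a sketch under assumptions: the permalink hash is
taken to be the Base32 encoding of the SHA-1 digest of the Message-ID
(consistent with the 5 bits per character used in the simulation),
truncated to the wanted length, and the example.org message-ids are
synthetic stand-ins for a real archive.

```python
import base64
import hashlib
from collections import Counter

def short_hash(message_id, length):
    """First `length` Base32 characters of the SHA-1 of a Message-ID."""
    digest = hashlib.sha1(message_id.encode("utf-8")).digest()
    return base64.b32encode(digest).decode("ascii")[:length]

def collision_percent(message_ids, length):
    """Share (in percent) of messages whose truncated hash is also used
    by at least one other message."""
    counts = Counter(short_hash(mid, length) for mid in message_ids)
    shared = sum(c for c in counts.values() if c > 1)
    return 100.0 * shared / len(message_ids)

# Synthetic message-ids; a real run would read them from an archive.
ids = ["<%d@example.org>" % i for i in range(100000)]
for length in (3, 4, 5):
    print("message count %d, hash length %d, collisions %f%%" %
          (len(ids), length, collision_percent(ids, length)))
```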

From terri at zone12.com  Thu Apr 26 03:57:51 2012
From: terri at zone12.com (Terri Oda)
Date: Wed, 25 Apr 2012 19:57:51 -0600
Subject: [Mailman-Developers] Congratulations to our GSoC Students!
Message-ID: <4F98AB9F.1080300@zone12.com>

The announcement from Google went out on Monday, so I'm a bit slow, but 
I'd like to congratulate the three students who will be working with GNU 
Mailman as part of Google Summer of Code 2012:

Aamir Khan will be working on improving Hyperkitty.

Alexander Sulfrian will be working on NNTP access to the archives.

George Chatzisofroniou will be working on metrics, integrating his 
MailmanStats package with Mailman 3.0.

Congratulations to all of you, and thank you to all our other talented 
applicants this year.    It was a very tough field and we were very 
lucky to get three slots allocated to us by our umbrella organization, 
the Python Software Foundation.  I believe this is the largest number of 
students we've ever had!

We're now in what Google calls the "community bonding period" where we 
get to know each other and get things set up.   Students: this would be 
a great time to post a short email to mailman-developers reminding 
everyone about what you're planning to do this summer (Google only gives 
the short abstract in their public list).  I know the mentors are pretty 
excited about the projects, and this gives everyone else a chance to get 
excited too!

Coding starts on May 21st.  So be prepared: this is going to be one 
amazing year for Mailman!

  Terri

From pingou at pingoured.fr  Thu Apr 26 17:35:28 2012
From: pingou at pingoured.fr (Pierre-Yves Chibon)
Date: Thu, 26 Apr 2012 17:35:28 +0200
Subject: [Mailman-Developers] Speaking about kitties (or archivers)
In-Reply-To: <20120424181221.GM28774@unaka.lan>
References: <1335205036.2188.29.camel@ambre.pingoured.fr>
	<20120423182018.50067886@resist.wooz.org>
	<20120424181221.GM28774@unaka.lan>
Message-ID: <1335454528.11183.56.camel@ambre.pingoured.fr>

On Tue, 2012-04-24 at 11:12 -0700, Toshio Kuratomi wrote:
> [mailman3 core] -- maintenance of list metadata, sending and 
>                    receiving, provides a REST API
>     [Web UIs] -- web ui to Core functions
>     [Archive-stores] -- stores the messages sent to the mailing lists.
>                         Provides a (REST?) API to apps built on top 
>                         of it
>     [Archiver UIs] -- web ui, nntp interface, REST API (if not 
>                       implemented at the storage layer), etc to the
>                       archive-store
> 
> By splitting the archive storage from the archive UI similar to how
> mailman3-core splits with the web ui, we can allow a sysadmin to
> choose one archive-storage for all of the archive front-ends that they
> run on their systems.

Thank you Toshio for explaining this better than I was able to.

> Question: Where do we start?  I think that we'll either succeed or
> fail very quickly by trying to define what the API between
> archive-store and archiver-ui should look like.  We'll either be able
> to agree on a common set of features there (from which we'll be able
> to go forth and create our own archive-storage plugins) or we'll
> decide that we all need/want to do different things that no common API
> can address.  If there's no common API definition then we won't be
> able to do any of the rest of this so there won't be any sense
> continuing down that path.

The current version of HK relies on mongodb for the storage, but I want
to test HK with a traditional SQL backend, so I have started to work on
this.

The interface I defined is there:
https://github.com/pypingou/kittystore/blob/master/kittystore/__init__.py

And its implementation using SQLAlchemy is there:
https://github.com/pypingou/kittystore/blob/master/kittystore/kittysastore.py

The mongodb implementation isn't done yet but should be quite trivial to
do (most functions in the API came from it).

The idea is that we can now have different backends, and each module
needing access to the emails can use the API directly without having to
care which storage system is behind it.


I hope this helps,

Pierre

From syst3m.w0rm at gmail.com  Sun Apr 29 15:16:48 2012
From: syst3m.w0rm at gmail.com (Aamir Khan)
Date: Sun, 29 Apr 2012 18:46:48 +0530
Subject: [Mailman-Developers] Congratulations to our GSoC Students!
In-Reply-To: <4F98AB9F.1080300@zone12.com>
References: <4F98AB9F.1080300@zone12.com>
Message-ID: <CAOb12VVGEad2=Vu16J+3+hUWKNw5t8KPn+=Q=G1svxcH7caaYQ@mail.gmail.com>

Hi!

I will start first with my introduction.

My name is Aamir Khan. I will be working on improving HyperKitty.
HyperKitty is a Mailman 3 archiver that aims to address the issues listed at
ModernArchiving <http://wiki.list.org/display/DEV/ModernArchiving>.
For more details, you might want to check out my GSoC proposal [1]. I will
try to blog regularly about my work [2].

I am a 3rd-year undergraduate student at IIT Roorkee, majoring in Computer
Science. I am very passionate about algorithms. Besides what I have been
learning in school, I have gained programming experience by participating
in algorithm competitions, creating several pet projects, and contributing
to open source.

Feel free to mail me with any questions or feedback about my proposal. If
you just want to say hi, that's cool too. I am very excited for my GSoC
project and looking forward to working with all of you!


[1] =>
http://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2012/syst3mw0rm/1
[2] => http://blog.aamirkhan.co.in/


Contact info:
Time zone: Indian Standard Time (GMT+5:30)
email: syst3m.w0rm at gmail.com
Phone: +91 9557647357
IRC: syst3mw0rm on freenode
twitter: @syst3mw0rm



-- 
Aamir Khan | 3rd Year  | Computer Science & Engineering | IIT Roorkee

From sophron at latthi.com  Sun Apr 29 15:46:35 2012
From: sophron at latthi.com (George Chatzisofroniou)
Date: Sun, 29 Apr 2012 16:46:35 +0300
Subject: [Mailman-Developers] Congratulations to our GSoC Students!
In-Reply-To: <4F98AB9F.1080300@zone12.com>
References: <4F98AB9F.1080300@zone12.com>
Message-ID: <CACeRBz=cv0bthMaeEdQhU4GzmaMjC_iHywX+ZL7jOW-aJOLpXw@mail.gmail.com>

Hello everyone,

My name is George Chatzisofroniou, I'm 20 years old and I study Computer
Science. I'm a volunteer sysadmin in my university's labs and I love
to code. In my free time, I like to walk or play extreme sounds with my band.

This summer, I'm going to implement Metrics on Mailman 3.0. Right
now, I'm going through the Django documentation and waiting for the
appropriate mechanisms to be in place so my app can collect the right
data.

I keep a GSoC diary here [1].

Any advice is very welcome!

Thanks,

[1]: http://sophron.latthi.com/gsoc-mailman/


-- 
George Chatzisofroniou
sophron.latthi.com