From matt at sidefx.com Tue Feb 8 20:20:13 2011
From: matt at sidefx.com (Matt Chaput)
Date: Tue, 08 Feb 2011 14:20:13 -0500
Subject: [Archiver-dev] Continuously crawling an email list
Message-ID: <4D51976D.5070805@sidefx.com>

Hi, just saw this list, it seemed related to what I want to do, so I signed up :)

I want to create a web app that indexes email messages as they appear in a MailMan list and makes them available for search.

It seems like one way to do this would be to create an email account, sign it up to the list, and use an IMAP4 client to poll the account and download new messages.

But is that the best way? For one thing, it doesn't allow the batch indexing of old list messages. For that I'd have to download tar'd archives and support separate indexing paths for old (archived), newish (downloaded recently) and new (just pulled out of the account) messages.

Is there a good way to have a script "follow" an email list? And better yet, is there already code out there to do so ;)

Thanks,

Matt

From marshman at gmail.com Thu Feb 10 15:51:50 2011
From: marshman at gmail.com (Jeff Marshall)
Date: Thu, 10 Feb 2011 06:51:50 -0800
Subject: [Archiver-dev] Continuously crawling an email list
In-Reply-To: <4D51976D.5070805@sidefx.com>
References: <4D51976D.5070805@sidefx.com>
Message-ID:

Hi Matt. For mail-archive.com we essentially do what you mention: we have a global email address (archive at mail-archive.com) that list admins can subscribe to their list, and then our service converts the received messages into MHonArc archives. When a list admin comes to us with a request to load up old archives we chew through the old mbox files and pump them into our MHonArc-digesting code.

I'm not aware of any scripts that would follow a list in an easier fashion. A benefit to subscribing an address to the list is you make the list admin aware of who you are, and they have the option of saying no.
Jeff

From hlein at marc.info Thu Feb 10 16:23:40 2011
From: hlein at marc.info (Hank Leininger)
Date: Thu, 10 Feb 2011 10:23:40 -0500 (EST)
Subject: [Archiver-dev] Continuously crawling an email list
In-Reply-To:
References: <4D51976D.5070805@sidefx.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 10 Feb 2011, Jeff Marshall wrote:

> Hi Matt. For mail-archive.com we essentially do what you mention: we have a
> global email address (archive at mail-archive.com) that list admins can
> subscribe to their list, and then our service converts the received messages
> into MHonArc archives. When a list admin comes to us with a request to load
> up old archives we chew through the old mbox files and pump them into our
> MHonArc-digesting code.
>
> I'm not aware of any scripts that would follow a list in an easier fashion.
> A benefit to subscribing an address to the list is you make the list admin
> aware of who you are, and they have the option of saying no.

For MARC it is similar, although we have one distinct email alias per list (typically listname at marc.info) that we subscribe (and of course the back-end is different: we stuff things in an SQL DB rather than MHonArc). For us, inserting a message also queues it for search-indexing. The message-insertion script can handle a single message or many, so it's the same codepath if I get my hands on (or reconstruct) mboxes of historical list traffic.

I've also got various utility scripts to pull down and clean up messages from different existing archive types. If you (or anybody else, for that matter) want them, let me know and I'll make them available somewhere. (Maybe even with a <6 month turnaround. :( )

I suppose you could use IMAP to poll/pull list messages from a subscribed account, e.g. if you didn't want to have an SMTP path into your archive-cooking server, but in most cases that just seems to me like unnecessary extra moving parts.

Thanks,

Hank

-----BEGIN PGP SIGNATURE-----

iD8DBQFNVAL8qP26fHCT+PMRAl9pAJ9Z7yhE4LCkvt0vo/M1kObOuTZcMACfdzMS
z2BU1KwXwdI0FLlG1tSyvgI=
=1NhY
-----END PGP SIGNATURE-----
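Hank's point that one insertion script handles either a single message or a whole mbox is what collapses Matt's three indexing paths (old, newish, new) into one. The Python sketch below illustrates that idea under stated assumptions: a plain dict keyed on Message-ID stands in for a real search index, and `index_messages`, `from_mbox`, `from_raw` and `from_imap` are hypothetical names, not code from MARC or mail-archive.com. It uses only the standard-library `mailbox`, `email` and `imaplib` modules.

```python
import email
import imaplib
import mailbox
from email.message import Message
from typing import Iterable, Iterator


def index_messages(messages: Iterable[Message], index: dict) -> int:
    """Single ingestion codepath: the same for one message or a whole archive.

    `index` is a plain dict standing in for a real search index (hypothetical);
    it maps Message-ID -> (Subject, From). Returns the number of new messages.
    """
    added = 0
    for msg in messages:
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            continue  # a real indexer might key these on a body hash instead
        msg_id = msg_id.strip()
        if msg_id in index:
            continue  # already seen via another source (archive vs. live delivery)
        index[msg_id] = (msg.get("Subject", ""), msg.get("From", ""))
        added += 1
    return added


def from_mbox(path: str) -> Iterator[Message]:
    """Historical traffic: yield every message in an existing mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        yield from box
    finally:
        box.close()


def from_raw(raw: bytes) -> Iterator[Message]:
    """Live traffic: one message, e.g. piped to a script by a per-list alias (assumed setup)."""
    yield email.message_from_bytes(raw)


def from_imap(host: str, user: str, password: str, folder: str = "INBOX") -> Iterator[Message]:
    """Live traffic via IMAP polling: fetch messages not yet marked as seen.

    Fetching the full RFC822 body marks each message as seen, so the next poll skips it.
    """
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select(folder)
        _typ, data = conn.search(None, "UNSEEN")
        for num in data[0].split():
            _typ, msg_data = conn.fetch(num, "(RFC822)")
            yield email.message_from_bytes(msg_data[0][1])
```

Deduplicating on Message-ID lets the historical and live sources overlap without double-indexing, though Message-ID is not guaranteed to be present or unique, so a production indexer would want a fallback key. `from_imap` only shows where the IMAP-polling approach would plug in; per Hank's advice, delivering straight to an alias that feeds `from_raw` avoids those extra moving parts.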