[Mailman-Users] Importing archives again

Ivan Van Laningham ivanlan at pauahtun.org
Fri Apr 20 22:05:16 CEST 2007


Hi All--
Mark writes:

>
> >
> >[Sangha] Anger and its expression   Ryunyokingryunyo at earthlink.net
> >
> >The address is supposed to be "ryunyo at earthlink.net".
>
>
> No, it is supposed to be "ryunyo at earthlink.net".
>

Not exactly.  On the index pages for the archives, index lines take
one of two forms:

# [Sangha] Welcome Home   mag at swcp.com

or

# [Sangha] Welcome Home   Pat Stacy

Nowhere in the archives are @ signs supposed to appear; at least, not in
the version of Mailman I'm using and the way I've got it set up.

I looked at .txt files produced by the new list and compared them with
the old .txt files I've got.

From lines from the new .txt file:
>>>
From erstad at nilsandreas.info  Thu Feb  1 01:01:49 2007
From: erstad at nilsandreas.info (Nils Andreas Erstad)
<<<

From lines from an old .txt file:
>>>
From ivanlan at home.com Mon, 31 Jul 2000 23:34:46 -0600
Date: Mon, 31 Jul 2000 23:34:46 -0600
From: Ivan Van Laningham ivanlan at home.com
<<<

From lines from the existing mbox file:
>>>
From ivanlan at pauahtun.org Wed Feb 28 01:40:24 2007
Date: Tue, 27 Feb 2007 18:38:34 -0700
From: Ivan Van Laningham <ivanlan at pauahtun.org>
<<<

The difference appears to be in the address on the From: line. That
is, "Ivan Van Laningham ivanlan at home.com" fails, but "Ivan Van
Laningham <ivanlan at pauahtun.org>" works.

If that's correct, I can modify those lines easily.
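
A minimal sketch of that preprocessing step in Python, assuming (as Mark
suggests below) that only "From " separator lines and From: headers need
repair. The script name fixfrom.py and both regular expressions are
hypothetical illustrations, not anything shipped with Mailman, and the
angle-bracket rewrite only encodes the guess above; it is untested
against arch.

```python
import re
import sys

# Hypothetical preprocessing script (not part of Mailman). It repairs
# archive-obfuscated addresses ("user at example.com") only on "From "
# separator lines and "From:" header lines, leaving " at " elsewhere alone.

# A local part, the literal " at ", then a domain with at least one dot.
OBFUSCATED = re.compile(r"([\w.+-]+) at ([\w-]+(?:\.[\w-]+)+)")

# Assumption: a From: header shaped like "Name user@host" (bare address,
# no angle brackets) may need to become "Name <user@host>".
BARE_FORM = re.compile(r"^From: (.+?) ([\w.+-]+@[\w-]+(?:\.[\w-]+)+)\s*$")


def fix_line(line: str) -> str:
    """Return the line with addresses repaired if it is a From line."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending
    if body.startswith("From ") or body.startswith("From:"):
        body = OBFUSCATED.sub(r"\1@\2", body)
        m = BARE_FORM.match(body)
        if m:
            body = f"From: {m.group(1)} <{m.group(2)}>"
    return body + ending


def main(src: str, dst: str) -> None:
    # latin-1 maps every byte to a character, so old archives in mixed
    # charsets pass through unchanged apart from the edited lines;
    # newline="" leaves line endings untranslated on both sides.
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in fin:
            fout.write(fix_line(line))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: fixfrom.py INPUT.txt OUTPUT", file=sys.stderr)
```

Run it once per monthly file, e.g. "python fixfrom.py 2002-February.txt
2002-February.fixed", then feed the output to cleanarch and arch as
before. Note that it also touches body lines that happen to begin with
"From " or "From:" (forwarded messages, for instance); a stricter version
would track where each message's headers end.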

Metta,
Ivan



On 4/20/07, Mark Sapiro <msapiro at value.net> wrote:
> Ivan Van Laningham wrote:
>
> >Hi All--
> >This is very helpful.  What I have are basically three sets of archives.
> >
> >1)  Archives from the current list, fairly small and created about two
> >months ago after a disastrous ISP debacle (the yo-yos got themselves
> >_evicted_, for heaven's sake);
> >
> >2)  Archives from the previous host and list incarnation and a much
> >earlier version--but still > 2.0--of Mailman;
> >
> >3)  Archives from the previous host, same list, but a version of
> >Mailman that might have started with the digit one. ;-)  The person
> >who upgraded Mailman in Feb 2002 didn't bother to import the existing
> >archives, so now is the first time I've tried to import such old
> >archives.
> >
> >I have successfully dealt with 1 and 2.  Appending the two mboxes
> >works well, probably because there is a two-week gap between the two
> >latest incarnations of the list.
> >
> >However, 3 is a problem.  I don't have an mbox for the earliest
> >archives; instead, I have the text files--2002-February.txt,
> >etc.--which appear to me to be in mbox format.
>
>
> The .txt files are similar to .mbox files, but there are various
> differences. Many headers have been removed and, most importantly,
> email addresses may have been obscured by changing user@example.com to
> user at example.com.
>
>
> >If I run cleanarch on these text files before running arch on them,
> >they do not appear in the archives.
>
>
> Probably because cleanarch escapes all the "From " separators, since
> the email address in them has " at " instead of "@".
>
>
> >If I skip cleanarch, then I get
> >bad addresses in the posts in the archives (and yes, I did use the
> >--wipe option).  The bad addresses look like the following in the
> >index page:
> >
> >[Sangha] Anger and its expression   Ryunyokingryunyo at earthlink.net
> >
> >The address is supposed to be "ryunyo at earthlink.net".
>
>
> No, it is supposed to be "ryunyo at earthlink.net".
>
>
> >How can I preprocess the text files to fix the problem addresses?  I
> >assume it's because the old text files have something like From:
> >Ryunyo King<"ryunyo at earthlink.com"> in the from line.  Is there a
> >secret option to cleanarch I didn't see?
>
>
> cleanarch won't do this. You need to process the .txt files yourself
> with your own script or by hand to replace " at " with "@" in email
> addresses before using cleanarch.
>
> Obviously, you can't just globally replace " at " with "@" as there
> will be many occurrences of " at " outside email addresses.
>
> You might limit yourself to "From " lines and From: headers. That will
> probably work. You could also try to use some regexp that only matches
> " at " if it looks like it's in an email address.
>
>
> >(I also ended up with a slew of duplicates when the upgrade happened
> >in Feb 2002; half the messages are right, the other half of the
> >duplicate messages have addresses similar to the above.  But I'm
> >pretty sure I can deal with those.)
> >
> >Thanks for all the help.
> >
> >Metta,
> >Ivan
>
> --
> Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan
>
> ------------------------------------------------------
> Mailman-Users mailing list
> Mailman-Users at python.org
> http://mail.python.org/mailman/listinfo/mailman-users
> Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
> Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
> Unsubscribe: http://mail.python.org/mailman/options/mailman-users/ivanlan9%40gmail.com
>
> Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
>
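
Given Mark's explanation that cleanarch escapes "From " separators whose
address still reads " at ", a quick scan before rerunning cleanarch can
save a round trip. The Python sketch below (saved under a hypothetical
name such as checkfrom.py) flags every "From " line whose second field
contains no "@". That is only a rough heuristic, an assumption rather
than the test cleanarch actually applies, and it will also flag
unescaped body lines that begin with "From ".

```python
import sys
from typing import Iterable, List, Tuple

# Hypothetical diagnostic (not part of Mailman): report "From " separator
# lines whose sender field lacks an "@". Such lines are likely to be
# escaped by cleanarch, so the messages they start would not be split out.


def suspect_separators(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for 'From ' lines with no '@' in the sender."""
    suspects = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("From "):
            fields = line.split()
            sender = fields[1] if len(fields) > 1 else ""
            if "@" not in sender:
                suspects.append((lineno, line.rstrip("\r\n")))
    return suspects


if __name__ == "__main__":
    # Each command-line argument is a .txt or mbox file to check.
    for path in sys.argv[1:]:
        with open(path, encoding="latin-1") as f:
            for lineno, text in suspect_separators(f):
                print(f"{path}:{lineno}: {text}")
```

Running it over the preprocessed files (e.g. "python checkfrom.py
2002-*.fixed") and getting no output is a reasonable, though not
conclusive, sign that cleanarch will leave the separators alone.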


-- 
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.python.org/workshops/1998-11/proceedings/papers/laningham/laningham.html
Army Signal Corps:  Cu Chi, Class of '70
Author:  Teach Yourself Python in 24 Hours

