[Tutor] Newbie - Simple mailing list archiver

Bob Gailer bgailer@alum.rpi.edu
Sun Mar 30 12:42:03 2003



At 09:21 PM 3/29/2003 +0000, Barnaby Scott wrote:
>I manage a mailing list and want to archive it, in a very basic form, to a
>website - which happens to consist of a wiki (the engine for which is
>PikiePikie - which I'm very impressed by and which is itself written in
>Python). The great thing about this exercise is that the wiki uses only
>text files, and the HTML is generated on the fly, so my archive will be all
>plain text.
>
>Below I have sketched out a skeleton for what the archiver needs to do, as I
>see it. It would be a bit cheeky just to come here and ask someone to fill
>in all the Python code for me! However, I would be extremely grateful for 2
>things:
>1: Any comments about the strategic 'skeleton' I have laid out
>2: As much or as little of the code as anyone feels able/inclined to fill
>in. It's not that I'm lazy, it's just that when you are starting literally
>from scratch, it is really hard to know if you are getting EVERYTHING
>completely wrong, and wasting time disappearing up blind alleys. Even
>incredibly simple-sounding things sometimes take weeks to discover unless
>you are shown the way!
>
>As I say, ANY amount of guidance would be gratefully received. Here's my
>skeleton...
>
>#Read mail message from STDIN
>
>#Get these header values:
>#     From
>#     Subject (or specify 'no subject')
>#     Message-ID
>#     Date (in a short, consistent format)
>#     In-Reply-To, or failing that the last value in References, if either
>#     exist
>#     Content-Type
>
>#If Content-Type is multipart/*, get only the body section that is
>#text/plain
>#Else if Content-Type is text/plain, get the body text
>#Else give up now!
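
For the reading and header steps above, here is a minimal sketch using the
standard-library email package. The function name summarize and the keys of
the dictionary it returns are my own invention for illustration, and the
date format is just one possible "short, consistent" choice.

```python
from email import message_from_bytes, policy
from email.utils import parsedate_to_datetime


def summarize(raw_bytes):
    """Return the header values and plain-text body the skeleton asks for.

    Returns None when the message has no text/plain part.
    """
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    # In-Reply-To, or failing that the last value in References
    parent = msg["In-Reply-To"]
    if not parent and msg["References"]:
        parent = msg["References"].split()[-1]

    # Short, consistent date; fall back to the raw header if it won't parse
    try:
        date = parsedate_to_datetime(msg["Date"]).strftime("%Y-%m-%d %H:%M")
    except (TypeError, ValueError):
        date = str(msg["Date"] or "")

    # get_body() looks inside multipart/* for the text/plain part, if any
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None  # "Else give up now!"

    return {
        "from": str(msg["From"] or ""),
        "subject": str(msg["Subject"] or "no subject"),
        "message_id": str(msg["Message-ID"] or ""),
        "date": date,
        "parent": str(parent) if parent else None,
        "body": part.get_content(),
    }

# Usage from a mail filter: info = summarize(sys.stdin.buffer.read())
```

Reading bytes rather than text from STDIN avoids guessing the character set
before the parser has seen the Content-Type header.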

# Perhaps several of us will help in various areas. My contribution, at this
# point, is to suggest you consider using the SQLite database for your
# messageIDs "file". SQLite is free, as is the Python interface, pySQLite.
# See www.sqlite.org/ and pysqlite.sourceforge.net/.

import sqlite3  # standard-library successor to pySQLite
connection = sqlite3.connect('messageids')
# messageids is the database file path
cursor = connection.cursor()

>#Open my 'messageIDs' file (a lookup file which stores, from previous
>#messages, the value pairs: An integer messageID, original Message-ID)

# you'll need to create a table. Do this just once:
cursor.execute("create table messageids (messageid integer, originalid text)")

>#Find there the highest existing integer messageID and generate a new one by
>#adding 1

cursor.execute("select max(messageid) from messageids")
newid = (cursor.fetchone()[0] or 0) + 1  # max() is NULL (None) on an empty table

>#Append to the 'messageIDs' file:
>#    Our newly generated integer messageID, this message's Message-ID

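# Pass values as ? parameters (sqlite3 style); names inside the SQL string
# are not substituted. messageid must hold this message's Message-ID.
cursor.execute("insert into messageids values (?, ?)", (newid, messageid))
connection.commit()
# this may seem like overkill, but as you expand you may find that having
# a database will be very useful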
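
Pulling the database steps together, here is a rough sketch built on the
standard-library sqlite3 module. The helper names open_store, allocate_id
and lookup_id are hypothetical, and the ":memory:" default is only there so
the example can be tried without creating a file.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the lookup database and create the table if it is missing."""
    connection = sqlite3.connect(path)
    connection.execute(
        "create table if not exists messageids "
        "(messageid integer primary key, originalid text unique)"
    )
    return connection


def allocate_id(connection, original_id):
    """Store original_id under the next free integer and return that integer."""
    with connection:  # commits on success, rolls back on error
        row = connection.execute(
            "select max(messageid) from messageids").fetchone()
        new_id = (row[0] or 0) + 1  # max() is None while the table is empty
        connection.execute(
            "insert into messageids values (?, ?)", (new_id, original_id))
    return new_id


def lookup_id(connection, original_id):
    """Return the integer stored for original_id, or None if it is unknown."""
    row = connection.execute(
        "select messageid from messageids where originalid = ?",
        (original_id,)).fetchone()
    return row[0] if row else None
```

lookup_id is what the In-Reply-To / References step would call to turn a
parent's Message-ID into its integer messageID.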

>#Open my 'ArchiveIndex' file (a wiki page which lists all messages in
>#threads, with a specific method of indenting)
>
>#If there is an In-Reply-To or References value
>#    Look in the 'messageIDs' file for this Message-ID and return that
>#    message's corresponding integer messageID
>#    Look in the 'ArchiveIndex' file to find this integer at the beginning
>#    of a line (save for preceding spaces and '*')
>#    Add a new line immediately after this line we have found
>#    Pad it with one more leading space than the preceding line had and...
>#Else
>#    Add a new line at beginning of file
>#    Pad it with 1 leading space and...
>
>#...now write on that line:
>#    '*' integer messageID + ': ' + Subject + ' ' + From + ' ' + Date +
>#    ' [' + 'ArchivedMessage' + integer messageID + ' ' + 'View]'
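
The threading rules above could be sketched roughly as follows, treating the
ArchiveIndex page as a list of lines held in memory. The function names are
made up for illustration, and the exact line layout is my reading of the
skeleton, so adjust it to whatever the wiki actually expects.

```python
import re


def format_entry(depth, msg_id, subject, sender, date):
    """Build one index line: depth leading spaces, then '*', id and details."""
    return "%s*%d: %s %s %s [ArchivedMessage%d View]" % (
        " " * depth, msg_id, subject, sender, date, msg_id)


def insert_entry(lines, msg_id, parent_id, subject, sender, date):
    """Return a new list of index lines with the entry for msg_id added."""
    if parent_id is not None:
        # Find the parent's integer at the start of a line, after spaces and '*'
        pattern = re.compile(r"^( *)\*%d:" % parent_id)
        for position, line in enumerate(lines):
            found = pattern.match(line)
            if found:
                # One more leading space than the parent's line had
                depth = len(found.group(1)) + 1
                entry = format_entry(depth, msg_id, subject, sender, date)
                return lines[:position + 1] + [entry] + lines[position + 1:]
    # No parent, or parent not in the index: new thread at the top of the page
    return [format_entry(1, msg_id, subject, sender, date)] + lines
```

Matching on the trailing ':' keeps a search for message 2 from stopping at
message 21. Falling back to a new thread when the parent is not found is my
own choice; the skeleton does not say what should happen in that case.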
>
>#Close the 'ArchiveIndex' and 'messageIDs' files
>
>#Create and open a new file called 'ArchivedMessage' + integer messageID
>
>#Write to this file:
>#    From
>#    Subject
>#    Date
>#    plain text body
>
>#Close the 'ArchivedMessage?' file
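
The last step is the simplest. A minimal sketch, assuming the wiki pages
live in one directory and that UTF-8 is an acceptable encoding for them
(both are assumptions, as is the function name):

```python
import os


def write_archived_message(directory, msg_id, sender, subject, date, body):
    """Write one message to 'ArchivedMessage<msg_id>' and return its path."""
    path = os.path.join(directory, "ArchivedMessage%d" % msg_id)
    # The with-block closes the file even if one of the writes fails
    with open(path, "w", encoding="utf-8") as page:
        page.write("From: %s\n" % sender)
        page.write("Subject: %s\n" % subject)
        page.write("Date: %s\n" % date)
        page.write("\n")
        page.write(body)
    return path
```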

Bob Gailer
PLEASE NOTE NEW EMAIL ADDRESS bgailer@alum.rpi.edu
303 442 2625
