[PYTHON META-SIG] Organized web archives of the PSA SIG mailing lists - review
Andrew Kuchling
amk@magnet.com
Fri, 11 Oct 1996 18:11:38 -0400 (EDT)
> - *If* it's easy to do, it'd be nice to have the archives subdivided
> only when the size of the yearly collection exceeds a certain
> threshold - say 200 messages. (I really don't know what's
> appropriate, but my suspicion is that 200 is not too big.)
Ummm... think think... while it's feasible, I think it would
be a bit kludgy. (I'll consider it further, though.) If disk space
isn't too big a problem, why not index both _en masse_ and by
quarter/month?
A digression about how Pipermail works: the base pipermail.T
class handles formatting, and has abstract methods like
get_archives(A), which returns a list of archives where article A
should be filed. Each archive is then a subdirectory. get_archives()
has access to the article's headers (and even its body), so it can
make quite complex decisions.
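To make the hook concrete, here is a minimal sketch of a get_archives() that files each article both in a whole-year archive and in a per-quarter one. That gives the "index both en masse and by quarter" layout suggested above. It is not Pipermail's code. ArchiverBase, DateArchiver, the archive names ("1996", "1996q4"), and the use of an email.message.Message for the article are all assumptions made for illustration.

```python
from email.message import Message
from email.utils import parsedate_to_datetime


class ArchiverBase:
    """Hypothetical stand-in for pipermail.T; only the hook is modelled."""

    def get_archives(self, article):
        # Abstract: subclasses decide which archive(s) an article goes in.
        raise NotImplementedError


class DateArchiver(ArchiverBase):
    def get_archives(self, article):
        # Parse the RFC 822 Date: header into a datetime.
        when = parsedate_to_datetime(article["Date"])
        quarter = (when.month - 1) // 3 + 1
        # File the article twice: once in the whole-year archive
        # ("en masse") and once in a per-quarter archive.
        return [str(when.year), "%dq%d" % (when.year, quarter)]


msg = Message()
msg["Date"] = "Fri, 11 Oct 1996 18:11:38 -0400"
print(DateArchiver().get_archives(msg))  # ['1996', '1996q4']
```

Because the method returns a list, switching to monthly subdivision only means changing the second name it returns.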
An article can be put in multiple archives; for example, we
could automatically put postings by Guido, or postings where the
subject line begins with "ANNOUNCE:", in a separate archive. (Any
suggestions for such special archives?)
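A rough sketch of how such special archives could fall out of the same hook, by looking at the From: and Subject: headers. The archive names ("all", "guido", "announce") and the author address are placeholders I've made up. In particular the address is not anyone's real one, and which authors get their own archive would be a configuration choice.

```python
from email.message import Message
from email.utils import parseaddr

# Hypothetical mapping from author address to a dedicated archive; the
# address here is a placeholder, not anyone's real one.
SPECIAL_AUTHORS = {"guido@example.org": "guido"}


def get_archives(article):
    """Return every archive this article should be filed in (a sketch)."""
    archives = ["all"]  # hypothetical catch-all archive for every posting
    # parseaddr() splits 'Name <addr>' into (name, addr).
    _, address = parseaddr(article.get("From", ""))
    if address.lower() in SPECIAL_AUTHORS:
        archives.append(SPECIAL_AUTHORS[address.lower()])
    # Subject lines that begin with "ANNOUNCE:" also go to an announcements archive.
    if article.get("Subject", "").startswith("ANNOUNCE:"):
        archives.append("announce")
    return archives


msg = Message()
msg["From"] = "Guido van Rossum <guido@example.org>"
msg["Subject"] = "ANNOUNCE: a new release"
print(get_archives(msg))  # ['all', 'guido', 'announce']
```

Note that a reply ("Re: ANNOUNCE: ...") does not match the subject test, which is probably what you want for an announcements archive.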
Currently, a copy of the article is made in each archive
directory; my fuzzy reasoning behind this is that you might want
articles formatted differently depending on where they're going.
(Consider keeping a verbatim copy of postings, and an HTML-formatted
version.) This will eat disk space quickly if articles are placed in
lots of different archives all the time.
An alternative would be to have a single directory for
formatted articles, and each different archive would point into that
single repository. This means we can't format articles differently
for each archive, but it's a lot easier on disk space.
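The single-repository alternative might look roughly like this. Each article is formatted once into a shared articles/ directory, and each archive keeps only an index of relative links into it. The SharedRepository class, the articles/NNNNNN.html naming, and the in-memory dictionaries standing in for files are all assumptions for the sake of a short example.

```python
import posixpath
from html import escape


class SharedRepository:
    """Sketch: format each article once; archives hold only links to it."""

    def __init__(self):
        self.articles = {}  # relative path -> formatted HTML (stands in for files)
        self.indexes = {}   # archive name -> list of (subject, href) pairs

    def add(self, number, subject, body_html, archives):
        # The formatted article is stored exactly once, however many
        # archives it belongs to; that is where the disk saving comes from.
        path = "articles/%06d.html" % number
        self.articles[path] = body_html
        for name in archives:
            # Each archive is a subdirectory whose index links back
            # into the shared articles/ directory.
            href = posixpath.relpath(path, name)
            self.indexes.setdefault(name, []).append((subject, href))

    def index_html(self, name):
        """Render one archive's index as a simple HTML list."""
        items = ['<LI><A HREF="%s">%s</A>' % (href, escape(subject))
                 for subject, href in self.indexes.get(name, [])]
        return "<UL>\n" + "\n".join(items) + "\n</UL>"


repo = SharedRepository()
repo.add(1, "Organized web archives", "<P>...</P>", ["1996", "1996q4"])
print(len(repo.articles))      # 1
print(repo.indexes["1996q4"])  # [('Organized web archives', '../articles/000001.html')]
```

The cost is the one mentioned above: since body_html is stored once, every archive necessarily shows the same formatting.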
> (Sectioning of the archives will be less disruptive when there is
> an archive search interface, for which andrew is also seeking
> comments.)
One note: the search isn't available on www.magnet.com because
I can't run CGI scripts there. I've prototyped a search using swish
on amarok, but it's hidden behind a firewall. We can worry about that
after the archives are up.
Another big problem: Python code's indentation gets mangled by
HTML formatting, since browsers collapse runs of whitespace outside
<PRE>. I'd like to magically recognize inclusions of code and add
<PRE>...</PRE> around them. Any suggestions for how to do
this fairly reliably? I consider this critical for making the
archives usable. (We could just always put the entire article inside
<PRE></PRE>, but that's ugly and not very readable.)
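One possible heuristic, offered only as a sketch: split the body into blank-line-separated paragraphs. Wrap a paragraph in <PRE> if it contains an interactive ">>>" prompt, or if it contains a line ending in ":" that is followed by a more deeply indented line, which is how Python blocks open. Everything else becomes an ordinary <P> paragraph, with &, < and > escaped in both cases. The function names and the exact rules are my assumptions, not anything Pipermail currently does.

```python
import html
import re


def _indent(line):
    # Width of leading whitespace, with tabs expanded to 8 columns.
    expanded = line.expandtabs(8)
    return len(expanded) - len(expanded.lstrip())


def looks_like_python(lines):
    """Heuristic: interactive prompts, or a ':' line followed by a deeper indent."""
    if any(line.lstrip().startswith(">>>") for line in lines):
        return True
    for current, following in zip(lines, lines[1:]):
        if current.rstrip().endswith(":") and _indent(following) > _indent(current):
            return True
    return False


def format_body(text):
    """Wrap code-like paragraphs in <PRE>; leave the rest as <P> prose."""
    out = []
    # Split on blank lines; the first line of each paragraph keeps its indent.
    for para in re.split(r"\n\s*\n", text.strip("\n")):
        lines = para.split("\n")
        escaped = html.escape(para, quote=False)
        if looks_like_python(lines):
            out.append("<PRE>\n" + escaped + "\n</PRE>")
        else:
            out.append("<P>" + escaped + "</P>")
    return "\n".join(out)


sample = "Here is my function:\n\ndef double(x):\n    return 2 * x\n\nIt works."
print(format_body(sample))
```

It will misfire on prose such as a colon followed by an indented list. It also misses code quoted with "> " prefixes and code that contains blank lines, whose later paragraphs lose their <PRE>, so at best it's a starting point.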
Andrew Kuchling
amk@magnet.com
=================
META-SIG - SIG on Python.Org SIGs and Mailing Lists
send messages to: meta-sig@python.org
administrivia to: meta-sig-request@python.org
=================