[Mailman-Developers] New Pipermail hacks (was Re: Ok, it works! ...)

David Champion dgc@uchicago.edu
Fri, 26 Oct 2001 00:34:53 -0500


On 2001.10.25, in <87vgh3ayv3.fsf@nausicaa.interq.or.jp>,
	"Ben Gertzfield" <che@debian.org> wrote:
> 
> Here's a patch that actually throws out all-HTML emails, but just
> removes HTML parts.  
> 
> Actually, why don't we just decode HTML attachments like any other,
> and let the user beware if they want to click on it?  There are lots
> of legitimate reasons to allow HTML attachments.  I can't think of any
> to allow all-HTML messages. *grin* We could treat all-HTML messages in
> the same way, just provide a link and let the user beware if they
> click on it.

Unfortunately, I think there are legitimate reasons for allowing HTML
messages (as well as parts) into the record. But I don't think that
justifies passing the HTML through literally -- doing so poses a
serious potential threat to people viewing the archive.

I don't care to do a full-blown rendering of HTML; I'd argue that it's
not Mailman's job. It is Mailman's job, though (or, more precisely, the
archiver's job), to provide any text available to the archive viewer.
Whether the display is true to the poster's intentions is subject to
endless debate, but HTML is widely expected to be legible even when
it's not rendered per specification (and it almost always is, if you
try hard enough), so I think the content should be available.

I suggested escaping the HTML, replacing < and > with &lt; and &gt;, to
make it harmless but legible in case there's significant text inside.
Admittedly, that is pretty ugly. What about simply stripping out ALL
markup, leaving only bare text, and perhaps doing some minor
interpretation of <br> and <p> tags to improve readability? Then
throw in a link to the original, as Ben suggests, for good measure.
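
To make the two options above concrete, here is a rough sketch in
present-day Python. It is not taken from any Mailman or Pipermail code;
the names escape_markup and strip_markup are made up for illustration,
and skipping <script>/<style> content is my own assumption rather than
something proposed in this thread.

```python
import html
from html.parser import HTMLParser


def escape_markup(source):
    """Option 1: keep the markup visible but inert by escaping &, < and >."""
    return html.escape(source, quote=False)


class _TextExtractor(HTMLParser):
    """Collect bare text, treating <br> and <p> as line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0  # assumption: drop <script>/<style> bodies entirely

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.chunks.append("\n")
        elif tag == "p":
            self.chunks.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_markup(source):
    """Option 2: throw away ALL markup and return only the text."""
    parser = _TextExtractor()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks).strip()
```

Note that the stripped text can still contain a literal < (from an
&lt; entity in the original), so it would need to go through
escape_markup as well before being written into an archive page.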


> The patch also adds a filename to the replacement payload, so that
> users can have an idea of what they're going to see if a description
> was not provided (VERY common).  

Ah, filenames. I'd actually like to see the attachment stored on the
server under the filename requested in the MIME Content-Disposition
header. I don't think the archiver needs to guarantee literalism here;
a good-faith effort is sufficient. But the name matters in many cases,
where the transmitted filename is really how the file needs to be saved
locally. At minimum I'd like the filename shown on the archive display,
but it'd be nice not to have to change the filename in my browser's
"save as..." dialog each time I save an attachment.

I'd suggest a very basic sanitizing of the MIME filename, keeping only
its basename. Something like s!.*[:/\\]!! would remove pathname
components for all three major pathname separators; then (optionally)
either hex-encode the non-alphanumeric symbols, a la HTML, or replace
them with some other token.
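
A minimal sketch of that sanitizing step in Python, for illustration
only. The function name sanitize_filename, the "attachment" fallback,
the %XX form of the hex encoding, and the decision to leave ".", "-"
and "_" alone (so extensions survive) are all my assumptions, not
anything from an existing patch.

```python
import re

# Assumption: letters, digits, ".", "-" and "_" are considered safe.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(raw, token=None):
    """Reduce a MIME filename to a safe basename.

    With token=None, unsafe characters are hex-encoded as %XX;
    otherwise each unsafe character is replaced by `token`.
    """
    # Roughly the Perl s!.*[:/\\]!! : drop everything up to and
    # including the last ":", "/" or "\" separator.
    base = re.sub(r".*[:/\\]", "", raw, flags=re.DOTALL)

    # Assumption: strip leading dots so "." and ".." cannot survive.
    base = base.lstrip(".")

    if token is None:
        def encode(match):
            data = match.group(0).encode("utf-8")
            return "".join("%%%02X" % byte for byte in data)
        base = _UNSAFE.sub(encode, base)
    else:
        # A function replacement avoids backslash handling in `token`.
        base = _UNSAFE.sub(lambda match: token, base)

    # Hypothetical fallback when nothing usable is left.
    return base or "attachment"
```

For example, sanitize_filename("C:\\Documents\\My Report.doc") gives
"My%20Report.doc", and sanitize_filename("../../etc/passwd") gives
"passwd". This only cleans up the name; avoiding collisions between
attachments with the same name would still be the archiver's problem.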

-- 
 -D.	dgc@uchicago.edu	NSIT	University of Chicago