[Mailman-Users] Fwd: HTML filter on the lists

Thu Aug 3 08:55:13 CEST 2006

I got the message below from a user, and am not quite sure what to do.  Any advice?

Dave

>Date: Thu, 3 Aug 2006 00:30:29 -0600
>From: "T. Joseph Carter" <tjcarter at bluecherry.net>
>To: David Andrews <dandrews at visi.com>
>Subject: HTML filter on the lists
>
>The filter you are using on text/html messages to the list really is very,
>very broken.  First, it leaves parts of the HTML behind.  Second, it lies
>about its output, claiming that all messages are now us-ascii (which
>breaks character set conversion tools that need to know the original
>character set in order to map to the correct one).
>
>The situation as it exists now is that you have almost everyone on the
>list using Microsoft Outhous--er, I mean Outlook, which renders plain text
>us-ascii messages as HTML in Windows-Latin-1 encoding.
>
>My native character set is not Windows-Latin-1, it's UTF-8.  This requires
>conversion, and the conversion tools assume that because your filter says
>the message is us-ascii, it actually is.  I am also one of about three
>people on the lists whose mail client does not support HTML natively.  I have
>fixed that with a mail filter, but it only works if the message is
>actually HTML.
>
>Essentially, the three people for whom your mail filter still serves a
>purpose are having to deal with HTML emails we can't read precisely
>because your filter doesn't actually do what it says it does.
>
>My thought on this is to switch to a filter that simply defangs HTML
>without stripping it, or to replace the existing filter with some suitable
>lynx command line.  My filter:
>
>LANG=en.UTF-8 lynx -dump -localhost -stdin -dont-wrap-pre -minimal
>
>You might want to use en.iso8859-1 instead for LANG, since just about
>everyone on the list speaks a Latin-1 language natively and Outlook does
>know how to convert that to a Windows character set rather easily.  Just
>make sure that when the output is stuffed back into MIME format the
>charset is set to match the output.
>
>
>I tried to write something to correct this--if I take an affected message,
>correct the MIME headers so mutt knows it's HTML and what charset it
>really is, mutt does properly extract the message.  The problem is that
>there is no automated way to determine which messages are mangled, and any
>filter would be forced to make as many assumptions about what the filter
>broke as the filter made in breaking it.  An Eastern-European poster's
>messages would be garbled beyond recovery.  The proper solution is to not
>break the messages.  *smile*
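
For anyone following along, the point about making the declared charset
match the actual output could be sketched roughly as below.  This is only
an illustration, not what Mailman or the list's filter does: the function
names (html_to_text, flatten_html_part) are made up for the example, and a
tiny html.parser-based extractor stands in for lynx so the sketch runs
without external tools.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes only; a crude stand-in for `lynx -dump`."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Hypothetical helper: return the text content of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def flatten_html_part(html_bytes, declared_charset, out_charset="utf-8"):
    """Convert an HTML body to text/plain and label it honestly.

    The input is decoded with the charset the sender declared, and the
    new part is labeled with the charset it is actually encoded in,
    rather than being blanket-labeled us-ascii.
    """
    html = html_bytes.decode(declared_charset, errors="replace")
    text = html_to_text(html)
    part = EmailMessage()
    # set_content encodes the text and writes a matching charset
    # parameter into the Content-Type header.
    part.set_content(text, subtype="plain", charset=out_charset)
    return part


if __name__ == "__main__":
    body = "<p>na\u00efve caf\u00e9</p>".encode("iso-8859-1")
    part = flatten_html_part(body, "iso-8859-1")
    print(part["Content-Type"])
    print(part.get_content())
```

Passing out_charset="iso-8859-1" instead would correspond to the Latin-1
suggestion in the message above; either way the header and the bytes agree.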