[Mailman-Users] problem with accented characters, converting HTML to plain text

Mark Sapiro mark at msapiro.net
Tue Jul 21 20:01:40 CEST 2015


On 7/21/15 12:42 AM, Laura Creighton wrote:
> 
> My new rule in my mailer for how to display html text is:
> 
> w3m -dump -o display_link_number=1 -cols 78 -T text/html -I "$(echo %a | sed -r 's/.*charset="?([-a-zA-Z0-9_]*).*/\1/')" -O utf-8 | less
> 
> which is one heck of a mouthful, but hasn't caused me any problems since.
> 
> Just in case somebody else wants to ditch lynx ...

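For readers unfamiliar with the sed part of Laura's command: it pulls the charset parameter out of a Content-Type value so that it can be handed to w3m's -I option. The rough Python equivalent below uses the same regular expression. It is an illustration only, and the function name extract_charset is hypothetical.

```python
import re

# Same pattern as the sed -r expression in the quoted command.
CHARSET_RE = r'.*charset="?([-a-zA-Z0-9_]*).*'


def extract_charset(content_type: str) -> str:
    """Return the charset parameter, or the input unchanged if there is none."""
    # Like sed's s///, re.sub() returns the string untouched when nothing matches.
    return re.sub(CHARSET_RE, r'\1', content_type)


print(extract_charset('text/html; charset="iso-8859-1"'))  # iso-8859-1
print(extract_charset('text/html; charset=UTF-8'))          # UTF-8
print(extract_charset('text/html'))                          # text/html
```

When no charset parameter is present, the substitution leaves the string unchanged, so -I would receive the whole value rather than a charset name.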

While I'm sure Laura's command above works well as an HTML viewer for an
MUA such as might be specified in a mutt mailcap file, there are issues
with trying to use this as a Mailman HTML_TO_PLAIN_TEXT_COMMAND because
it gets the input charset from the message's Content-Type: header and
none of the message's headers are passed to HTML_TO_PLAIN_TEXT_COMMAND.
Also, it specifies the output charset as utf-8, but Mailman will not
change the charset parameter in the converted MIME part. It only changes
the MIME type from text/html to text/plain, so if the original HTML
charset is not utf-8, creating utf-8 output would be wrong.
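
To make the charset point concrete, here is a small sketch that uses Python's standard email package as a stand-in for the header change. It is an assumption for illustration, not Mailman's actual code. Message.set_type() replaces text/html with text/plain but keeps the existing charset parameter, so if the converter wrote UTF-8 while the header still says ISO-8859-1, an accented character comes out garbled:

```python
from email.message import Message

# An HTML part declared as ISO-8859-1, as it might arrive in a post.
part = Message()
part['Content-Type'] = 'text/html; charset="iso-8859-1"'

# Change only the MIME type; set_type() keeps the existing parameters.
part.set_type('text/plain')
declared = part.get_content_charset()  # still 'iso-8859-1'

# Suppose the converter was told to write UTF-8 (as with w3m's -O utf-8).
converter_output = 'caf\u00e9'.encode('utf-8')

# A reader trusting the header decodes the UTF-8 bytes as ISO-8859-1.
shown = converter_output.decode(declared)
print(part.get_content_type(), declared)  # text/plain iso-8859-1
print(ascii(shown))  # 'caf\xc3\xa9', i.e. "cafÃ©" instead of "café"
```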

While one could use some w3m command in HTML_TO_PLAIN_TEXT_COMMAND, the
appropriate command might be something like

w3m -dump -o display_link_number=1 -cols 78 -T text/html %(filename)s

without the -I and -O options, and this could wind up with the same
charset issues as lynx.
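
As a sketch of where such a command would go, the fragment below sets it in mm_cfg.py and shows how the %(filename)s placeholder gets filled in. It assumes Mailman expands the placeholder with Python's %-style dictionary interpolation, which the placeholder syntax suggests. The helper expand_command is hypothetical and only illustrates that expansion.

```python
# mm_cfg.py: illustrative value only.
HTML_TO_PLAIN_TEXT_COMMAND = (
    'w3m -dump -o display_link_number=1 -cols 78 -T text/html %(filename)s'
)


def expand_command(filename: str) -> str:
    """Hypothetical helper showing the %(filename)s interpolation."""
    # Only the temporary file's name is supplied; no message header
    # (and therefore no charset) reaches the command.
    return HTML_TO_PLAIN_TEXT_COMMAND % {'filename': filename}


print(expand_command('/tmp/part.html'))
```

Because the expanded command carries no charset information, w3m has to fall back on its own defaults or detection for the input encoding, which is why it may hit the same problems as lynx.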

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

