[Mailman-Users] Changing Characters
Steven D'Aprano
steve at pearwood.info
Wed Jun 27 01:37:32 EDT 2018
On Tue, Jun 26, 2018 at 10:09:46PM -0500, David Andrews wrote:
> At 07:40 PM 6/26/2018, Mark Sapiro wrote:
> >On 6/26/18 5:03 PM, Richard Damon wrote:
> >> On 6/26/18 2:12 PM, David Andrews wrote:
> >>> I am running Mailman 2.1.26, cPanel. I had a message that I forwarded
> >>> to a list using Outlook 2010. It looked fine in Outlook, but when it
> >>> went to list all ' apostrophes were changed to ? question mark. What
> >>> causes this, and how can I prevent it.
> >>>
> >>> Dave
> >> The lists language is set to use a National Code page, and Outlook
> >> formatted the message to use a 'Smart Quote' that isn't part of that
> >> Code Page.
> >
> >
> >I'm not sure what's happening. Yes, Outlook represented the message in a
> >character set (code page) which wasn't compatible with the list's
> >language character set, probably us-ascii, but this should affect only
> >plain format digests and archives where the message is represented in
> >the list's character set. For individual messages sent to the list
> >members and MIME format digest, there should be no transliteration.
>
> This wasn't in the digest, it was in a regular message.
Compare three things: the charset the email declares, the charset your
mail client assumes, and the characters actually in the message. If any
of them disagree, weird things are displayed.
Look at the email's Content-Type header. It should look something like
this:
Content-Type: text/plain; charset="us-ascii";
(Actually email should use utf-8, ALWAYS, but hardly anything does.)
Given that this has some sort of curly quotes, it ought to use UTF-8,
not ASCII, but so many Windows applications fail to use UTF-8 when they
should that it is heart-breaking.
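If you would rather check the declared charset programmatically than by
eye, here is a minimal sketch using Python's standard email package. The
message text is a made-up example, not the actual email from this
thread.

```python
from email import message_from_string

# Hypothetical message, just enough headers to illustrate the lookup.
raw_message = (
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Subject: example\n"
    "\n"
    "body text\n"
)

msg = message_from_string(raw_message)

# get_content_charset() returns the charset parameter in lower case,
# or None when the header does not declare one.
declared = msg.get_content_charset()
print(declared)

# A message with no Content-Type header declares nothing at all.
undeclared = message_from_string("Subject: example\n\nbody text\n")
missing = undeclared.get_content_charset()
print(missing)
```

If the second case applies to your message, the client has to guess the
encoding, and that is where things tend to go wrong.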
Second-best should be Windows-1252, sometimes called CP-1252. If it is
labelled "iso-8859-1" that's wrong but common: ISO-8859-1 has no curly
quotes, and the byte Windows-1252 uses for ’ (0x92) is a control
character there. If there's no charset declared at all, assume the
encoding is actually Windows-1252 given that it has come from Outlook.
Then look at your email client. (Which is...?) It ought to honour the
Content-Type header, but some older email clients don't and just assume
everything is ASCII or the machine's default code page, whatever that
is. If there is a way to instruct your client to change encodings (there
is often an "Encoding" menu), try setting it by hand and see if the
bogus question marks change to ’ characters. (That's a U+2019 RIGHT
SINGLE QUOTATION MARK.)
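If your client has no such menu, you can do the same experiment by hand
on the raw bytes. This is a rough sketch: the helper name try_encodings
and the sample bytes are made up for illustration, and the list of
candidate encodings is just my guess at the likely suspects.

```python
def try_encodings(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Decode data with each candidate; None means the decode failed."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical body bytes: "It's fine" with a Windows-1252 curly quote.
sample = b"It\x92s fine"
results = try_encodings(sample)

for name, text in results.items():
    print(name, repr(text))
```

Whichever candidate gives readable text with a proper ’ in it is most
likely the encoding the sender really used, regardless of what the
header claims.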
Finally, try looking at the "Raw Contents" or "Full Email" or whatever
your email client calls it -- you want to look at the raw content of the
email, in full. Find the places where the mystery question marks are,
and see what you can see. If you're lucky, it will be some sort of
little square box with a four-digit hex code in it, like 0098 or FFFF.
(But don't be surprised if it isn't visible at all.)
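If nothing is visible, you can save the raw message to a file and look
for the non-ASCII bytes directly. A minimal sketch, assuming you already
have the message as bytes; the function name find_non_ascii and the
sample data are hypothetical.

```python
def find_non_ascii(raw):
    """Return (offset, hex value) for every byte above 0x7F."""
    return [(i, f"{b:02X}") for i, b in enumerate(raw) if b > 0x7F]


# Hypothetical raw bytes; in practice read them from the saved message,
# e.g. open(path, "rb").read().
raw_bytes = b"It\x92s fine"
hits = find_non_ascii(raw_bytes)
print(hits)
```

A 92 where the apostrophe should be points at Windows-1252. If the raw
message contains a plain 3F ("?") there instead, the character was
already replaced before the message reached you.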
--
Steve