[Mailman-Users] Changing Characters

Steven D'Aprano steve at pearwood.info
Wed Jun 27 01:37:32 EDT 2018


On Tue, Jun 26, 2018 at 10:09:46PM -0500, David Andrews wrote:
> At 07:40 PM 6/26/2018, Mark Sapiro wrote:
> >On 6/26/18 5:03 PM, Richard Damon wrote:
> >> On 6/26/18 2:12 PM, David Andrews wrote:
> >>> I am running Mailman 2.1.26, cPanel. I had a message that I forwarded
> >>> to a list using Outlook 2010. It looked fine in Outlook, but when it
> >>> went to list all ' apostrophes were changed to ? question mark. What
> >>> causes this, and how can I prevent it.
> >>>
> >>> Dave
> >> The lists language is set to use a National Code page, and Outlook
> >> formatted the message to use a 'Smart Quote' that isn't part of that
> >> Code Page.
> >
> >
> >I'm not sure what's happening. Yes, Outlook represented the message in a
> >character set (code page) which wasn't compatible with the list's
> >language character set, probably us-ascii, but this should affect only
> >plain format digests and archives where the message is represented in
> >the list's character set. For individual messages sent to the list
> >members and MIME format digest, there should be no transliteration.
> 
> This wasn't in the digest, it was in a regular message.

Look at the charset declared by the email, the charset the mail client 
uses, and the actual characters in use. If there's a discrepancy between 
any of them, weird things are displayed.

Look at the email's Content-Type header; it should look something like 
this:

    Content-Type: text/plain; charset="us-ascii";

(Actually email should use utf-8, ALWAYS, but hardly anything does.)
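
If you'd rather check the header programmatically than by eye, here is 
a minimal sketch using Python's standard email package. The message 
text is made up for illustration; substitute the raw source of the real 
message.

```python
from email import message_from_string

# Hypothetical raw message, for illustration only.
raw = (
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Subject: example\n"
    "\n"
    "Hello\n"
)

msg = message_from_string(raw)
content_type = msg.get_content_type()    # media type without parameters
charset = msg.get_content_charset()      # lowercased charset, or None if absent

print(content_type)
print(charset)
```

If get_content_charset() returns None, the message declares no charset 
at all.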

Given that this has some sort of curly quotes, it ought to use UTF-8, 
not ASCII, but so many Windows applications fail to use UTF-8 when they 
should that it is heart-breaking.
 
Second-best should be Windows-1252, sometimes called CP-1252. If it is 
labelled "iso-8859-1" that's wrong but common. If there's no charset 
declared at all, assume the encoding is actually Windows-1252 given 
that it has come from Outlook.
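
To see why the charset matters, here is a rough sketch in Python of how 
the same curly apostrophe is represented under each encoding. It only 
illustrates one way a ’ can end up as a ?; I'm not claiming this is 
exactly what Outlook or Mailman did to the message.

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

as_utf8 = quote.encode("utf-8")      # three bytes
as_cp1252 = quote.encode("cp1252")   # the single byte 0x92

# ASCII has no such character; with errors="replace" the encoder
# substitutes a question mark instead of raising an error.
as_ascii = quote.encode("ascii", errors="replace")

# The byte 0x92 read as iso-8859-1 is an invisible C1 control
# character (U+0092), not a quote. Read as cp1252 it is the quote.
mislabelled = b"\x92".decode("iso-8859-1")
correct = b"\x92".decode("cp1252")

print(as_utf8, as_cp1252, as_ascii)
print(repr(mislabelled), repr(correct))
```

This is why "iso-8859-1" on a message that actually contains 
Windows-1252 bytes is wrong: the curly quotes decode to control 
characters rather than punctuation.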

Then look at your email client. (Which is...?) It ought to honour the 
Content-Type header, but some older email clients don't and just assume 
everything is ASCII or the machine's default code page, whatever that 
is. If there is a way to instruct your client to change encodings (there 
is often an "Encoding" menu), try setting it by hand and see if the 
invalid question marks change to ’ characters. (That's a U+2019 RIGHT 
SINGLE QUOTATION MARK.)
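
Switching the "Encoding" menu by hand amounts to decoding the same 
bytes under different charsets. A small sketch of that, assuming the 
body really contains the Windows-1252 byte 0x92 (the sample bytes and 
the list of candidates are mine, not taken from the actual message):

```python
# Hypothetical body bytes, as Outlook might send them in Windows-1252.
body = b"It\x92s fine"

candidates = ["cp1252", "iso-8859-1", "utf-8"]
results = {}
for name in candidates:
    # errors="replace" puts U+FFFD where the bytes are invalid
    # for that encoding instead of raising UnicodeDecodeError.
    results[name] = body.decode(name, errors="replace")

for name, text in results.items():
    print(name, repr(text))
```

Only the cp1252 reading gives the apostrophe; iso-8859-1 gives a 
control character, and utf-8 gives the U+FFFD replacement character.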

Finally, try looking at the "Raw Contents" or "Full Email" or whatever 
your email client calls it -- you want to look at the raw content of the 
email, in full. Find the places where the mystery question marks are, 
and see what you can see. If you're lucky, it will be some sort of 
little square box with a four-digit hex code in it, like 0098 or FFFF.

(But don't be surprised if it isn't visible at all.)
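
If the mystery characters aren't visible, one option is to paste the 
text into Python and list every non-ASCII character with its code 
point. This is a sketch; describe_non_ascii is a helper name I made up, 
and the sample string stands in for your message text.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (index, 'U+XXXX', name) for each non-ASCII character."""
    found = []
    for i, ch in enumerate(text):
        if ord(ch) > 127:
            # Control characters have no name, so supply a fallback.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

sample = "It\u2019s here"  # stand-in for the real message text
report = describe_non_ascii(sample)
for entry in report:
    print(entry)
```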


-- 
Steve