From Bernhard.Schmidt at lrz.de  Wed Jul 13 06:56:05 2016
From: Bernhard.Schmidt at lrz.de (Bernhard Schmidt)
Date: Wed, 13 Jul 2016 12:56:05 +0200
Subject: [Mailman-i18n] Charset of Plaintext Templates
Message-ID: <3333a298-3a03-a63a-ef69-673b6c5f5eb9@lrz.de>

Hi,

I've just noticed that Mailman sends wrongly encoded German mails for
admin approval of subscriptions that look like this

---
Ihre Genehmigung ist f??r den folgenden Abonnementswunsch erforderlich:

    F??r:  xxx at lrz.de
    Liste: yyy at lists.lrz.de

Bitte besuchen Sie bei Gelegenheit

    https://lists.lrz.de/mailman/admindb/yyyy

um diese Anfrage zu beantworten.
---

The charset of the mail is UTF-8, and templates/de/subauth.txt is also
UTF-8, but the text looks like the UTF-8 template has been parsed as
ISO8859-1 and recoded into UTF-8 again. Recoding
templates/de/subauth.txt to ISO8859-1 fixes the issue.

We have a lot of different encodings even within the same language and
filetype

mailman/templates/de (1) % file *
adminaddrchgack.txt:     UTF-8 Unicode text
admindbdetails.html:     HTML document, ASCII text
admindbpreamble.html:    HTML document, ASCII text
admindbsummary.html:     HTML document, ASCII text
adminsubscribeack.txt:   ASCII text, with no line terminators
adminunsubscribeack.txt: ASCII text
admlogin.html:           HTML document, ASCII text
approve.txt:             ISO-8859 text
archidxentry.html:       HTML document, ASCII text
archidxfoot.html:        HTML document, ASCII text
archidxhead.html:        HTML document, ASCII text
archlistend.html:        ASCII text
archliststart.html:      HTML document, ASCII text
archtocentry.html:       HTML document, ASCII text
archtoc.html:            HTML document, ASCII text
archtocnombox.html:      HTML document, ASCII text
article.html:            HTML document, ASCII text
bounce.txt:              ISO-8859 text
checkdbs.txt:            ISO-8859 text
convert.txt:             ISO-8859 text
cronpass.txt:            ISO-8859 text
disabled.txt:            ISO-8859 text
emptyarchive.html:       HTML document, ASCII text
headfoot.html:           HTML document, ASCII text
help.txt:                ISO-8859 text
invite.txt:              ISO-8859 text
listinfo.html:           HTML document, ASCII text
masthead.txt:            UTF-8 Unicode text
newlist.txt:             ISO-8859 text
nomoretoday.txt:         UTF-8 Unicode text
options.html:            HTML document, ASCII text
postack.txt:             ASCII text
postauth.txt:            ISO-8859 text
postheld.txt:            ISO-8859 text
private.html:            HTML document, ASCII text
probe.txt:               UTF-8 Unicode text
refuse.txt:              UTF-8 Unicode text
roster.html:             HTML document, ASCII text
subauth.txt:             UTF-8 Unicode text
subscribeack.txt:        ISO-8859 text
subscribe.html:          HTML document, ASCII text
unsubauth.txt:           ASCII text
unsub.txt:               ISO-8859 text
userpass.txt:            ISO-8859 text
verify.txt:              ISO-8859 text

I don't quite get the code, but it looks like at least *.txt should be
ISO8859-1 at the moment.

Best Regards,
Bernhard

-- 
Bernhard Schmidt                Netzbetrieb / IPv6 / DNSSEC
Leibniz-Rechenzentrum           Leibniz Supercomputing Centre
Boltzmannstr. 1                 D-85748 Garching b. Muenchen
Tel: +49 89 35831-7885          E-Mail/Jabber: Bernhard.Schmidt at lrz.de

From mark at msapiro.net  Wed Jul 13 13:17:13 2016
From: mark at msapiro.net (Mark Sapiro)
Date: Wed, 13 Jul 2016 10:17:13 -0700
Subject: [Mailman-i18n] Charset of Plaintext Templates
In-Reply-To: <3333a298-3a03-a63a-ef69-673b6c5f5eb9@lrz.de>
References: <3333a298-3a03-a63a-ef69-673b6c5f5eb9@lrz.de>
Message-ID: <57867799.8000002@msapiro.net>

On 07/13/2016 03:56 AM, Bernhard Schmidt wrote:
>
> I've just noticed that Mailman sends wrongly encoded German mails for
> admin approval of subscriptions that look like this
...
> We have a lot of different encodings even within the same language and
> filetype
>
> mailman/templates/de (1) % file *
> adminaddrchgack.txt:     UTF-8 Unicode text
> admindbdetails.html:     HTML document, ASCII text
> admindbpreamble.html:    HTML document, ASCII text
> admindbsummary.html:     HTML document, ASCII text
> adminsubscribeack.txt:   ASCII text, with no line terminators
> adminunsubscribeack.txt: ASCII text
> admlogin.html:           HTML document, ASCII text
> approve.txt:             ISO-8859 text
> archidxentry.html:       HTML document, ASCII text
> archidxfoot.html:        HTML document, ASCII text
> archidxhead.html:        HTML document, ASCII text
> archlistend.html:        ASCII text
> archliststart.html:      HTML document, ASCII text
> archtocentry.html:       HTML document, ASCII text
> archtoc.html:            HTML document, ASCII text
> archtocnombox.html:      HTML document, ASCII text
> article.html:            HTML document, ASCII text
> bounce.txt:              ISO-8859 text
> checkdbs.txt:            ISO-8859 text
> convert.txt:             ISO-8859 text
> cronpass.txt:            ISO-8859 text
> disabled.txt:            ISO-8859 text
> emptyarchive.html:       HTML document, ASCII text
> headfoot.html:           HTML document, ASCII text
> help.txt:                ISO-8859 text
> invite.txt:              ISO-8859 text
> listinfo.html:           HTML document, ASCII text
> masthead.txt:            UTF-8 Unicode text
> newlist.txt:             ISO-8859 text
> nomoretoday.txt:         UTF-8 Unicode text
> options.html:            HTML document, ASCII text
> postack.txt:             ASCII text
> postauth.txt:            ISO-8859 text
> postheld.txt:            ISO-8859 text
> private.html:            HTML document, ASCII text
> probe.txt:               UTF-8 Unicode text
> refuse.txt:              UTF-8 Unicode text
> roster.html:             HTML document, ASCII text
> subauth.txt:             UTF-8 Unicode text
> subscribeack.txt:        ISO-8859 text
> subscribe.html:          HTML document, ASCII text
> unsubauth.txt:           ASCII text
> unsub.txt:               ISO-8859 text
> userpass.txt:            ISO-8859 text
> verify.txt:              ISO-8859 text
>
> I don't quite get the code, but it looks like at least *.txt should be
> ISO8859-1 at the moment.

Thank you for the report. As you surmise, all the .txt files should be
iso-8859-1 encoded, not utf-8. ASCII text is OK as that is a subset of
iso-8859-1. I have reported this at and fixed it for the next release.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

From Bernhard.Schmidt at lrz.de  Wed Jul 13 13:23:06 2016
From: Bernhard.Schmidt at lrz.de (Bernhard Schmidt)
Date: Wed, 13 Jul 2016 19:23:06 +0200
Subject: [Mailman-i18n] Charset of Plaintext Templates
In-Reply-To: <57867799.8000002@msapiro.net>
References: <3333a298-3a03-a63a-ef69-673b6c5f5eb9@lrz.de>
 <57867799.8000002@msapiro.net>
Message-ID:

On 13.07.2016 at 19:17, Mark Sapiro wrote:

Hi

>
> Thank you for the report. As you surmise, all the .txt files should be
> iso-8859-1 encoded, not utf-8. ASCII text is OK as that is a subset of
> iso-8859-1. I have reported this at
> and fixed it for the
> next release.

Thanks a lot. For my understanding, is there a per-language default
charset somewhere in the code I've missed? There are several more UTF-8
.txt files in other languages, some of which cannot be represented with
ISO8859 (zh_CN for example).

Bernhard

From mark at msapiro.net  Wed Jul 13 15:12:08 2016
From: mark at msapiro.net (Mark Sapiro)
Date: Wed, 13 Jul 2016 12:12:08 -0700
Subject: [Mailman-i18n] Charset of Plaintext Templates
In-Reply-To:
References: <3333a298-3a03-a63a-ef69-673b6c5f5eb9@lrz.de>
 <57867799.8000002@msapiro.net>
Message-ID: <57869288.60506@msapiro.net>

On 07/13/2016 10:23 AM, Bernhard Schmidt wrote:
>
> Thanks a lot. For my understanding, is there a per-language default
> charset somewhere in the code I've missed? There are several more UTF-8
> .txt files in other languages, some of which cannot be represented with
> ISO8859 (zh_CN for example).

There is a table at the end of Defaults.py which defines the supported
languages and their Mailman character sets. Many languages are already
utf-8 encoded.

You might think that changing the character set for a language is a
simple matter of just redefining the character set and recoding the
message catalog and templates, but it's more complicated than that. See
the thread at and the bug report at .

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
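
The symptom reported at the start of this thread (one "ü" arriving as two
garbled characters) is what you would expect if UTF-8 bytes are read as
ISO-8859-1 and then encoded as UTF-8 again. The following is a minimal,
standalone Python 3 sketch of that effect and of the recoding workaround.
It is not Mailman code; the function names double_encode, recode_to_latin1
and guess_charset are hypothetical and exist only for this illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate a UTF-8 template being misread as ISO-8859-1 and re-sent as UTF-8."""
    raw = text.encode("utf-8")           # bytes as stored in a UTF-8 template file
    misread = raw.decode("iso-8859-1")   # wrong charset assumed while reading
    return misread.encode("utf-8")       # encoded again for a UTF-8 mail


def recode_to_latin1(data: bytes) -> bytes:
    """Recode UTF-8 bytes to ISO-8859-1.

    Raises UnicodeEncodeError when the text has characters outside
    ISO-8859-1, as would be the case for Chinese text.
    """
    return data.decode("utf-8").encode("iso-8859-1")


def guess_charset(data: bytes) -> str:
    """Very rough classification, similar in spirit to the file(1) listing."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1 (or other 8-bit)"


word = "f\u00fcr"  # the German word "für"
garbled = double_encode(word)
print(len(word), len(garbled.decode("utf-8")))  # the garbled text is one character longer
print(guess_charset(recode_to_latin1(word.encode("utf-8"))))
```

The recoding step only works for languages whose text fits into
ISO-8859-1. For something like zh_CN, recode_to_latin1 raises an error,
which is in line with the point above that such languages need a
different character set in the table in Defaults.py.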