[Mailman-i18n] Polish language template files

Mark Sapiro mark at msapiro.net
Tue Jan 17 18:56:40 EST 2017


On 01/17/2017 01:05 PM, Mark Dale wrote:
> Hi Mark,
> 
> I've been in touch with the Polish Language maintainer (Stefan Plewako)
> regarding the non-ascii characters that I am seeing.
> 
> In the language template files for Polish (Mailman 2.1.23) I see that
> the non-ascii characters (letters with diacritics) are replaced with
> question marks. These question marks get displayed in the Mailman web
> pages for lists using Polish.
> 
> Stefan directed me to GitHub for the Polish language files and in those
> files I can see the Polish letters okay. (letters with diacritics)
> 
> I loaded Stefan's files into Mailman (replacing the existing) and all is
> now well. I was surprised as I was thinking that the non-ascii
> characters would need to be replaced with HTML entities - as you had
> done for the Hungarian files a couple of months ago. Stefan had advised
> me that doing that shouldn't be needed, and it seems he might be correct.


I can easily convert all the non-ascii in the Polish language templates
to html entities which is the correct way to deal with this. I have been
reluctant to do this for certain languages in the past because of the
sheer numbers of html entities involved, essentially every character in
Greek for example. Polish is not so bad, but the majority of non-ascii
characters have only numeric html entities. For example, the snippet you
quote becomes

>         <td colspan="2">
>         Wiadomości do wszystkich prenumeratorów listy wysyłaj na adres:
>           <A HREF="mailto:<MM-Posting-Addr>"><MM-Posting-Addr></A>.
> 
>           <p>Możesz zapisać się na listę lub zmienić opcje prenumeraty korzystając z poniższych sekcji.
>         </td>

which will render correctly in a browser that recognizes those entities
but is no more readable to humans in other contexts than the �
characters are.
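
To illustrate the kind of conversion described above, here is a minimal
Python 3 sketch. It is not the script actually used for the Mailman
templates; the function name to_html_entities is hypothetical. It uses a
named entity where the standard library knows one and a numeric entity
otherwise, which shows why most Polish letters end up numeric.

```python
from html.entities import codepoint2name


def to_html_entities(text):
    """Replace every non-ASCII character with an HTML entity.

    Hypothetical helper for illustration only. Named entities are used
    when html.entities defines one; all others become numeric.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                            # plain ASCII stays as-is
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])   # e.g. &oacute;
        else:
            out.append("&#%d;" % cp)                  # e.g. &#347;
    return "".join(out)


print(to_html_entities("Wiadomości do wszystkich prenumeratorów"))
```

Only the 'ó' in that phrase has a named entity; 'ś' has to be written
numerically.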

The underlying issue here is that Mailman's character set for Polish is
iso-8859-2. Mailman sends the web pages built from those templates with a

Content-Type: text/html; charset=iso-8859-2

header, but some web servers are configured to override that. E.g., see
<http://httpd.apache.org/docs/2.4/mod/core.html#adddefaultcharset> for a
description of the Apache directive.
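
A small Python 3 sketch of what such an override can do to the page,
assuming the server forces utf-8 while the template bytes are
iso-8859-2 (the variable names are only for illustration):

```python
text = "Wiadomości"

# Bytes as they would be stored in an iso-8859-2 encoded template.
latin2_bytes = text.encode("iso-8859-2")

# Decoded with the charset Mailman declares: the text survives.
as_declared = latin2_bytes.decode("iso-8859-2")

# Decoded as utf-8, as a browser would if the server overrides the
# charset: the byte for the accented letter is not valid utf-8 and
# is shown as a replacement character.
as_forced = latin2_bytes.decode("utf-8", errors="replace")

print(as_declared)
print(as_forced)
```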

Stefan's templates are UTF-8 encoded and the html templates will work in
an environment where the web server 'forces' utf-8, but the .txt
templates if utf-8 encoded will break in a Mailman whose character set
for Polish is still iso-8859-2, because they will be sent in email with

Content-Type: text/plain; charset=iso-8859-2

but with utf-8 encoded characters.
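
The same mismatch in the other direction can be sketched in Python 3.
This assumes a mail client that trusts the declared iso-8859-2 charset
while the body bytes are really utf-8; the sample text is taken from the
template snippet above.

```python
text = "Możesz zapisać się"

# Bytes as they would come from a utf-8 encoded .txt template.
utf8_bytes = text.encode("utf-8")

# A reader honouring "charset=iso-8859-2" decodes each utf-8 byte as a
# separate iso-8859-2 character, so every accented letter turns into
# two wrong characters.
misread = utf8_bytes.decode("iso-8859-2")

print(repr(misread))
print(len(text), len(misread))
```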

The ultimate solution is to make everything utf-8 encoded. Individual
sites can do this, but I can't for the reasons discussed at
<https://mail.python.org/pipermail/mailman-i18n/2015-February/001854.html>.

Also see the thread "Encoding problem with 2.15 to 2.18 upgrade with
Finnish" beginning at
<https://mail.python.org/pipermail/mailman-users/2015-December/080221.html>
and continuing at
<https://mail.python.org/pipermail/mailman-users/2016-January/080275.html>
for some of the fallout after Debian arbitrarily changed the character
set for several languages to utf-8 in their Mailman package.

Bottom line is I have converted the Polish html templates to use html
entities at
<http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1688>
and will install those at mail.python.org with the intent of releasing
that with 2.1.24. It should be OK, but if I get pushback from the Polish
lists on mpo, I may have to reverse.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

