From che@debian.org  Mon Apr  1 00:07:31 2002
From: che@debian.org (Ben Gertzfield)
Date: Mon, 01 Apr 2002 09:07:31 +0900
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <j4bsd4g2nc.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's
 message of "31 Mar 2002 17:47:35 +0200")
References: <200203311249.g2VCnxic016571@paros.informatik.hu-berlin.de>
 <87n0wo4ycg.fsf@nausicaa.interq.or.jp>
 <j4bsd4g2nc.fsf@informatik.hu-berlin.de>
Message-ID: <87it7c8enw.fsf@nausicaa.interq.or.jp>

>>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes:

    Ben> They're not one-to-one; for example, ISO-2022-JP goes to
    Ben> japanese.iso-2022-jp.

    Martin> That is actually a bug in the Japanese codecs package; it
    Martin> ought to register a lookup function, instead of relying on
    Martin> the default lookup function. If that bug is not fixed,
    Martin> modifying codecs.encodings.aliases.aliases might be
    Martin> appropriate.
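Martin's suggestion can be sketched with the standard `codecs.register` hook, which lets a package add its own lookup (search) function instead of relying on the default one. This sketch targets modern Python 3, where the CJK codecs ship in the standard library, so it maps MIME labels onto stdlib codec names rather than onto the separate package's `japanese.iso-2022-jp`-style names of 2002. `CHARSET_ALIASES` and `mime_charset_search` are hypothetical names, and the two table entries are only illustrative labels.

```python
import codecs

# Hypothetical table: normalized MIME charset label -> Python codec name.
# The entries are illustrative; a real table would list whatever labels
# show up in list traffic that Python's built-in aliases do not cover.
CHARSET_ALIASES = {
    "x_sjis": "shift_jis",
    "x_euc_jp": "euc_jp",
}


def _normalize(name):
    # Recent Python versions already lower-case the name and turn hyphens
    # and spaces into underscores before calling search functions; doing it
    # here too keeps the function usable on older versions and when called
    # directly.
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def mime_charset_search(name):
    """Codec search function: return a CodecInfo, or None if unknown."""
    target = CHARSET_ALIASES.get(_normalize(name))
    if target is None:
        return None  # let other search functions (or LookupError) handle it
    return codecs.lookup(target)


# Search functions are consulted in registration order, so this one only
# sees names that the built-in encodings search function could not resolve.
codecs.register(mime_charset_search)
```

Because the built-in search runs first, this approach fills gaps; it does not override codecs that Python already knows under the same name.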

I believe the rationale is that when the Japanese codecs are accepted
into Python, the author did not want the older versions conflicting
with them.  I'm pretty sure the Chinese and Korean codecs are
installed in the same way.

Ben

-- 
Brought to you by the letters B and X and the number 5.
"A yonker is a young man."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/


From loewis@informatik.hu-berlin.de  Mon Apr  1 08:45:00 2002
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 1 Apr 2002 10:45:00 +0200 (CEST)
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <87it7c8enw.fsf@nausicaa.interq.or.jp> from Ben Gertzfield at "Apr 1, 2002 09:07:31 am"
Message-ID: <200204010845.g318j0aI001226@paros.informatik.hu-berlin.de>

> I believe the rationale is that when the Japanese codecs are accepted
> into Python, the author did not want the older versions conflicting
> with them.  I'm pretty sure the Chinese and Korean codecs are
> installed in the same way.

I don't think this is the rationale. I think the rationale is that the
first release wrote right into the encodings directory of Python, and people
complained about that. Then, he changed it to a separate package, and
could not figure out how to make it more convenient. I think the Chinese
and Korean codecs are the same because they copied the infrastructure
from the Japanese codecs.

We are both guessing.

Regards,
Martin


From barry@zope.com  Mon Apr  1 18:26:48 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Mon, 1 Apr 2002 13:26:48 -0500
Subject: [Mailman-i18n] Subject lines in Archives
References: <ADFC23EF-4491-11D6-8709-0003930418EA@mac.com>
 <200203311117.g2VBH3cw016477@paros.informatik.hu-berlin.de>
Message-ID: <15528.42600.882007.80793@anthem.wooz.org>

>>>>> "MvL" == Martin von Loewis <loewis@informatik.hu-berlin.de> writes:

    >> Is this a known bug/work in progress, or is there something I
    >> can change to fix it?

    MvL> It's a known bug; please see

    MvL> http://sourceforge.net/tracker/index.php?func=detail&aid=510415&group_id=103&atid=300103

    MvL> for a patch. I'd appreciate if you could comment on the patch
    MvL> on whether it works for you. Notice that you might have to
    MvL> regenerate the archive index.

Martin,

Thanks, this patch applies cleanly to MM2.1 cvs, so I would like to
get some feedback from you folks as to whether I should commit this.
I'm currently in the process of running these changes over a capture
of the python-list mbox file, but if anybody's got a better (read:
smaller :) sample mbox -- with lots of funky charset combinations -- I
could test this on, I'd appreciate it.

>>>>> "BG" == Ben Gertzfield <che@debian.org> writes:

    BG> Also, what do you do to map charsets to Python Unicode codecs?
    BG> They're not one-to-one; for example, ISO-2022-JP goes to
    BG> japanese.iso-2022-jp.

    MvL> That is actually a bug in the Japanese codecs package; it
    MvL> ought to register a lookup function, instead of relying on
    MvL> the default lookup function. If that bug is not fixed,
    MvL> modifying codecs.encodings.aliases.aliases might be
    MvL> appropriate.

Is there some fix we need to get applied to the Japanese codecs found
at:

    http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/

I'm sure Tamito KAJIYAMA would be open to suggestions.  Otherwise, let
me know what I'd need to add to MM's copy of the Japanese codecs
package.

-Barry


From std@std.priv.at  Mon Apr  1 18:52:00 2002
From: std@std.priv.at (Stefan Divjak)
Date: Mon, 1 Apr 2002 20:52:00 +0200 (CEST)
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <15528.42600.882007.80793@anthem.wooz.org>
Message-ID: <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>

On Mon, 1 Apr 2002, Barry A. Warsaw wrote:

> Thanks, this patch applies cleanly to MM2.1 cvs, so I would like to
> get some feedback from you folks as to whether I should commit this.

The patch worked fine, besides a few things which could be improved 
(Martin already gave me some answers for this):
* "windows-1257" charset unknown
* HyperArch dies when detecting an unknown charset
* Subject in "Previous" / "Next" Link not yet corrected

These broken subject-lines were quite annoying - thanks, Martin!
-- 
Stefan Divjak alias std@std.priv.at
Graz, Austria, Europe, Earth


From barry@zope.com  Mon Apr  1 18:57:19 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Mon, 1 Apr 2002 13:57:19 -0500
Subject: [Mailman-i18n] Subject lines in Archives
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
Message-ID: <15528.44431.221819.696341@anthem.wooz.org>

>>>>> "SD" == Stefan Divjak <std@std.priv.at> writes:

    SD> The patch worked fine, besides a few things which could be
    SD> improved (Martin already gave me some answers for this):

Thanks for the feedback.

    SD> * "windows-1257" charset unknown
    SD> * HyperArch dies when detecting an unknown charset

This has me worried.  Tracebacks are bad!  Ignoring something it
doesn't know anything about is fine.
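Skipping what the archiver does not understand, instead of dying with a traceback, amounts to catching `LookupError`, which Python raises for an unknown charset name. A minimal sketch using Python 3's `email.header.decode_header`; `decode_subject` is a hypothetical helper, not HyperArch's code, and its fallback (keep ASCII bytes, backslash-escape the rest) is just one possible policy.

```python
from email.header import decode_header


def decode_subject(raw):
    """Decode an RFC 2047 Subject header without raising on unknown charsets."""
    pieces = []
    for data, charset in decode_header(raw):
        if isinstance(data, str):
            # decode_header returns the header unchanged (as str) when it
            # contains no encoded-words at all.
            pieces.append(data)
            continue
        try:
            # Unencoded runs come back with charset None; treat them as ASCII.
            pieces.append(data.decode(charset or "us-ascii"))
        except (LookupError, UnicodeDecodeError):
            # Unknown charset or undecodable bytes: keep the ASCII parts
            # readable and backslash-escape the rest instead of failing.
            pieces.append(data.decode("ascii", "backslashreplace"))
    return "".join(pieces)
```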

    SD> * Subject in "Previous" / "Next" Link not yet corrected

So far so good in slurping up python-list.mbox...
-Barry


From che@debian.org  Mon Apr  1 23:46:20 2002
From: che@debian.org (Ben Gertzfield)
Date: Tue, 02 Apr 2002 08:46:20 +0900
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <15528.44431.221819.696341@anthem.wooz.org> (barry@zope.com's
 message of "Mon, 1 Apr 2002 13:57:19 -0500")
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
Message-ID: <87r8lzkmnn.fsf@nausicaa.interq.or.jp>

>>>>> "BAW" == Barry A Warsaw <barry@zope.com> writes:
>>>>> "SD" == Stefan Divjak <std@std.priv.at> writes:

    SD> * "windows-1257" charset unknown
    SD> * HyperArch dies when detecting an unknown charset

    BAW> This has me worried.  Tracebacks are bad!  Ignoring something
    BAW> it doesn't know anything about is fine.

This is again the fact that many charsets have different names as a
Python Unicode codec.  It looks like all "windows-foo" charsets need
to be mapped to "cpfoo" for the Python Unicode codec.
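Ben's rule of thumb can be written as a small name-translation step applied before the codec lookup. A minimal sketch, assuming `windows-NNNN` labels map one-to-one onto Python's `cpNNNN` codecs; `python_codec_name` and `decode_with_charset` are hypothetical helpers. Current Python versions also accept names such as `windows-1257` directly.

```python
import codecs
import re

# Matches MIME labels such as "windows-1257" (case-insensitive).
_WINDOWS_LABEL = re.compile(r"windows-(\d+)", re.IGNORECASE)


def python_codec_name(mime_charset):
    """Translate "windows-NNNN" to "cpNNNN"; pass other names through."""
    name = mime_charset.strip()
    match = _WINDOWS_LABEL.fullmatch(name)
    if match:
        return "cp" + match.group(1)
    return name


def decode_with_charset(data, mime_charset):
    """Decode bytes using the translated codec name (may raise LookupError)."""
    return data.decode(python_codec_name(mime_charset))
```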

Ben

-- 
Brought to you by the letters O and F and the number 18.
"He's like.. some sort of.. non-giving up.. school guy!"
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/


From loewis@informatik.hu-berlin.de  Tue Apr  2 08:59:57 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 02 Apr 2002 10:59:57 +0200
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <15528.42600.882007.80793@anthem.wooz.org>
References: <ADFC23EF-4491-11D6-8709-0003930418EA@mac.com>
 <200203311117.g2VBH3cw016477@paros.informatik.hu-berlin.de>
 <15528.42600.882007.80793@anthem.wooz.org>
Message-ID: <j44riu5vci.fsf@informatik.hu-berlin.de>

barry@zope.com (Barry A. Warsaw) writes:

> Thanks, this patch applies cleanly to MM2.1 cvs, so I would like to
> get some feedback from you folks as to whether I should commit this.
> I'm currently in the process of running these changes over a capture
> of the python-list mbox file, but if anybody's got a better (read:
> smaller :) sample mbox -- with lots of funky charset combinations -- I
> could test this on, I'd appreciate it.

I have revised the patch on SF to fix the problems Stefan found
(catching lookup errors, producing proper prev/next subjects, and
producing a proper <title>).

I have also collected messages with funny charsets from various
archives, and combined them into a small mailbox at

http://www.informatik.hu-berlin.de/~loewis/test.mbox

With this, you should be able to observe the following effects:

- when reading the mailbox in current Mailman, the index will be in
  windows-1257; there will be lots of garbage MIME text

- when applying my patch, the utf-8 and iso-8859-1 parts of it will
  become readable. Japanese and Korean text (in the name of two
  message authors) will remain obscure.

- when making available the Japanese MIME charset names, the Japanese
  name will become readable (to those which can read Japanese, that is)

- when adding the Korean codecs, the Korean name will also become
  readable

- in all cases, the subject encoded in x-mvl will remain MIME garbage.

I've changed the Date: fields of all the messages, to make them appear
in a single month. Adding messages to the archive in Jan 2001 might
shift the encodings balance, so that windows-1257 loses majority. That
should have no effect on the rendering of the index.
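The remark about windows-1257 losing its majority implies that the index page takes the charset used by most messages in the period. Assuming that rule (an illustration, not HyperArch's actual code), the choice reduces to a frequency count; `index_charset` is a hypothetical name.

```python
from collections import Counter


def index_charset(message_charsets, default="us-ascii"):
    """Return the most common charset among messages (hypothetical rule)."""
    # Ignore messages without a declared charset; compare case-insensitively.
    counts = Counter(c.lower() for c in message_charsets if c)
    if not counts:
        return default
    return counts.most_common(1)[0][0]
```

A handful of extra messages in another charset can flip the result, which is why it matters that the rendering of the index should not depend on it.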

I don't have permission from any of the message authors, so please
ignore the actual content of their messages :-)

> I'm sure Tamito KAJIYAMA would be open to suggestions.  Otherwise, let
> me know what I'd need to add to MM's copy of the Japanese codecs
> package.

I've talked to Tamito, and he said he'll change it - although it is
not clear yet in which way. It seems clear that explicit action will
be needed (unless .pth files in pythonlib are considered from site.py,
which I doubt).

Alternatively, and independently, please consider the patch 

http://sourceforge.net/tracker/?func=detail&aid=538185&group_id=103&atid=300103

It registers the common aliases for the Japanese encodings, and maps
them to the japanese package. This code could go anywhere you like,
provided that importing HyperArch triggers its execution. Notice that
this will override any existing codecs with these names (cp932,
iso-2022-jp, etc). For Mailman, I'd consider this a good thing, since
it will provide better reproducibility of results.

Regards,
Martin


From loewis@informatik.hu-berlin.de  Tue Apr  2 09:01:55 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 02 Apr 2002 11:01:55 +0200
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
Message-ID: <j4y9g64gos.fsf@informatik.hu-berlin.de>

Ben Gertzfield <che@debian.org> writes:

> This is again the fact that many charsets have different names as a
> Python Unicode codec.  It looks like all "windows-foo" charsets need
> to be mapped to "cpfoo" for the Python Unicode codec.

In Python 2.3, this has happened (at least for those known to
IANA). For mailman, it may be desirable to provide some of those
mappings even in earlier Python versions; see

http://sourceforge.net/tracker/?func=detail&aid=538185&group_id=103&atid=300103

Regards,
Martin


From che@debian.org  Tue Apr  2 09:24:05 2002
From: che@debian.org (Ben Gertzfield)
Date: Tue, 02 Apr 2002 18:24:05 +0900
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <j4y9g64gos.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's
 message of "02 Apr 2002 11:01:55 +0200")
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
Message-ID: <874riulah6.fsf@nausicaa.interq.or.jp>

>>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes:

    Ben> This is again the fact that many charsets have different
    Ben> names as a Python Unicode codec.  It looks like all
    Ben> "windows-foo" charsets need to be mapped to "cpfoo" for the
    Ben> Python Unicode codec.

    Martin> In Python 2.3, this has happened (at least for those known
    Martin> to IANA). For mailman, it may be desirable to provide some
    Martin> of those mappings even in earlier Python versions; see

    Martin> http://sourceforge.net/tracker/?func=detail&aid=538185&group_id=103&atid=300103

Thanks for the patch, Martin.  I think we will need something
similar to this for the Korean Windows charsets, as in all the
Korean spam I get:

Content-Type: text/html; charset="ks_c_5601-1987"
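For this label, one option is an explicit override table consulted before the codec lookup. Microsoft mail clients use `ks_c_5601-1987` for what is in practice Windows code page 949, a superset of EUC-KR, so this sketch pins the label to Python's `cp949` codec rather than relying on whatever alias the running Python provides. `CHARSET_OVERRIDES` and `body_text` are hypothetical names; parsing uses the standard `email` package.

```python
from email import message_from_bytes

# Hypothetical override table: MIME label -> Python codec name.
CHARSET_OVERRIDES = {"ks_c_5601-1987": "cp949"}


def body_text(raw_bytes):
    """Decode a single-part message body using the override table."""
    msg = message_from_bytes(raw_bytes)
    # get_content_charset() returns the charset parameter lower-cased,
    # or the given default when the parameter is missing.
    charset = msg.get_content_charset("us-ascii")
    codec = CHARSET_OVERRIDES.get(charset, charset)
    payload = msg.get_payload(decode=True)  # bytes, transfer encoding undone
    return payload.decode(codec)
```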

We will probably need some general fallback to replace completely
unknown charsets with some safe US-ASCII text.  Do you think you could
add this?  Say, something like "(text with unknown encoding)".

Ben

-- 
Brought to you by the letters N and E and the number 16.
"Bill Gates is a talented evil man."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/


From loewis@informatik.hu-berlin.de  Tue Apr  2 09:43:49 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 02 Apr 2002 11:43:49 +0200
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <874riulah6.fsf@nausicaa.interq.or.jp>
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
Message-ID: <j4g02e4eqy.fsf@informatik.hu-berlin.de>

Ben Gertzfield <che@debian.org> writes:

> We will probably need some general fallback to replace completely
> unknown charsets with some safe US-ASCII text.  Do you think you could
> add this?  Say, something like "(text with unknown encoding)".

For the index, this might be a good idea. For the article, I'd prefer
if there are some traces left of the original subject. E.g. if it is
quoted-printable, you can often guess the subject from only the ASCII
parts in it - at least for the Latin languages.
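The split proposed here (a safe placeholder in the index, surviving traces in the article page) might look like the following. `render_subject` and its `for_index` flag are hypothetical; the placeholder text is Ben's suggestion from the quoted message.

```python
UNKNOWN_ENCODING = "(text with unknown encoding)"


def render_subject(data, charset, for_index):
    """Decode a subject for an archive page, with a per-page fallback."""
    try:
        return data.decode(charset)
    except (LookupError, UnicodeDecodeError):
        if for_index:
            # Index pages: emit only known-safe US-ASCII text.
            return UNKNOWN_ENCODING
        # Article pages: keep printable ASCII bytes so readers can often
        # still guess a Latin-script subject; mask every other byte with "?".
        return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in data)
```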

OTOH, Mailman should IMO support all widely-used encodings out of the
box; then this might not be an issue anymore.

Regards,
Martin


From che@debian.org  Tue Apr  2 12:21:19 2002
From: che@debian.org (Ben Gertzfield)
Date: Tue, 02 Apr 2002 21:21:19 +0900
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <j4g02e4eqy.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's
 message of "02 Apr 2002 11:43:49 +0200")
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
Message-ID: <87u1qujnpc.fsf@nausicaa.interq.or.jp>

>>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes:

    Ben> We will probably need some general fallback to replace
    Ben> completely unknown charsets with some safe US-ASCII text.  Do
    Ben> you think you could add this?  Say, something like "(text
    Ben> with unknown encoding)".

    Martin> For the index, this might be a good idea. For the article,
    Martin> I'd prefer if there are some traces left of the original
    Martin> subject. E.g. if it is quoted-printable, you can often
    Martin> guess the subject from only the ASCII parts in it -
    Martin> at least for the Latin languages.

Yes, I agree.  We just don't want to create files with invalid
encodings; mixing encodings in a single HTML file is a recipe
for disaster!

    Martin> OTOH, Mailman should IMO support all widely-used encodings
    Martin> out of the box; then this might not be an issue anymore.

This will happen eventually, but 2.1 will be the first release with
*any* international support, so there are bound to be a large number
of encodings we miss. (I'm thinking of all the Windows ones, here.)

Ben

-- 
Brought to you by the letters Q and P and the number 7.
"Frungy! Frungy! Frungy!!"
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/


From loewis@informatik.hu-berlin.de  Tue Apr  2 12:41:08 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 02 Apr 2002 14:41:08 +0200
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <87u1qujnpc.fsf@nausicaa.interq.or.jp>
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
 <87u1qujnpc.fsf@nausicaa.interq.or.jp>
Message-ID: <j4zo0m2ryz.fsf@informatik.hu-berlin.de>

Ben Gertzfield <che@debian.org> writes:

> Yes, I agree.  We just don't want to create files with invalid
> encodings; mixing encodings in a single HTML file is a recipe
> for disaster!

If that is your concern, then things can remain as they are (or will
be, after the patch) - it will just print the mime-encoded subject of
the original message. If the original message had non-ASCII text in
the subject that was not MIME-encoded, I still think it should be
copied as-is to the HTML - proper display will then be the task of the
Web browser.

Regards,
Martin


From che@debian.org  Tue Apr  2 12:58:39 2002
From: che@debian.org (Ben Gertzfield)
Date: Tue, 02 Apr 2002 21:58:39 +0900
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <j4zo0m2ryz.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's
 message of "02 Apr 2002 14:41:08 +0200")
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
 <87u1qujnpc.fsf@nausicaa.interq.or.jp>
 <j4zo0m2ryz.fsf@informatik.hu-berlin.de>
Message-ID: <87lmc6jlz4.fsf@nausicaa.interq.or.jp>

>>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes:

    Ben> Yes, I agree.  We just don't want to create files with
    Ben> invalid encodings; mixing encodings in a single HTML file is
    Ben> a recipe for disaster!

    Martin> If that is your concern, then things can remain as they
    Martin> are (or will be, after the patch) - it will just print the
    Martin> mime-encoded subject of the original message. If the
    Martin> original message had non-ASCII text in the subject that
    Martin> was not MIME-encoded, I still think it should be copied
    Martin> as-is to the HTML - proper display will then be the task
    Martin> of the Web browser.

Unfortunately, I have to disagree.  The main problem will come
with any encoding that is modal -- like UTF-8!

If we copy random 8-bit non-MIME encoded text (very common these days)
into an HTML page containing UTF-8 text (let's say the majority of
posts were in UTF-8 on this list) then we will not only produce
invalid UTF-8 text, but we could quite possibly shift the user's
terminal into a garbage state from the invalid 8-bit strings, making
further display impossible.

Not everyone views these archives with a GUI web browser that contains
work-arounds for all the invalid encoded text in the world; we need to
be liberal in what we accept, but conservative in what we emit.

I love the idea of using Unicode escapes for all text that we can
convert to Unicode, but any text we can't convert just is not safe to
include verbatim.  Perhaps we should make it an option for those who
really want to include possibly dangerous text directly in the
archives?
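One way to read "Unicode escapes" here is HTML numeric character references: once text has been decoded to Unicode, every non-ASCII character can be written as `&#NNNN;`, so the page body stays pure ASCII no matter which charset the page declares. A minimal standard-library sketch; `to_html_ascii` is a hypothetical helper.

```python
import html


def to_html_ascii(text):
    """Escape HTML markup, then turn non-ASCII characters into &#NNNN;."""
    escaped = html.escape(text, quote=False)  # escapes &, < and > only
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Text that could not be decoded in the first place never reaches this step; that remaining case is what the rest of this exchange is about.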

I know I would prefer a message like "(text with unknown encoding)"
over a garbled Japanese terminal any day.

Ben

-- 
Brought to you by the letters N and M and the number 17.
"Johnny! Don't go! It's too dangerous!" "I don't care!"
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/


From loewis@informatik.hu-berlin.de  Tue Apr  2 15:03:07 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 02 Apr 2002 17:03:07 +0200
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <87lmc6jlz4.fsf@nausicaa.interq.or.jp>
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
 <87u1qujnpc.fsf@nausicaa.interq.or.jp>
 <j4zo0m2ryz.fsf@informatik.hu-berlin.de>
 <87lmc6jlz4.fsf@nausicaa.interq.or.jp>
Message-ID: <j4hemu2lec.fsf@informatik.hu-berlin.de>

Ben Gertzfield <che@debian.org> writes:

> Not everyone views these archives with a GUI web browser that contains
> work-arounds for all the invalid encoded text in the world; we need to
> be liberal in what we accept, but conservative in what we emit.

In a GUI browser, it is not at all dangerous; we appear to agree on
that. I'd claim that the majority uses GUI browsers these days, so it
is not really clear why the majority should suffer for the comfort of
a few.

> I know I would prefer a message like "(text with unknown encoding)"
> over a garbled Japanese terminal any day.

If that is really a concern to you (it is none to me, since I don't
use a web browser that may corrupt my terminal), then I think the
non-ASCII or control bytes could be qp-encoded - just suppressing the
text would drop the usability of the archive.

Notice that this isn't just necessary for the subjects - it is needed
for arbitrary body text as well (and so independent from the subject
we are discussing right now).
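Escaping non-ASCII and control bytes instead of dropping the text can be sketched as a quoted-printable-style transform in which each such byte becomes `=XX`. This is a hand-rolled illustration (`qp_escape` is a hypothetical name), not a full RFC 2045 encoder; for example, it applies no line-length limit. The escape byte 0x1B is covered because it introduces ISO-2022-JP shift sequences such as `ESC $ B`, the kind of byte that can confuse a terminal.

```python
def qp_escape(data):
    """Escape '=', control bytes, DEL and 8-bit bytes as =XX (QP style)."""
    out = []
    for b in data:
        if b == 0x3D or b < 0x20 or b >= 0x7F:
            out.append("=%02X" % b)  # e.g. 0xE9 -> "=E9", ESC -> "=1B"
        else:
            out.append(chr(b))  # printable ASCII passes through unchanged
    return "".join(out)
```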

Regards,
Martin


From che@debian.org  Tue Apr  2 15:44:40 2002
From: che@debian.org (Ben Gertzfield)
Date: Wed, 03 Apr 2002 00:44:40 +0900
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <j4hemu2lec.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's
 message of "02 Apr 2002 17:03:07 +0200")
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
 <87u1qujnpc.fsf@nausicaa.interq.or.jp>
 <j4zo0m2ryz.fsf@informatik.hu-berlin.de>
 <87lmc6jlz4.fsf@nausicaa.interq.or.jp>
 <j4hemu2lec.fsf@informatik.hu-berlin.de>
Message-ID: <878z86jeaf.fsf@nausicaa.interq.or.jp>

>>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes:

    Ben> Not everyone views these archives with a GUI web browser that
    Ben> contains work-arounds for all the invalid encoded text in the
    Ben> world; we need to be liberal in what we accept, but
    Ben> conservative in what we emit.

    Martin> In a GUI browser, it is not at all dangerous; we appear to
    Martin> agree on that. I'd claim that the majority uses GUI
    Martin> browsers these days, so it is not really clear why the
    Martin> majority should suffer for the comfort of a few.

This is only because GUI browsers contain work-arounds for problems
just like this.  Why make the GUI browser programmers' lives harder?

    Ben> I know I would prefer a message like "(text with unknown
    Ben> encoding)" over a garbled Japanese terminal any day.

    Martin> If that is really a concern to you (it is none to me,
    Martin> since I don't use a web browser that may corrupt my
    Martin> terminal), then I think the non-ASCII or control bytes
    Martin> could be qp-encoded - just suppressing the text would drop
    Martin> the usability of the archive.

Or we could give an option to replace text that could not be converted
to Unicode with a message, eh?  What's the harm in allowing both?

    Martin> Notice that this isn't just necessary for the subjects -
    Martin> it is needed for arbitrary body text as well (and so
    Martin> independent from the subject we are discussing right now).

Yes.

Ben

-- 
Brought to you by the letters I and J and the number 6.
"Moshimoshi. Kikoemasu ka?" "Kakenaoshimasu kara ne! 1-do kitte kudasai."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/


From loewis@informatik.hu-berlin.de  Tue Apr  2 16:45:44 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 02 Apr 2002 18:45:44 +0200
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <878z86jeaf.fsf@nausicaa.interq.or.jp>
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
 <87u1qujnpc.fsf@nausicaa.interq.or.jp>
 <j4zo0m2ryz.fsf@informatik.hu-berlin.de>
 <87lmc6jlz4.fsf@nausicaa.interq.or.jp>
 <j4hemu2lec.fsf@informatik.hu-berlin.de>
 <878z86jeaf.fsf@nausicaa.interq.or.jp>
Message-ID: <j4u1qu122v.fsf@informatik.hu-berlin.de>

Ben Gertzfield <che@debian.org> writes:

> This is only because GUI browsers contain work-arounds for problems
> just like this.  Why make the GUI browser programmers' lives harder?

Please remember that the original problem is in the email clients
which don't properly MIME-encode non-ASCII text. Be liberal in what
you accept: we should not throw away contents just because we don't
know what charset it has. Perhaps the browser can display something
meaningful, perhaps not. If we remove the contents from the archive,
it is certain that it can't display anything meaningful.

> Or we could give an option to replace text that could not be converted
> to Unicode with a message, eh?  What's the harm in allowing both?

Who would be controlling this option, and how? If the list admin: why
is she in a better position to make a decision than we are?

Regards,
Martin


From che@debian.org  Wed Apr  3 00:36:52 2002
From: che@debian.org (Ben Gertzfield)
Date: Wed, 03 Apr 2002 09:36:52 +0900
Subject: [Mailman-i18n] Subject lines in Archives
In-Reply-To: <j4u1qu122v.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's
 message of "02 Apr 2002 18:45:44 +0200")
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
 <87u1qujnpc.fsf@nausicaa.interq.or.jp>
 <j4zo0m2ryz.fsf@informatik.hu-berlin.de>
 <87lmc6jlz4.fsf@nausicaa.interq.or.jp>
 <j4hemu2lec.fsf@informatik.hu-berlin.de>
 <878z86jeaf.fsf@nausicaa.interq.or.jp>
 <j4u1qu122v.fsf@informatik.hu-berlin.de>
Message-ID: <87zo0lipnf.fsf@nausicaa.interq.or.jp>

>>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes:

    Ben> Or we could give an option to replace text that could not be
    Ben> converted to Unicode with a message, eh?  What's the harm in
    Ben> allowing both?

    Martin> Who would be controlling this option, and how? If the list
    Martin> admin: why is she in a better position to make a decision
    Martin> than we are?

I think the list admin should have the right to decide if they do not
wish their customers' terminals to get messed up when browsing
illegally encoded text.

Ben

-- 
Brought to you by the letters J and Z and the number 18.
"Sculch is junk."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/


From m.ramsch@computer.org  Wed Apr  3 17:57:53 2002
From: m.ramsch@computer.org (Martin Ramsch)
Date: Wed, 3 Apr 2002 19:57:53 +0200
Subject: [Mailman-i18n] Subject lines in Archives
References: <15528.42600.882007.80793@anthem.wooz.org>
 <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at>
 <15528.44431.221819.696341@anthem.wooz.org>
 <87r8lzkmnn.fsf@nausicaa.interq.or.jp>
 <j4y9g64gos.fsf@informatik.hu-berlin.de>
 <874riulah6.fsf@nausicaa.interq.or.jp>
 <j4g02e4eqy.fsf@informatik.hu-berlin.de>
 <87u1qujnpc.fsf@nausicaa.interq.or.jp>
 <j4zo0m2ryz.fsf@informatik.hu-berlin.de>
 <87lmc6jlz4.fsf@nausicaa.interq.or.jp>
 <j4hemu2lec.fsf@informatik.hu-berlin.de>
 <878z86jeaf.fsf@nausicaa.interq.or.jp>
 <j4u1qu122v.fsf@informatik.hu-berlin.de>
 <87zo0lipnf.fsf@nausicaa.interq.or.jp>
Message-ID: <001c01c1dcd2$edfda330$e39590d4@ramsch.org>

Greetings to all!

Ben Gertzfield <che@debian.org> wrote:
> >>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes:
>
>     Ben> Or we could give an option to replace text that could not be
>     Ben> converted to Unicode with a message, eh?  What's the harm in
>     Ben> allowing both?
>
>     Martin> Who would be controlling this option, and how? If the list
>     Martin> admin: why is she in a better position to make a decision
>     Martin> than we are?
>
> I think the list admin should have the right to decide if they do not
> wish their customers' terminals to get messed up when browsing
> illegally encoded text.

I followed this discussion, and strongly second Ben's opinion that an archiver
should always output correctly encoded pages - no exceptions!

Be liberal in what you accept, but conservative in what you emit.

Martin, please reconsider.  Only by following this principle can we
end up with a stable, problem-free product!

Maybe another idea to solve the problem:

  If the charset of a message is not specified, we might first use heuristics
  to guess the encoding - in many cases this is possible.
  But if we really don't know which encoding is used, I'd prefer to replace
  this message with a _LINK_ saying "text with unknown encoding" which
  points to a separate page showing the message in question.

  This way we only produce correctly encoded output on the main pages,
  and warn in advance where the encoding potentially might be screwed,
  but still don't leave out a bit of information.

  Looking further ahead: on this latter page we could perhaps even add a form where
  readers can suggest which encoding should be used, and this gathered
  input could be used to finally integrate the message properly ...
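The heuristic step could start as simply as trying a list of candidate charsets in order and keeping the first one that decodes cleanly. A minimal sketch under that assumption; `guess_decode` and its default candidate list are hypothetical. A clean decode does not prove the guess right, and a charset such as ISO-8859-1 accepts any byte sequence, so order matters and such charsets belong last, if at all.

```python
def guess_decode(data, candidates=("utf-8", "euc-jp", "shift_jis")):
    """Return (text, charset) for the first candidate that decodes cleanly,
    or (None, None) if none does; the caller can then fall back to a
    separate "unknown encoding" page."""
    for charset in candidates:
        try:
            return data.decode(charset), charset
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or invalid bytes: try the next one
    return None, None
```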

Cheers,
  Martin


From duke@linux.ee  Sat Apr 20 16:12:33 2002
From: duke@linux.ee (Anti Veeranna)
Date: Sat, 20 Apr 2002 18:12:33 +0300
Subject: [Mailman-i18n] Mailman - Estonian translation
Message-ID: <20020420181233.7e1825c0.duke@linux.ee>

Hello

If no one else has volunteered to take that job before, I would like
to work on the Estonian translation/language pack for Mailman.

A little background information about me: I have previous experience in
localizing gettext-based applications. I'm part of KDE's Estonian language
team, where the localization of programs in the kdegames module is my
responsibility. I have also worked on an unofficial translation of
Majordomo[1] and a number of other smaller programs.

I am also currently the administrator of two small Mailman installations,
which run 30 and 10 lists respectively.

Based on that, I believe that I have the necessary knowledge and skill for 
this job. I have a pretty clear picture of what is involved and I'm all
set up for it; all I need is an OK from you.

[1] unofficial, because it didn't have any support for gettext or
 similar internationalization libraries, and therefore translating it meant
 changing the code directly.

-- 
Anti Veeranna
duke@linux.ee


From michel.guilhem@annuwave.net  Sun Apr 21 11:41:22 2002
From: michel.guilhem@annuwave.net (michel.guilhem)
Date: Sun, 21 Apr 2002 12:41:22 +0200
Subject: [Mailman-i18n] location of french translation ( and others )
Message-ID: <3CC29752.C8D8A89A@annuwave.net>

Where can I obtain the French translation (and all the others)?
And then, is the only thing I have to do to place
a directory called fr (for example) under templates and
modify the location of the templates in Defaults.py?

Did I forget something, or am I right?

Thanks.