From che@debian.org Mon Apr 1 00:07:31 2002 From: che@debian.org (Ben Gertzfield) Date: Mon, 01 Apr 2002 09:07:31 +0900 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: (loewis@informatik.hu-berlin.de's message of "31 Mar 2002 17:47:35 +0200") References: <200203311249.g2VCnxic016571@paros.informatik.hu-berlin.de> <87n0wo4ycg.fsf@nausicaa.interq.or.jp> Message-ID: <87it7c8enw.fsf@nausicaa.interq.or.jp> >>>>> "Martin" == Martin v Löwis writes: Ben> They're not one-to-one; for example, ISO-2022-JP goes to Ben> japanese.iso-2022-jp. Martin> That is actually a bug in the Japanese codecs package; it Martin> ought to register a lookup function, instead of relying on Martin> the default lookup function. If that bug is not fixed, Martin> modifying codecs.encodings.aliases.aliases might be Martin> appropriate. I believe the rationale is that when the Japanese codecs are accepted into Python, the author did not want the older versions conflicting with them. I'm pretty sure the Chinese and Korean codecs are installed in the same way. Ben -- Brought to you by the letters B and X and the number 5. "A yonker is a young man." Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/ From loewis@informatik.hu-berlin.de Mon Apr 1 08:45:00 2002 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 1 Apr 2002 10:45:00 +0200 (CEST) Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <87it7c8enw.fsf@nausicaa.interq.or.jp> from Ben Gertzfield at "Apr 1, 2002 09:07:31 am" Message-ID: <200204010845.g318j0aI001226@paros.informatik.hu-berlin.de> > I believe the rationale is that when the Japanese codecs are accepted > into Python, the author did not want the older versions conflicting > with them. I'm pretty sure the Chinese and Korean codecs are > installed in the same way. I don't think this is the rationale.
I think the rationale is that the first release wrote right into the encodings directory of Python, and people complained about that. Then, he changed it to a separate package, and could not figure out how to make it more convenient. I think the Chinese and Korean codecs are the same because they copied the infrastructure from the Japanese codecs. We are both guessing. Regards, Martin From barry@zope.com Mon Apr 1 18:26:48 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 1 Apr 2002 13:26:48 -0500 Subject: [Mailman-i18n] Subject lines in Archives References: <200203311117.g2VBH3cw016477@paros.informatik.hu-berlin.de> Message-ID: <15528.42600.882007.80793@anthem.wooz.org> >>>>> "MvL" == Martin von Loewis writes: >> Is this a known bug/work in progress, or is there something I >> can change to fix it? MvL> It's a known bug; please see MvL> http://sourceforge.net/tracker/index.php?func=detail&aid=510415&group_id=103&atid=300103 MvL> for a patch. I'd appreciate if you could comment on the patch MvL> on whether it works for you. Notice that you might have to MvL> regenerate the archive index. Martin, Thanks, this patch applies cleanly to MM2.1 cvs, so I would like to get some feedback from you folks as to whether I should commit this. I'm currently in the process of running these changes over a capture of the python-list mbox file, but if anybody's got a better (read: smaller :) sample mbox -- with lots of funky charset combinations -- I could test this on, I'd appreciate it. >>>>> "BG" == Ben Gertzfield writes: BG> Also, what do you do to map charsets to Python Unicode codecs? BG> They're not one-to-one; for example, ISO-2022-JP goes to BG> japanese.iso-2022-jp. MvL> That is actually a bug in the Japanese codecs package; it MvL> ought to register a lookup function, instead of relying on MvL> the default lookup function. If that bug is not fixed, MvL> modifying codecs.encodings.aliases.aliases might be MvL> appropriate.
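A minimal sketch of the approach Martin describes, registering a codec search function with `codecs.register()` instead of relying on the default lookup. It is written for modern Python 3, not the Python 2 of 2002, and it is not the Japanese codecs package's actual code. The alias names in `EXTRA_ALIASES` are hypothetical placeholders; only the target codecs (`shift_jis`, `iso2022_jp`) are standard Python codecs.

```python
import codecs

# Hypothetical extra aliases: charset names a mail archiver might meet
# that the interpreter's own alias table does not know. The names on the
# left are illustrative only; the targets are standard Python codecs.
EXTRA_ALIASES = {
    "x_mailman_sjis": "shift_jis",
    "x_mailman_jis": "iso2022_jp",
}


def _normalize(name):
    # Search functions receive a lower-cased name; Python 3.9+ also turns
    # hyphens and spaces into underscores. Normalizing here keeps the
    # table working on older versions too.
    return name.lower().replace("-", "_").replace(" ", "_")


def search_extra_aliases(name):
    """Codec search function: CodecInfo for a known alias, else None."""
    target = EXTRA_ALIASES.get(_normalize(name))
    if target is None:
        return None  # not ours: let other search functions (or LookupError) follow
    return codecs.lookup(target)


codecs.register(search_extra_aliases)
```

Once registered, `"日本語".encode("shift_jis").decode("x-mailman-sjis")` round-trips. Search functions are consulted only after Python's built-in `encodings` search has failed, so this sketch adds names rather than overriding existing codecs.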
Is there some fix we need to get applied to the Japanese codecs found at: http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/python/ I'm sure Tamito KAJIYAMA would be open to suggestions. Otherwise, let me know what I'd need to add to MM's copy of the Japanese codecs package. -Barry From std@std.priv.at Mon Apr 1 18:52:00 2002 From: std@std.priv.at (Stefan Divjak) Date: Mon, 1 Apr 2002 20:52:00 +0200 (CEST) Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <15528.42600.882007.80793@anthem.wooz.org> Message-ID: On Mon, 1 Apr 2002, Barry A. Warsaw wrote: > Thanks, this patch applies cleanly to MM2.1 cvs, so I would like to > get some feedback from you folks as to whether I should commit this. The patch worked fine, besides a few things which could be improved (Martin already gave me some answers for this): * "windows-1257" charset unknown * HyperArch dies when detecting an unknown charset * Subject in "Previous" / "Next" Link not yet corrected These broken subject-lines were quite annoying - thanks, Martin! -- Stefan Divjak alias std@std.priv.at Graz, Austria, Europe, Earth From barry@zope.com Mon Apr 1 18:57:19 2002 From: barry@zope.com (Barry A. Warsaw) Date: Mon, 1 Apr 2002 13:57:19 -0500 Subject: [Mailman-i18n] Subject lines in Archives References: <15528.42600.882007.80793@anthem.wooz.org> Message-ID: <15528.44431.221819.696341@anthem.wooz.org> >>>>> "SD" == Stefan Divjak writes: SD> The patch worked fine, besides a few things which could be SD> improved (Martin already gave me some answers for this): Thanks for the feedback. SD> * "windows-1257" charset unknown SD> * HyperArch dies when detecting an unknown charset This has me worried. Tracebacks are bad! Ignoring something it doesn't know anything about is fine. SD> * Subject in "Previous" / "Next" Link not yet corrected So far so good in slurping up python-list.mbox... 
-Barry From che@debian.org Mon Apr 1 23:46:20 2002 From: che@debian.org (Ben Gertzfield) Date: Tue, 02 Apr 2002 08:46:20 +0900 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <15528.44431.221819.696341@anthem.wooz.org> (barry@zope.com's message of "Mon, 1 Apr 2002 13:57:19 -0500") References: <15528.42600.882007.80793@anthem.wooz.org> <15528.44431.221819.696341@anthem.wooz.org> Message-ID: <87r8lzkmnn.fsf@nausicaa.interq.or.jp> >>>>> "BAW" == Barry A Warsaw writes: >>>>> "SD" == Stefan Divjak writes: SD> * "windows-1257" charset unknown * HyperArch dies when SD> detecting an unknown charset BAW> This has me worried. Tracebacks are bad! Ignoring something BAW> it doesn't know anything about is fine. This is again the fact that many charsets have different names as a Python Unicode codec. It looks like all "windows-foo" charsets need to be mapped to "cpfoo" for the Python Unicode codec. Ben -- Brought to you by the letters O and F and the number 18. "He's like.. some sort of.. non-giving up.. school guy!" Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/ From loewis@informatik.hu-berlin.de Tue Apr 2 08:59:57 2002 From: loewis@informatik.hu-berlin.de (Martin v. Löwis) Date: 02 Apr 2002 10:59:57 +0200 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <15528.42600.882007.80793@anthem.wooz.org> References: <200203311117.g2VBH3cw016477@paros.informatik.hu-berlin.de> <15528.42600.882007.80793@anthem.wooz.org> Message-ID: barry@zope.com (Barry A. Warsaw) writes: > Thanks, this patch applies cleanly to MM2.1 cvs, so I would like to > get some feedback from you folks as to whether I should commit this. > I'm currently in the process of running these changes over a capture > of the python-list mbox file, but if anybody's got a better (read: > smaller :) sample mbox -- with lots of funky charset combinations -- I > could test this on, I'd appreciate it.
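A sketch of the `windows-NNNN` to `cpNNNN` rule Ben describes, combined with the "ignore what we don't know" behaviour Barry asks for: the function returns `None` instead of letting a `LookupError` escape as a traceback. `codec_for_charset` is a hypothetical helper, not HyperArch's API, and the code targets modern Python 3, where most `windows-*` names already resolve, so the rewrite only acts as a fallback.

```python
import codecs
import re

# Assumed rule from the discussion: Microsoft "windows-NNNN" MIME charsets
# correspond to Python's "cpNNNN" codecs.
_WINDOWS_CHARSET = re.compile(r"^windows-(\d+)$", re.IGNORECASE)


def codec_for_charset(charset):
    """Return a Python codec name for a MIME charset, or None if unknown."""
    candidates = [charset.strip()]
    match = _WINDOWS_CHARSET.match(candidates[0])
    if match:
        candidates.append("cp" + match.group(1))
    for name in candidates:
        try:
            return codecs.lookup(name).name
        except LookupError:
            continue  # unknown name: try the next candidate
    return None  # the caller decides what to show; never raise a traceback
```

When the result is `None`, a caller can fall back to displaying the original, still MIME-encoded text, which matches Barry's "ignoring something it doesn't know anything about is fine".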
I have revised the patch on SF to fix the problems Stefan found (both catching lookup errors, producing proper prev/next subjects, and producing a proper ). I have also collected messages with funny charsets from various archives, and combined them into a small mailbox at http://www.informatik.hu-berlin.de/~loewis/test.mbox With this, you should be able to observe the following effects: - when reading the mailbox in current mailman, the index will be windows-1257; there will be lots of garbage MIME text - when applying my patch, the utf-8 and iso-8859-1 parts of it will become readable. Japanese and Korean text (in the name of two message authors) will remain obscure. - when making available the Japanese MIME charset names, the Japanese name will become readable (to those who can read Japanese, that is) - when adding the Korean codecs, the Korean name will also become readable - in all cases, the subject encoded x-mvl will remain MIME garbage. I've changed the Date: fields of all the messages, to make them appear in a single month. Adding messages to the archive in Jan 2001 might shift the encodings balance, so that windows-1257 loses majority. That should have no effect on the rendering of the index. I don't have permission from any of the message authors, so please ignore the actual content of their messages :-) > I'm sure Tamito KAJIYAMA would be open to suggestions. Otherwise, let > me know what I'd need to add to MM's copy of the Japanese codecs > package. I've talked to Tamito, and he said he'll change it - although it is not clear yet in which way. It seems clear that explicit action will be needed (unless .pth files in pythonlib are considered from site.py, which I doubt). Alternatively, and independently, please consider the patch http://sourceforge.net/tracker/?func=detail&aid=538185&group_id=103&atid=300103 It registers the common aliases for the Japanese encodings, and maps them to the japanese package.
This code could go anywhere you like, provided that importing HyperArch triggers its execution. Notice that this will override any existing codecs with these names (cp932, iso-2022-jp, etc.). For Mailman, I'd consider this a good thing, since it will provide better reproducibility of results. Regards, Martin From loewis@informatik.hu-berlin.de Tue Apr 2 09:01:55 2002 From: loewis@informatik.hu-berlin.de (Martin v. Löwis) Date: 02 Apr 2002 11:01:55 +0200 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <87r8lzkmnn.fsf@nausicaa.interq.or.jp> References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> Message-ID: <j4y9g64gos.fsf@informatik.hu-berlin.de> Ben Gertzfield <che@debian.org> writes: > This is again the fact that many charsets have different names as a > Python Unicode codec. It looks like all "windows-foo" charsets need > to be mapped to "cpfoo" for the Python Unicode codec. In Python 2.3, this has happened (at least for those known to IANA).
For mailman, it may be desirable to provide some of those mappings even in earlier Python versions; see http://sourceforge.net/tracker/?func=detail&aid=538185&group_id=103&atid=300103 Regards, Martin From che@debian.org Tue Apr 2 09:24:05 2002 From: che@debian.org (Ben Gertzfield) Date: Tue, 02 Apr 2002 18:24:05 +0900 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <j4y9g64gos.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's message of "02 Apr 2002 11:01:55 +0200") References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> Message-ID: <874riulah6.fsf@nausicaa.interq.or.jp> >>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes: Ben> This is again the fact that many charsets have different Ben> names as a Python Unicode codec. It looks like all Ben> "windows-foo" charsets need to be mapped to "cpfoo" for the Ben> Python Unicode codec. Martin> In Python 2.3, this has happened (at least for those known Martin> to IANA). For mailman, it may be desirable to provide some Martin> of those mappings even in earlier Python versions; see Martin> http://sourceforge.net/tracker/?func=detail&aid=538185&group_id=103&atid=300103 Thanks for the patch, Martin. I think we will need something similar to this for the Korean Windows charsets, as in all the Korean spam I get: Content-Type: text/html; charset="ks_c_5601-1987" We will probably need some general fallback to replace completely unknown charsets with some safe US-ASCII text. Do you think you could add this? Say, something like "(text with unknown encoding)". Ben -- Brought to you by the letters N and E and the number 16. "Bill Gates is a talented evil man."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/ From loewis@informatik.hu-berlin.de Tue Apr 2 09:43:49 2002 From: loewis@informatik.hu-berlin.de (Martin v. Löwis) Date: 02 Apr 2002 11:43:49 +0200 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <874riulah6.fsf@nausicaa.interq.or.jp> References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> Message-ID: <j4g02e4eqy.fsf@informatik.hu-berlin.de> Ben Gertzfield <che@debian.org> writes: > We will probably need some general fallback to replace completely > unknown charsets with some safe US-ASCII text. Do you think you could > add this? Say, something like "(text with unknown encoding)". For the index, this might be a good idea. For the article, I'd prefer if there are some traces left of the original subject. E.g. if it is quoted-printable, you can often guess the subject from only the ASCII parts in it - at least for the Latin languages. OTOH, Mailman should IMO support all widely-used encodings out of the box; then this might not be an issue anymore.
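Martin's split (a safe placeholder in the index, the original text in the article) could look roughly like this in Python 3, using the standard `email.header.decode_header()`. `render_subject` is a hypothetical name, not Mailman's code, and the placeholder string is the one Ben suggested. Keeping the raw RFC 2047 header in the article is what preserves the readable ASCII fragments Martin mentions.

```python
from email.header import decode_header

UNKNOWN = "(text with unknown encoding)"  # placeholder text suggested in the thread


def render_subject(raw_subject):
    """Return (index_text, article_text) for a possibly RFC 2047-encoded Subject."""
    pieces = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, str):
            # No encoded words at all: decode_header returns the text unchanged.
            pieces.append(chunk)
            continue
        try:
            pieces.append(chunk.decode(charset or "ascii"))
        except (LookupError, UnicodeDecodeError):
            # Unknown charset or bad bytes: safe text for the index,
            # the untouched encoded header for the article page.
            return UNKNOWN, raw_subject
    text = "".join(pieces)
    return text, text
```

With a subject such as `=?x-mvl?q?hello?=`, the index shows the placeholder while the article keeps `=?x-mvl?q?hello?=`, so nothing is thrown away.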
Regards, Martin From che@debian.org Tue Apr 2 12:21:19 2002 From: che@debian.org (Ben Gertzfield) Date: Tue, 02 Apr 2002 21:21:19 +0900 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <j4g02e4eqy.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's message of "02 Apr 2002 11:43:49 +0200") References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> <j4g02e4eqy.fsf@informatik.hu-berlin.de> Message-ID: <87u1qujnpc.fsf@nausicaa.interq.or.jp> >>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes: Ben> We will probably need some general fallback to replace Ben> completely unknown charsets with some safe US-ASCII text. Do Ben> you think you could add this? Say, something like "(text Ben> with unknown encoding)". Martin> For the index, this might be a good idea. For the article, Martin> I'd prefer if there are some traces left of the original Martin> subject. E.g. if it is quoted-printable, you can often Martin> guess the subject from only the ASCII parts in it - Martin> at least for the Latin languages. Yes, I agree. We just don't want to create files with invalid encodings; mixing encodings in a single HTML file is a recipe for disaster! Martin> OTOH, Mailman should IMO support all widely-used encodings Martin> out of the box; then this might not be an issue anymore. This will happen eventually, but 2.1 will be the first release with *any* international support, so there are bound to be a large number of encodings we miss. (I'm thinking of all the Windows ones, here.) Ben -- Brought to you by the letters Q and P and the number 7. "Frungy! Frungy! Frungy!!"
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/ From loewis@informatik.hu-berlin.de Tue Apr 2 12:41:08 2002 From: loewis@informatik.hu-berlin.de (Martin v. Löwis) Date: 02 Apr 2002 14:41:08 +0200 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <87u1qujnpc.fsf@nausicaa.interq.or.jp> References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> <j4g02e4eqy.fsf@informatik.hu-berlin.de> <87u1qujnpc.fsf@nausicaa.interq.or.jp> Message-ID: <j4zo0m2ryz.fsf@informatik.hu-berlin.de> Ben Gertzfield <che@debian.org> writes: > Yes, I agree. We just don't want to create files with invalid > encodings; mixing encodings in a single HTML file is a recipe > for disaster! If that is your concern, then things can remain as they are (or will be, after the patch) - it will just print the mime-encoded subject of the original message. If the original message had non-ASCII text in the subject that was not MIME-encoded, I still think it should be copied as-is to the HTML - proper display will then be the task of the Web browser.
Regards, Martin From che@debian.org Tue Apr 2 12:58:39 2002 From: che@debian.org (Ben Gertzfield) Date: Tue, 02 Apr 2002 21:58:39 +0900 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <j4zo0m2ryz.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's message of "02 Apr 2002 14:41:08 +0200") References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> <j4g02e4eqy.fsf@informatik.hu-berlin.de> <87u1qujnpc.fsf@nausicaa.interq.or.jp> <j4zo0m2ryz.fsf@informatik.hu-berlin.de> Message-ID: <87lmc6jlz4.fsf@nausicaa.interq.or.jp> >>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes: Ben> Yes, I agree. We just don't want to create files with Ben> invalid encodings; mixing encodings in a single HTML file is Ben> a recipe for disaster! Martin> If that is your concern, then things can remain as they Martin> are (or will be, after the patch) - it will just print the Martin> mime-encoded subject of the original message. If the Martin> original message had non-ASCII text in the subject that Martin> was not MIME-encoded, I still think it should be copied Martin> as-is to the HTML - proper display will then be the task Martin> of the Web browser. Unfortunately, I have to disagree. The main problem will come with any encoding that is modal -- like UTF-8! If we copy random 8-bit non-MIME-encoded text (very common these days) into an HTML page containing UTF-8 text (let's say the majority of posts were in UTF-8 on this list) then we will not only produce invalid UTF-8 text, but we could quite possibly shift the user's terminal into a garbage state from the invalid 8-bit strings, making further display impossible.
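The two positions in this exchange, replacing undecodable text with a placeholder (Ben) versus escaping its bytes (Martin suggests qp-encoding them), can both be expressed as one hypothetical archiver option. This Python 3 sketch always emits ASCII-only HTML: decodable text has its non-ASCII characters written as numeric character references (the "Unicode escapes" Ben mentions), and undecodable bytes become `=XX` escapes or the placeholder, depending on `policy`. `html_safe`, `qp_escape` and the `policy` parameter are assumptions, not Mailman settings.

```python
import html

PLACEHOLDER = "(text with unknown encoding)"


def qp_escape(data):
    """Keep printable ASCII; write every other byte (and '=') as =XX."""
    out = []
    for byte in data:
        if 0x20 <= byte < 0x7F and byte != ord("="):
            out.append(chr(byte))
        else:
            out.append("=%02X" % byte)
    return "".join(out)


def html_safe(data, charset=None, policy="escape"):
    """Turn raw message bytes into ASCII-only HTML text.

    charset: the declared charset, or None if the message had none.
    policy: hypothetical list-admin option for undecodable bytes,
            "escape" (QP-style =XX) or "placeholder".
    """
    try:
        text = data.decode(charset or "ascii")
    except (LookupError, UnicodeDecodeError):
        text = PLACEHOLDER if policy == "placeholder" else qp_escape(data)
    # Escape markup, then write non-ASCII as &#NNN; so the page stays
    # valid whatever charset it declares.
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")
```

This sketch does not filter control characters that decode successfully (for example an ESC byte in text labelled `us-ascii`); a stricter version would escape those as well.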
Not everyone views these archives with a GUI web browser that contains work-arounds for all the invalid encoded text in the world; we need to be liberal in what we accept, but conservative in what we emit. I love the idea of using Unicode escapes for all text that we can convert to Unicode, but any text we can't convert just is not safe to include verbatim. Perhaps we should make it an option for those who really want to include possibly dangerous text directly in the archives? I know I would prefer a message like "(text with unknown encoding)" over a garbled Japanese terminal any day. Ben -- Brought to you by the letters N and M and the number 17. "Johnny! Don't go! It's too dangerous!" "I don't care!" Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/ From loewis@informatik.hu-berlin.de Tue Apr 2 15:03:07 2002 From: loewis@informatik.hu-berlin.de (Martin v. Löwis) Date: 02 Apr 2002 17:03:07 +0200 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <87lmc6jlz4.fsf@nausicaa.interq.or.jp> References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> <j4g02e4eqy.fsf@informatik.hu-berlin.de> <87u1qujnpc.fsf@nausicaa.interq.or.jp> <j4zo0m2ryz.fsf@informatik.hu-berlin.de> <87lmc6jlz4.fsf@nausicaa.interq.or.jp> Message-ID: <j4hemu2lec.fsf@informatik.hu-berlin.de> Ben Gertzfield <che@debian.org> writes: > Not everyone views these archives with a GUI web browser that contains > work-arounds for all the invalid encoded text in the world; we need to > be liberal in what we accept, but conservative in what we emit. In a GUI browser, it is not at all dangerous, we appear to agree on that.
I'd claim that the majority uses GUI browsers these days, so it is not really clear why the majority should suffer for the comfort of a few. > I know I would prefer a message like "(text with unknown encoding)" > over a garbled Japanese terminal any day. If that is really a concern to you (it is none to me, since I don't use a web browser that may corrupt my terminal), then I think the non-ASCII or control bytes could be qp-encoded - just suppressing the text would drop the usability of the archive. Notice that this isn't just necessary for the subjects - it is needed for arbitrary body text as well (and so independent from the subject we are discussing right now). Regards, Martin From che@debian.org Tue Apr 2 15:44:40 2002 From: che@debian.org (Ben Gertzfield) Date: Wed, 03 Apr 2002 00:44:40 +0900 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <j4hemu2lec.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's message of "02 Apr 2002 17:03:07 +0200") References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> <j4g02e4eqy.fsf@informatik.hu-berlin.de> <87u1qujnpc.fsf@nausicaa.interq.or.jp> <j4zo0m2ryz.fsf@informatik.hu-berlin.de> <87lmc6jlz4.fsf@nausicaa.interq.or.jp> <j4hemu2lec.fsf@informatik.hu-berlin.de> Message-ID: <878z86jeaf.fsf@nausicaa.interq.or.jp> >>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes: Ben> Not everyone views these archives with a GUI web browser that Ben> contains work-arounds for all the invalid encoded text in the Ben> world; we need to be liberal in what we accept, but Ben> conservative in what we emit. Martin> In a GUI browser, it is not at all dangerous, we appear to Martin> agree on that.
I'd claim that the majority uses GUI Martin> browsers these days, so it is not really clear why the Martin> majority should suffer for the comfort of a few. This is only because GUI browsers contain work-arounds for problems just like this. Why make the GUI browser programmers' lives harder? Ben> I know I would prefer a message like "(text with unknown Ben> encoding)" over a garbled Japanese terminal any day. Martin> If that is really a concern to you (it is none to me, Martin> since I don't use a web browser that may corrupt my Martin> terminal), then I think the non-ASCII or control bytes Martin> could be qp-encoded - just suppressing the text would drop Martin> the usability of the archive. Or we could give an option to replace text that could not be converted to Unicode with a message, eh? What's the harm in allowing both? Martin> Notice that this isn't just necessary for the subjects - Martin> it is needed for arbitrary body text as well (and so Martin> independent from the subject we are discussing right now). Yes. Ben -- Brought to you by the letters I and J and the number 6. "Moshimoshi. Kikoemasu ka?" "Kakenaoshimasu kara ne! 1-do kitte kudasai." Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/ From loewis@informatik.hu-berlin.de Tue Apr 2 16:45:44 2002 From: loewis@informatik.hu-berlin.de (Martin v.
Löwis) Date: 02 Apr 2002 18:45:44 +0200 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <878z86jeaf.fsf@nausicaa.interq.or.jp> References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> <j4g02e4eqy.fsf@informatik.hu-berlin.de> <87u1qujnpc.fsf@nausicaa.interq.or.jp> <j4zo0m2ryz.fsf@informatik.hu-berlin.de> <87lmc6jlz4.fsf@nausicaa.interq.or.jp> <j4hemu2lec.fsf@informatik.hu-berlin.de> <878z86jeaf.fsf@nausicaa.interq.or.jp> Message-ID: <j4u1qu122v.fsf@informatik.hu-berlin.de> Ben Gertzfield <che@debian.org> writes: > This is only because GUI browsers contain work-arounds for problems > just like this. Why make the GUI browser programmers' lives harder? Please remember that the original problem is in the email clients which don't properly MIME-encode non-ASCII text. Be liberal in what you accept: we should not throw away content just because we don't know what charset it has. Perhaps the browser can display something meaningful, perhaps not. If we remove the content from the archive, it is certain that it can't display anything meaningful. > Or we could give an option to replace text that could not be converted > to Unicode with a message, eh? What's the harm in allowing both? Who would be controlling this option, and how? If the list admin: why is she in a better position to make a decision than we are?
Regards, Martin From che@debian.org Wed Apr 3 00:36:52 2002 From: che@debian.org (Ben Gertzfield) Date: Wed, 03 Apr 2002 09:36:52 +0900 Subject: [Mailman-i18n] Subject lines in Archives In-Reply-To: <j4u1qu122v.fsf@informatik.hu-berlin.de> (loewis@informatik.hu-berlin.de's message of "02 Apr 2002 18:45:44 +0200") References: <15528.42600.882007.80793@anthem.wooz.org> <Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at> <15528.44431.221819.696341@anthem.wooz.org> <87r8lzkmnn.fsf@nausicaa.interq.or.jp> <j4y9g64gos.fsf@informatik.hu-berlin.de> <874riulah6.fsf@nausicaa.interq.or.jp> <j4g02e4eqy.fsf@informatik.hu-berlin.de> <87u1qujnpc.fsf@nausicaa.interq.or.jp> <j4zo0m2ryz.fsf@informatik.hu-berlin.de> <87lmc6jlz4.fsf@nausicaa.interq.or.jp> <j4hemu2lec.fsf@informatik.hu-berlin.de> <878z86jeaf.fsf@nausicaa.interq.or.jp> <j4u1qu122v.fsf@informatik.hu-berlin.de> Message-ID: <87zo0lipnf.fsf@nausicaa.interq.or.jp> >>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes: Ben> Or we could give an option to replace text that could not be Ben> converted to Unicode with a message, eh? What's the harm in Ben> allowing both? Martin> Who would be controlling this option, and how? If the list Martin> admin: why is she in a better position to make a decision Martin> than we are? I think the list admin should have the right to decide if they do not wish their customers' terminals to get messed up when browsing illegally encoded text. Ben -- Brought to you by the letters J and Z and the number 18. "Sculch is junk."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/ From m.ramsch@computer.org Wed Apr 3 17:57:53 2002 From: m.ramsch@computer.org (Martin Ramsch) Date: Wed, 3 Apr 2002 19:57:53 +0200 Subject: [Mailman-i18n] Subject lines in Archives References: <15528.42600.882007.80793@anthem.wooz.org><Pine.LNX.4.33.0204012035160.13275-100000@mail.std.priv.at><15528.44431.221819.696341@anthem.wooz.org><87r8lzkmnn.fsf@nausicaa.interq.or.jp><j4y9g64gos.fsf@informatik.hu-berlin.de><874riulah6.fsf@nausicaa.interq.or.jp><j4g02e4eqy.fsf@informatik.hu-berlin.de><87u1qujnpc.fsf@nausicaa.interq.or.jp><j4zo0m2ryz.fsf@informatik.hu-berlin.de><87lmc6jlz4.fsf@nausicaa.interq.or.jp><j4hemu2lec.fsf@informatik.hu-berlin.de><878z86jeaf.fsf@nausicaa.interq.or.jp><j4u1qu122v.fsf@informatik.hu-berlin.de> <87zo0lipnf.fsf@nausicaa.interq.or.jp> Message-ID: <001c01c1dcd2$edfda330$e39590d4@ramsch.org> Greetings to all! Ben Gertzfield <che@debian.org> wrote: > >>>>> "Martin" == Martin v Löwis <loewis@informatik.hu-berlin.de> writes: > > Ben> Or we could give an option to replace text that could not be > Ben> converted to Unicode with a message, eh? What's the harm in > Ben> allowing both? > > Martin> Who would be controlling this option, and how? If the list > Martin> admin: why is she in a better position to make a decision > Martin> than we are? > > I think the list admin should have the right to decide if they do not > wish their customers' terminals to get messed up when browsing > illegally encoded text. I followed this discussion, and strongly second Ben's opinion that an archiver should always output correctly encoded pages - no exceptions! Be liberal in what you accept, but conservative in what you emit. Martin, please reconsider. Only by following this principle can we end up with a stable, problem-free product!
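One way to guess the charset of unlabelled 8-bit text is to try a short list of candidate charsets with strict decoding and take the first that succeeds, returning nothing when none fits so the archiver can fall back to something like a separate page. This Python 3 sketch is illustrative only: `guess_decode` is a hypothetical name, and the candidate list is an assumption that a real list would tune to the languages it sees.

```python
# Illustrative candidate order: strict UTF-8 first, because random 8-bit
# text rarely forms valid UTF-8 by accident, then common CJK charsets.
# iso-8859-1 is deliberately absent: it accepts every byte string, so it
# would always "win" and hide the fact that the guess failed.
DEFAULT_CANDIDATES = ("utf-8", "shift_jis", "euc_kr")


def guess_decode(data, candidates=DEFAULT_CANDIDATES):
    """Return (text, charset) for the first strict decode that succeeds.

    Returns (None, None) when nothing fits, so the caller can link to a
    separate page instead of inlining bytes of unknown encoding.
    """
    order = list(candidates)
    if b"\x1b" in data:
        # ISO-2022-JP is 7-bit, so it is also valid UTF-8; an ESC byte is
        # the usual sign of it, so try that codec first.
        order.insert(0, "iso2022_jp")
    for charset in order:
        try:
            return data.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            continue
    return None, None
```

A successful strict decode is only evidence, not proof: short byte strings can be valid in several charsets, which is one reason to keep a way for readers to correct a wrong guess.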
Another idea to solve the problem: if the charset of a message is not specified, we could first use heuristics to guess the encoding; in many cases this is possible. But if we really don't know which encoding is used, I'd prefer to replace this message with a _LINK_ saying "text with unknown encoding" which points to a separate page showing the message in question. This way we only produce correctly encoded output on the main pages, and warn in advance where the encoding might be garbled, but still don't leave out a bit of information. Looking further ahead: on this latter page we could even add a form where readers can suggest which encoding should be used, and this gathered input could be used to finally integrate the message properly ... Cheers, Martin From duke@linux.ee Sat Apr 20 16:12:33 2002 From: duke@linux.ee (Anti Veeranna) Date: Sat, 20 Apr 2002 18:12:33 +0300 Subject: [Mailman-i18n] Mailman - Estonian translation Message-ID: <20020420181233.7e1825c0.duke@linux.ee> Hello If no one else has volunteered to take on that job, I would like to work on the Estonian translation/language pack for Mailman. A little background information about me: I have previous experience in localizing gettext-based applications. I'm part of KDE's Estonian language team, where I am responsible for localizing the programs in the kdegames module. I have also worked on an unofficial translation of Majordomo[1] and a number of other smaller programs. Currently I am also the administrator of 2 small Mailman installations, which run 30 and 10 lists respectively. Based on that, I believe that I have the necessary knowledge and skill for this job. I have a pretty clear picture of what is involved and I'm all set up for it; all I need is an OK from you. [1] unofficial, because it didn't have any support for gettext or similar internationalization libraries, and therefore translating it meant changing the code directly.
-- Anti Veeranna duke@linux.ee From michel.guilhem@annuwave.net Sun Apr 21 11:41:22 2002 From: michel.guilhem@annuwave.net (michel.guilhem) Date: Sun, 21 Apr 2002 12:41:22 +0200 Subject: [Mailman-i18n] location of french translation ( and others ) Message-ID: <3CC29752.C8D8A89A@annuwave.net> Where can I obtain the French translation (and all the others)? And then, is the only thing I have to do to place a directory called fr (for example) under templates and modify the location of the templates in Defaults.py? Did I forget anything, or am I right? Thanks.