From chaks.yoper at gmail.com  Mon Jan 23 06:09:57 2006
From: chaks.yoper at gmail.com (Chakkaradeep C C)
Date: Mon, 23 Jan 2006 10:39:57 +0530
Subject: [I18n-sig] newbie hlep
Message-ID:

Hi all,

I want to have multilingual capability in my Python program. At the
initial stages I want to use languages like English and German. I googled
around for help, but didn't get any help regarding this. I would be happy
if anybody could provide me with a good "Hello World" tutorial.

--
Regards,
Chaks,
Yoper Ltd.
http://www.yoper.com
http://www.yoper.com/forum
--
The main aim of communication is clarity and simplicity. Simplicity
means focussed effort.

From jtauber at jtauber.com  Mon Jan 23 06:59:17 2006
From: jtauber at jtauber.com (James Tauber)
Date: Mon, 23 Jan 2006 00:59:17 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
Message-ID: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>

I've made a start on a pure python implementation of the Unicode
Collation Algorithm (UTS #10) but I thought I'd best check with this
SIG whether such a thing already exists.

James
--
  James Tauber               http://jtauber.com/
  journeyman of some    http://jtauber.com/blog/

From werner.bruhin at free.fr  Mon Jan 23 10:17:07 2006
From: werner.bruhin at free.fr (Werner F. Bruhin)
Date: Mon, 23 Jan 2006 10:17:07 +0100
Subject: [I18n-sig] newbie hlep
In-Reply-To:
References:
Message-ID:

Hi Chaks,

Chakkaradeep C C wrote:
> Hi all,
>
> I want to have multilingual capability in my Python program. At the
> initial stages I want to use languages like English and German. I googled
> around for help, but didn't get any help regarding this. I would be happy
> if anybody could provide me with a good "Hello World" tutorial.

Here is a wiki page on how to do I18n with wxPython; most of it applies
to pure Python too.
Towards the end of the page there is even a "Hello World" example.

Werner

>
> --
> Regards,
> Chaks,
> Yoper Ltd.
> http://www.yoper.com
> http://www.yoper.com/forum
> --
> The main aim of communication is clarity and simplicity. Simplicity
> means focussed effort.
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> I18n-sig mailing list
> I18n-sig at python.org
> http://mail.python.org/mailman/listinfo/i18n-sig

From mal at egenix.com  Mon Jan 23 10:49:24 2006
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 23 Jan 2006 10:49:24 +0100
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
Message-ID: <43D4A6A4.4000206@egenix.com>

James Tauber wrote:
> I've made a start on a pure python implementation of the Unicode
> Collation Algorithm (UTS #10) but I thought I'd best check with this
> SIG whether such a thing already exists.

Not that I'm aware of.

Note that given the sizes of the collation tables, it's probably
better to have them defined in a C module, rather than a Python
data structure.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 23 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
::::

From jim at zope.com  Mon Jan 23 12:37:10 2006
From: jim at zope.com (Jim Fulton)
Date: Mon, 23 Jan 2006 06:37:10 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
Message-ID: <43D4BFE6.9070803@zope.com>

James Tauber wrote:
> I've made a start on a pure python implementation of the Unicode
> Collation Algorithm (UTS #10) but I thought I'd best check with this
> SIG whether such a thing already exists.

I'm not aware of any pure python implementations.

I've created a pyrex-based C wrapper of the ICU collation library at:

   http://svn.zope.org/zope.ucol/trunk/

You don't need pyrex to use this and there is a distutils
setup script to install it.

I'd be happy to make an official release of this if anyone is
interested.

There is also a SWIG-based C++ wrapper of a much larger portion
of the ICU library, including collation at:

   http://pyicu.osafoundation.org/

This requires swig, hand editing of makefiles, and dynamic-library
machinations, which is in large part why I ended up writing my own
wrapper.

Jim

--
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From werner.bruhin at free.fr  Mon Jan 23 13:40:52 2006
From: werner.bruhin at free.fr (Werner F. Bruhin)
Date: Mon, 23 Jan 2006 13:40:52 +0100
Subject: [I18n-sig] newbie hlep
In-Reply-To:
References:
Message-ID: <43D4CED4.3070202@free.fr>

And here is the link, sorry:

http://wiki.wxpython.org/index.cgi/Internationalization

Chakkaradeep C C wrote:
> Hi all,
>
> I want to have multilingual capability in my Python program. At the
> initial stages I want to use languages like English and German. I googled
> around for help, but didn't get any help regarding this. I would be happy
> if anybody could provide me with a good "Hello World" tutorial.
>
> --
> Regards,
> Chaks,
> Yoper Ltd.
> http://www.yoper.com
> http://www.yoper.com/forum
> --
> The main aim of communication is clarity and simplicity. Simplicity
> means focussed effort.

From jtauber at jtauber.com  Tue Jan 24 14:18:03 2006
From: jtauber at jtauber.com (James Tauber)
Date: Tue, 24 Jan 2006 08:18:03 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <43D4A6A4.4000206@egenix.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com> <43D4A6A4.4000206@egenix.com>
Message-ID:

On 23/01/2006, at 4:49 AM, M.-A. Lemburg wrote:

> James Tauber wrote:
>> I've made a start on a pure python implementation of the Unicode
>> Collation Algorithm (UTS #10) but I thought I'd best check with this
>> SIG whether such a thing already exists.
>
> Not that I'm aware of.
>
> Note that given the sizes of the collation tables, it's probably
> better to have them defined in a C module, rather than a Python
> data structure.

Yes, this is certainly true of the DUCET, although for
language-specific collation element tables, it would be more manageable.
I'll probably start with a pure Python implementation and then take
it from there (or let someone with better C extension experience
optimize it)

James

From jtauber at jtauber.com  Tue Jan 24 14:19:38 2006
From: jtauber at jtauber.com (James Tauber)
Date: Tue, 24 Jan 2006 08:19:38 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <43D4BFE6.9070803@zope.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com> <43D4BFE6.9070803@zope.com>
Message-ID: <3F40CE87-BECA-4A74-8386-CBCB68077D6E@jtauber.com>

On 23/01/2006, at 6:37 AM, Jim Fulton wrote:

> James Tauber wrote:
>> I've made a start on a pure python implementation of the Unicode
>> Collation Algorithm (UTS #10) but I thought I'd best check with
>> this SIG whether such a thing already exists.
>
> I'm not aware of any pure python implementations.
>
> I've created a pyrex-based C wrapper of the ICU collation library at:
>
>   http://svn.zope.org/zope.ucol/trunk/
>
> You don't need pyrex to use this and there is a distutils
> setup script to install it.
>
> I'd be happy to make an official release of this if anyone is
> interested.

I'd like to see an official release, even if I do end up doing a pure
Python implementation myself.

James
--
  James Tauber               http://jtauber.com/
  journeyman of some    http://jtauber.com/blog/

From jtauber at jtauber.com  Fri Jan 27 07:46:53 2006
From: jtauber at jtauber.com (James Tauber)
Date: Fri, 27 Jan 2006 01:46:53 -0500
Subject: [I18n-sig] initial python implementation of UCA available
Message-ID: <2ADE2441-DFB2-46C6-A7D5-0E406AA66EDE@jtauber.com>

See http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm

The core of the UCA is implemented and that's enough to do what I
currently need to do (sort Ancient Greek).

There's actually more code parsing the collation element table than
performing the actual sort key generation :-)

Let me know if you have any comments, suggestions, etc.

James
--
  James Tauber               http://jtauber.com/
  journeyman of some    http://jtauber.com/blog/

From mal at egenix.com  Fri Jan 27 15:48:10 2006
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 27 Jan 2006 15:48:10 +0100
Subject: [I18n-sig] initial python implementation of UCA available
In-Reply-To: <2ADE2441-DFB2-46C6-A7D5-0E406AA66EDE@jtauber.com>
References: <2ADE2441-DFB2-46C6-A7D5-0E406AA66EDE@jtauber.com>
Message-ID: <43DA32AA.1030009@egenix.com>

James Tauber wrote:
> See http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm
>
> The core of the UCA is implemented and that's enough to do what I
> currently need to do (sort Ancient Greek).
>
> There's actually more code parsing the collation element table than
> performing the actual sort key generation :-)
>
> Let me know if you have any comments, suggestions, etc.

    for part in key:
        curr_node = curr_node[1].setdefault(part, [None, {}])

This could be made faster by not using .setdefault(): Python will
still build the [None, {}] even if it's not used.

In general, it's probably better to marshal the trie and simply
load the marshalled form on startup. That saves you the setup time
of having to build a trie from a few thousand keys.

--
Marc-Andre Lemburg
eGenix.com

From jtauber at jtauber.com  Mon Jan 30 09:35:28 2006
From: jtauber at jtauber.com (James Tauber)
Date: Mon, 30 Jan 2006 03:35:28 -0500
Subject: [I18n-sig] possible bug in my UCA implementation
Message-ID:

My Python Unicode Collation Algorithm implementation is giving
unexpected results that could be because of:

1. a bug in my code
2. a bug in the DUCET
3. a difference of opinion between the way I think Ancient Greek
   should be collated and the way the DUCET thinks it should be

so I'd like to get the opinion of some of you who are more familiar
with UCA (and perhaps can try my example out on ICU).

For the purposes of testing, say I'm trying to sort the three words:

(1) ᾅδης
(2) Ἄβελ
(3) ἀββά

In my view they should be sorted in the reverse of this order, but
my pyuca code sorts them in the order listed above.

pyuca assigns the words the following sort keys:

(1) ['0x124e', '0x0', '0x0', '0x0', '0x1252', '0x1257', '0x126a',
     '0x0', '0x20', '0x2a', '0x32', '0x97', '0x20', '0x20', '0x20',
     '0x0', '0x2', '0x2', '0x2', '0x2', '0x2', '0x2', '0x19',
     '0x0', '0x3b1', '0x314', '0x301', '0x345', '0x3b4', '0x3b7', '0x3c2']

(2) ['0x124e', '0x0', '0x0', '0x124f', '0x1253', '0x125c',
     '0x0', '0x20', '0x22', '0x32', '0x20', '0x20', '0x20',
     '0x0', '0x8', '0x2', '0x2', '0x2', '0x2', '0x2',
     '0x0', '0x391', '0x313', '0x301', '0x3b2', '0x3b5', '0x3bb']

(3) ['0x124e', '0x0', '0x124f', '0x124f', '0x124e', '0x0',
     '0x0', '0x20', '0x22', '0x20', '0x20', '0x20', '0x32',
     '0x0', '0x2', '0x2', '0x2', '0x2', '0x2', '0x2',
     '0x0', '0x3b1', '0x313', '0x3b2', '0x3b2', '0x3b1', '0x301']

The problem is that ᾅ (the first character of (1)) expands to 4
collation elements, Ἄ (the first character of (2)) to 3 and ἀ (the
first character of (3)) to 2. As a result, because all but the first
of those elements has a zero primary weight, the words with more
collation elements compare as less, just by virtue of having more
collation elements.

I don't even understand why these letters are being treated as
expansions rather than simply taking advantage of the secondary and
tertiary levels, but sure enough that is how the DUCET describes them.

Am I missing something fundamental in the algorithm? Or is it
possible the DUCET is wrong?

James
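
A minimal sketch of one possible explanation, not a confirmed diagnosis of pyuca: the sort-key step of UTS #10 appends a weight to the key only when it is non-zero, whereas the three keys quoted above keep the zero primary weights of the accent elements. The collation elements below are read back from those three keys (the identical level is left out), and the names `WORDS`, `sort_key`, `ordering` and the `skip_zero` flag are hypothetical, introduced only for this illustration.

```python
# Collation elements as (primary, secondary, tertiary) weights, read back
# from the three sort keys quoted in the message above.
WORDS = {
    "word 1": [(0x124E, 0x20, 0x02), (0, 0x2A, 0x02), (0, 0x32, 0x02),
               (0, 0x97, 0x02), (0x1252, 0x20, 0x02), (0x1257, 0x20, 0x02),
               (0x126A, 0x20, 0x19)],
    "word 2": [(0x124E, 0x20, 0x08), (0, 0x22, 0x02), (0, 0x32, 0x02),
               (0x124F, 0x20, 0x02), (0x1253, 0x20, 0x02),
               (0x125C, 0x20, 0x02)],
    "word 3": [(0x124E, 0x20, 0x02), (0, 0x22, 0x02), (0x124F, 0x20, 0x02),
               (0x124F, 0x20, 0x02), (0x124E, 0x20, 0x02), (0, 0x32, 0x02)],
}


def sort_key(elements, skip_zero):
    """Build a three-level sort key, with a 0 separator between levels.

    skip_zero=False keeps zero weights, as in the keys quoted above.
    skip_zero=True appends only non-zero weights, as UTS #10 describes.
    """
    key = []
    for level in range(3):
        if level > 0:
            key.append(0)  # level separator
        for element in elements:
            weight = element[level]
            if weight != 0 or not skip_zero:
                key.append(weight)
    return key


def ordering(skip_zero):
    """Return the word labels sorted by their sort keys."""
    return sorted(WORDS, key=lambda name: sort_key(WORDS[name], skip_zero))


print(ordering(skip_zero=False))  # zero weights kept
print(ordering(skip_zero=True))   # zero weights dropped
```

With the zero weights kept, the words come out in the order listed in the message; with them dropped, the order is reversed, which is the order the message expected. Whether this is really the cause would need checking against the pyuca source itself.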