From chaks.yoper at gmail.com  Mon Jan 23 06:09:57 2006
From: chaks.yoper at gmail.com (Chakkaradeep C C)
Date: Mon, 23 Jan 2006 10:39:57 +0530
Subject: [I18n-sig] newbie help
Message-ID: <a54a308e0601222109w6c98168av908e2a7f68cea652@mail.gmail.com>

Hi all,

I want to add multilingual capability to my Python program. In the initial
stages I want to support languages such as English and German. I googled
around for help but didn't find anything useful. I would be happy if anybody
could point me to a good "Hello World" tutorial.

--
Regards,
Chaks,
Yoper Ltd.
http://www.yoper.com
http://www.yoper.com/forum
--
The main aim of communication is clarity and simplicity. Simplicity means
focussed effort.

From jtauber at jtauber.com  Mon Jan 23 06:59:17 2006
From: jtauber at jtauber.com (James Tauber)
Date: Mon, 23 Jan 2006 00:59:17 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
Message-ID: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>


I've made a start on a pure python implementation of the Unicode  
Collation Algorithm (UTS #10) but I thought I'd best check with this  
SIG whether such a thing already exists.

James
--
James Tauber                       http://jtauber.com/
journeyman of some   http://jtauber.com/blog/


From werner.bruhin at free.fr  Mon Jan 23 10:17:07 2006
From: werner.bruhin at free.fr (Werner F. Bruhin)
Date: Mon, 23 Jan 2006 10:17:07 +0100
Subject: [I18n-sig] newbie help
In-Reply-To: <a54a308e0601222109w6c98168av908e2a7f68cea652@mail.gmail.com>
References: <a54a308e0601222109w6c98168av908e2a7f68cea652@mail.gmail.com>
Message-ID: <dr26u2$ls3$1@sea.gmane.org>

Hi Chaks,

Chakkaradeep C C wrote:
> Hi all,
>  
> I want to add multilingual capability to my Python program. In the
> initial stages I want to support languages such as English and German. I
> googled around for help but didn't find anything useful. I would be happy
> if anybody could point me to a good "Hello World" tutorial.

Here is a wiki page on how to do I18n with wxPython; most of it applies
to pure Python too.

Towards the end of the page there is even a "Hello World" example.
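
A minimal sketch of such a "Hello World" in pure Python, using the standard
library's gettext module, might look like the following. It is not the
example from the wiki page; the domain name "hello", the "locale" directory
and the helper get_translator are assumptions made for illustration.

```python
import gettext


def get_translator(language, localedir="locale", domain="hello"):
    """Return a gettext function for the given language code.

    gettext looks for <localedir>/<language>/LC_MESSAGES/<domain>.mo.
    With fallback=True, a missing catalog yields the untranslated
    strings instead of raising an error.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return translation.gettext


if __name__ == "__main__":
    for language in ("en", "de"):
        _ = get_translator(language)
        print(_("Hello World"))
```

Until a compiled German catalog exists at locale/de/LC_MESSAGES/hello.mo,
both iterations print the English string.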

Werner


From mal at egenix.com  Mon Jan 23 10:49:24 2006
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 23 Jan 2006 10:49:24 +0100
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
Message-ID: <43D4A6A4.4000206@egenix.com>

James Tauber wrote:
> I've made a start on a pure python implementation of the Unicode  
> Collation Algorithm (UTS #10) but I thought I'd best check with this  
> SIG whether such a thing already exists.

Not that I'm aware of.

Note that given the sizes of the collation tables, it's probably
better to have them defined in a C module, rather than a Python
data structure.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 23 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From jim at zope.com  Mon Jan 23 12:37:10 2006
From: jim at zope.com (Jim Fulton)
Date: Mon, 23 Jan 2006 06:37:10 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
Message-ID: <43D4BFE6.9070803@zope.com>

James Tauber wrote:
> I've made a start on a pure python implementation of the Unicode  
> Collation Algorithm (UTS #10) but I thought I'd best check with this  
> SIG whether such a thing already exists.

I'm not aware of any pure python implementations.

I've created a pyrex-based C wrapper of the ICU collation library at:

   http://svn.zope.org/zope.ucol/trunk/

You don't need pyrex to use this and there is a distutils
setup script to install it.

I'd be happy to make an official release of this if anyone is
interested.

There is also a SWIG-based C++ wrapper of a much larger portion of the
ICU library, including collation at:

   http://pyicu.osafoundation.org/

This requires swig, hand editing of makefiles, and dynamic-library
machinations, which is in large part why I ended up writing my own
wrapper.

Jim

-- 
Jim Fulton           mailto:jim at zope.com       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org

From werner.bruhin at free.fr  Mon Jan 23 13:40:52 2006
From: werner.bruhin at free.fr (Werner F. Bruhin)
Date: Mon, 23 Jan 2006 13:40:52 +0100
Subject: [I18n-sig] newbie help
In-Reply-To: <a54a308e0601222109w6c98168av908e2a7f68cea652@mail.gmail.com>
References: <a54a308e0601222109w6c98168av908e2a7f68cea652@mail.gmail.com>
Message-ID: <43D4CED4.3070202@free.fr>

And here is the link, sorry:

http://wiki.wxpython.org/index.cgi/Internationalization




From jtauber at jtauber.com  Tue Jan 24 14:18:03 2006
From: jtauber at jtauber.com (James Tauber)
Date: Tue, 24 Jan 2006 08:18:03 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <43D4A6A4.4000206@egenix.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
	<43D4A6A4.4000206@egenix.com>
Message-ID: <EF2CCE2A-49B9-4580-A42C-4016E0C20AFA@jtauber.com>

On 23/01/2006, at 4:49 AM, M.-A. Lemburg wrote:

> James Tauber wrote:
>> I've made a start on a pure python implementation of the Unicode
>> Collation Algorithm (UTS #10) but I thought I'd best check with this
>> SIG whether such a thing already exists.
>
> Not that I'm aware of.
>
> Note that given the sizes of the collation tables, it's probably
> better to have them defined in a C module, rather than a Python
> data structure.

Yes, this is certainly true of the DUCET, although language-specific
collation element tables would be more manageable.

I'll probably start with a pure Python implementation and then take
it from there (or let someone with better C extension experience
optimize it).

James


From jtauber at jtauber.com  Tue Jan 24 14:19:38 2006
From: jtauber at jtauber.com (James Tauber)
Date: Tue, 24 Jan 2006 08:19:38 -0500
Subject: [I18n-sig] python implementation of unicode collation algorithm
In-Reply-To: <43D4BFE6.9070803@zope.com>
References: <8D8BE6CC-0553-42E7-9C39-6A93B254792F@jtauber.com>
	<43D4BFE6.9070803@zope.com>
Message-ID: <3F40CE87-BECA-4A74-8386-CBCB68077D6E@jtauber.com>

On 23/01/2006, at 6:37 AM, Jim Fulton wrote:

> James Tauber wrote:
>> I've made a start on a pure python implementation of the Unicode   
>> Collation Algorithm (UTS #10) but I thought I'd best check with  
>> this  SIG whether such a thing already exists.
>
> I'm not aware of any pure python implementations.
>
> I've created a pyrex-based C wrapper of the ICU collation library at:
>
>   http://svn.zope.org/zope.ucol/trunk/
>
> You don't need pyrex to use this and there is a distutils
> setup script to install it.
>
> I'd be happy to make an official release of this if anyone is
> interested.

I'd like to see an official release, even if I do end up doing a pure  
Python implementation myself.

James
--
James Tauber                       http://jtauber.com/
journeyman of some   http://jtauber.com/blog/


From jtauber at jtauber.com  Fri Jan 27 07:46:53 2006
From: jtauber at jtauber.com (James Tauber)
Date: Fri, 27 Jan 2006 01:46:53 -0500
Subject: [I18n-sig] initial python implementation of UCA available
Message-ID: <2ADE2441-DFB2-46C6-A7D5-0E406AA66EDE@jtauber.com>

See http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm

The core of the UCA is implemented and that's enough to do what I  
currently need to do (sort Ancient Greek).

There's actually more code parsing the collation element table than  
performing the actual sort key generation :-)
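
To give a feel for what that parsing involves, here is a rough sketch of
reading one line of a DUCET-style collation element table. It is not the
pyuca code: parse_line and CE_PATTERN are hypothetical names, and the line
layout (hex code points, a semicolon, bracketed dot-separated weights, a
trailing # comment) is an assumption about the file format. The weights in
the usage note are the ones for alpha quoted elsewhere in this thread.

```python
import re

# One collation element looks like [.124E.0020.0002.03B1]; a leading '*'
# instead of '.' marks a variable element.
CE_PATTERN = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")


def parse_line(line):
    """Return (code_points, collation_elements), or None for other lines."""
    line = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not line or line.startswith("@"):  # blank line or @version header
        return None
    chars, elements = line.split(";", 1)
    key = tuple(int(cp, 16) for cp in chars.split())
    ces = [
        tuple(int(weight, 16) for weight in weights.split("."))
        for _marker, weights in CE_PATTERN.findall(elements)
    ]
    return key, ces
```

For example, parse_line("03B1 ; [.124E.0020.0002.03B1] # GREEK SMALL LETTER ALPHA")
returns ((0x3B1,), [(0x124E, 0x20, 0x2, 0x3B1)]).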

Let me know if you have any comments, suggestions, etc.

James
--
James Tauber                       http://jtauber.com/
journeyman of some   http://jtauber.com/blog/


From mal at egenix.com  Fri Jan 27 15:48:10 2006
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 27 Jan 2006 15:48:10 +0100
Subject: [I18n-sig] initial python implementation of UCA available
In-Reply-To: <2ADE2441-DFB2-46C6-A7D5-0E406AA66EDE@jtauber.com>
References: <2ADE2441-DFB2-46C6-A7D5-0E406AA66EDE@jtauber.com>
Message-ID: <43DA32AA.1030009@egenix.com>

James Tauber wrote:
> See http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm
> 
> The core of the UCA is implemented and that's enough to do what I  
> currently need to do (sort Ancient Greek).
> 
> There's actually more code parsing the collation element table than  
> performing the actual sort key generation :-)
> 
> Let me know if you have any comments, suggestions, etc.

        for part in key:
            curr_node = curr_node[1].setdefault(part, [None, {}])

This could be made faster by not using .setdefault():
Python will still build the [None, {}] even if it's
not used.
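
One way to do that, as a sketch: look the child up first and build a new
node only when it is missing. The function name add_key and the use of
slot 0 for the stored value are assumptions; only the [value, children]
node layout is taken from the snippet above.

```python
def add_key(root, key, value):
    """Insert value under key in a trie whose nodes are [value, children]."""
    curr_node = root
    for part in key:
        children = curr_node[1]
        try:
            curr_node = children[part]
        except KeyError:
            # The new node is only built when the child is really missing.
            curr_node = children[part] = [None, {}]
    curr_node[0] = value


root = [None, {}]
add_key(root, (0x3B1, 0x313), "alpha + psili")
add_key(root, (0x3B1,), "alpha")
```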

In general, it's probably better to marshal the trie and
simply load the marshalled form on startup. That saves
you the setup time of having to build a trie from
a few thousand keys.
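
A sketch of that idea with the standard marshal module follows. The
helper names save_trie and load_trie and the file path are assumptions
for illustration, and the trie is assumed to contain only lists, dicts,
tuples, integers, strings and None, which marshal can handle.

```python
import marshal


def save_trie(trie, path):
    """Write the already-built trie to disk in marshalled form."""
    with open(path, "wb") as f:
        marshal.dump(trie, f)


def load_trie(path):
    """Load a previously marshalled trie instead of rebuilding it."""
    with open(path, "rb") as f:
        return marshal.load(f)
```

The marshal format is not guaranteed to stay the same between Python
versions, so the cached file should be something that can be regenerated
from the collation element table at any time.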

-- 
Marc-Andre Lemburg
eGenix.com


From jtauber at jtauber.com  Mon Jan 30 09:35:28 2006
From: jtauber at jtauber.com (James Tauber)
Date: Mon, 30 Jan 2006 03:35:28 -0500
Subject: [I18n-sig] possible bug in my UCA implementation
Message-ID: <AF133EC6-2DCC-441C-83A6-71C1969B4706@jtauber.com>


My Python Unicode Collation Algorithm implementation is giving  
unexpected results that could be because of:

1. a bug in my code
2. a bug in the DUCET
3. a difference of opinion between the way I think Ancient Greek
should be collated and the way the DUCET collates it

I'd like to get the opinion of those of you who are more familiar with
UCA (and who can perhaps try my example out on ICU).

For the purposes of testing, say I'm trying to sort the three words:

(1)	ᾅδης
(2)	Ἄβελ
(3)	ἀββά

In my view they should sort in the reverse order, but my pyuca code
sorts them in the order listed above.

pyuca assigns the words the following sort keys:

(1) ['0x124e', '0x0', '0x0', '0x0', '0x1252', '0x1257', '0x126a',  
'0x0', '0x20', '0x2a', '0x32', '0x97', '0x20', '0x20', '0x20', '0x0',  
'0x2', '0x2', '0x2', '0x2', '0x2', '0x2', '0x19', '0x0', '0x3b1',  
'0x314', '0x301', '0x345', '0x3b4', '0x3b7', '0x3c2']
(2) ['0x124e', '0x0', '0x0', '0x124f', '0x1253', '0x125c', '0x0',  
'0x20', '0x22', '0x32', '0x20', '0x20', '0x20', '0x0', '0x8', '0x2',  
'0x2', '0x2', '0x2', '0x2', '0x0', '0x391', '0x313', '0x301',  
'0x3b2', '0x3b5', '0x3bb']
(3) ['0x124e', '0x0', '0x124f', '0x124f', '0x124e', '0x0', '0x0',  
'0x20', '0x22', '0x20', '0x20', '0x20', '0x32', '0x0', '0x2', '0x2',  
'0x2', '0x2', '0x2', '0x2', '0x0', '0x3b1', '0x313', '0x3b2',  
'0x3b2', '0x3b1', '0x301']

The problem is that ᾅ (the first character of (1)) expands to 4
collation elements, Ἄ (the first character of (2)) to 3 and ἀ (the
first character of (3)) to 2. Because all but the first of those
elements have a zero primary weight, a word compares as less simply by
virtue of its first character having more collation elements.

I don't even understand why these letters are being treated as  
expansions rather than simply taking advantage of the secondary and  
tertiary levels, but sure enough that is how the DUCET describes them.

Am I missing something fundamental in the algorithm? Or is it  
possible the DUCET is wrong?
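
One thing that may be worth checking against UTS #10: the step that forms
the sort key appends a weight at a given level only when that weight is
non-zero. The sketch below is not the pyuca code; sort_key, skip_zero,
elements and WORDS are names made up for illustration, and the collation
elements are read off positionally from the three keys above, leaving out
the final code point level.

```python
def elements(primary, secondary, tertiary):
    """Zip per-level weights into (primary, secondary, tertiary) elements."""
    return list(zip(primary, secondary, tertiary))


WORDS = {
    "(1)": elements(
        [0x124E, 0, 0, 0, 0x1252, 0x1257, 0x126A],
        [0x20, 0x2A, 0x32, 0x97, 0x20, 0x20, 0x20],
        [0x2, 0x2, 0x2, 0x2, 0x2, 0x2, 0x19],
    ),
    "(2)": elements(
        [0x124E, 0, 0, 0x124F, 0x1253, 0x125C],
        [0x20, 0x22, 0x32, 0x20, 0x20, 0x20],
        [0x8, 0x2, 0x2, 0x2, 0x2, 0x2],
    ),
    "(3)": elements(
        [0x124E, 0, 0x124F, 0x124F, 0x124E, 0],
        [0x20, 0x22, 0x20, 0x20, 0x20, 0x32],
        [0x2, 0x2, 0x2, 0x2, 0x2, 0x2],
    ),
}


def sort_key(ces, skip_zero):
    """Concatenate the weights level by level, with 0 as level separator."""
    key = []
    for level in range(3):
        if level:
            key.append(0)
        for ce in ces:
            weight = ce[level]
            if weight or not skip_zero:
                key.append(weight)
    return key


# Keeping the zero weights reproduces the order reported above.
print(sorted(WORDS, key=lambda w: sort_key(WORDS[w], skip_zero=False)))
# ['(1)', '(2)', '(3)']

# Dropping them gives the reverse order.
print(sorted(WORDS, key=lambda w: sort_key(WORDS[w], skip_zero=True)))
# ['(3)', '(2)', '(1)']
```

With the zero weights dropped, the primary level compares only the base
letters, and the accents and breathings are left to the secondary level.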

James