From stephan.richter@tufts.edu  Tue Apr  1 02:48:23 2003
From: stephan.richter@tufts.edu (Stephan Richter)
Date: Mon, 31 Mar 2003 21:48:23 -0500
Subject: [I18n-sig] gettext tutorial
In-Reply-To: <3E83057F.9020004@zope.com>
References: <3E8138AA.2020307@canada.com> <3E83057F.9020004@zope.com>
Message-ID: <200303312148.23441.stephan.richter@tufts.edu>

On Thursday 27 March 2003 09:06, Jim Fulton wrote:
> FWIW, I think Stephan Richter has already written such an app for
> Zope.
>
> Jim
>
> Anmar Oueja wrote:
> > Hello All:
> >
> > I am working on a web application (written in python of course) that
> > will display a po file and allow people to translate these po files
> > using this web app.

See the Zope 3 gettext PO files import/export filters at 
http://cvs.zope.org/Zope3/src/zope/app/services/translation/filters.py?rev=1.2&content-type=text/vnd.viewcvs-markup

Once you have a dictionary, you can do whatever you want. Note of course that 
the local translation service does much more already; it can also synchronize 
message catalogs...
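
As a rough illustration of the "once you have a dictionary" idea, here is a minimal Python 3 sketch that reads simple PO text into a dict. It is not the Zope 3 filter code; the function names parse_po and unquote are hypothetical, and plural forms, msgctxt and obsolete entries are deliberately ignored.

```python
import ast


def unquote(token):
    """Turn a quoted PO string such as '"Hello\\n"' into its text."""
    # Assumption: PO escapes are close enough to Python string escapes
    # for this sketch, so ast.literal_eval is used to decode them.
    return ast.literal_eval(token)


def parse_po(text):
    """Return {msgid: msgstr} for simple PO text (no plurals, no contexts)."""
    catalog = {}
    msgid = None
    msgstr = ""
    target = None  # which field a continuation line extends
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("msgid "):
            if msgid is not None:
                catalog[msgid] = msgstr  # flush the previous entry
            msgid, msgstr, target = unquote(line[6:]), "", "id"
        elif line.startswith("msgstr "):
            msgstr, target = unquote(line[7:]), "str"
        elif line.startswith('"'):
            if target == "id":
                msgid += unquote(line)
            elif target == "str":
                msgstr += unquote(line)
    if msgid is not None:
        catalog[msgid] = msgstr
    return catalog


sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=utf-8\\n"\n'
    "\n"
    "#: nofile:0\n"
    'msgid "Hello"\n'
    'msgstr "Hallo"\n'
)
catalog = parse_po(sample)
```

The entry with the empty msgid carries the header metadata, so a web front end could read the charset from catalog[""] before displaying the remaining entries.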

Regards,
Stephan
-- 
Stephan Richter
CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student)
Web2k - Web Software Design, Development and Training


From duerst@w3.org  Thu Apr 10 20:44:52 2003
From: duerst@w3.org (Martin Duerst)
Date: Thu, 10 Apr 2003 15:44:52 -0400
Subject: [I18n-sig] IUC24 Call for Papers - September 2003 - Atlanta, Georgia
Message-ID: <4.2.0.58.J.20030410153814.0607d8e0@localhost>

Hi folks,

Yes, it is time to get thinking about the September Unicode conference in
Atlanta. Please see the information below and check out the website for
conference themes and suggested topics for papers.  Submissions are due
May 2.

Thank you,   Martin.

 >>>>>>>>>>>>>>>>>>>>>>>>>>  Call for Papers!  <<<<<<<<<<<<<<<<<<<<<<<<<

     Twenty-fourth Internationalization and Unicode Conference (IUC24)
      Unicode, Internationalization, the Web: Powering Global Business

                          See Call for Papers at:
                http://www.unicode.org/iuc/iuc24/call.html

                           September 3-5, 2003
                             Atlanta, Georgia

 >>>>>>>>>>>>>>>>>>>>  Send in your submission now!  <<<<<<<<<<<<<<<<<<<

                     Submissions due: May 2, 2003
                   Notification date: May 23, 2003
                 Completed papers due: June 13, 2003
            (in electronic form and camera-ready paper form)

 >>>>>>>>>>>>>>>>>>>>>>>>  Just 4 weeks to go!  <<<<<<<<<<<<<<<<<<<<<<<<

WHAT'S NEW

Each conference's theme is different, allowing key subject areas
to be explored in depth. This conference will explore global
business needs and solutions and the impact of new technologies.

Go to the conference web site for a graphical version of this message:
http://www.unicode.org/iuc/iuc24/call.html

INVITATION TO SUBMIT PAPERS

The Internationalization & Unicode Conference is the premier technical
conference worldwide for both software and Web internationalization.
The conference features tutorials, lectures, and panel discussions that
provide coverage of standards, best practices, and recent advances in the
globalization of software and the Internet. The conference continues to
provide a forum for identifying and discussing new issues in this field.

New technologies, innovative Internet applications, and the evolving
Unicode Standard bring new challenges along with their new capabilities.
This technical conference will explore the opportunities created by the latest
advances, how to leverage them, and the potential pitfalls. Their impact on
business and the problem areas that need further research will also be
identified. Best practices for designing applications that can accommodate
any language will be demonstrated.

Attendees benefit from the wide range of basic to advanced topics and the
opportunities for dialog and idea exchange with experts and peers.
We invite you to submit papers that relate to Unicode or any aspect of
software and Web Internationalization, with special emphasis on the themes
discussed below. You can view the programs of previous conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

CONFERENCE ATTENDEES

Conference attendees are generally involved in either the development and
deployment of Unicode software, or the globalization of software and the
Internet. They include managers, software engineers, testers, systems
analysts, program managers, font designers, graphic designers, content
developers, web designers, web administrators, site coordinators, technical
writers, and product marketing personnel.

THEME: INTERNATIONAL COMPUTING SOLUTIONS FOR GLOBAL BUSINESS

"International Computing Solutions for Global Business" is the overall
theme of the Conference.

In today's tight economy, companies are looking for productivity
improvements and increased international sales. One of many challenges is
to maximize use of existing resources while accomplishing these achievements.
Another is to incorporate new standards and technologies to gain competitive
features.

In support of the theme and these challenges, papers on GLOBAL BUSINESS and
NEW TECHNOLOGIES are requested. More details on the theme and other topics
of interest can be found at our web site:
http://www.unicode.org/iuc/iuc24/call.html

We invite you to submit papers which define tomorrow's computing, demonstrate
best practices in computing today, or articulate problems that must be
solved before further advances can occur. Presentations should be geared
towards a technical audience.

EXHIBIT OPPORTUNITIES

The Conference SHOWCASE area is for corporations and individuals who wish
to display and promote their products, technology and/or services. Every
effort will be made to provide maximum exposure, advertising and traffic.

Exhibit space is limited.  For further information or to reserve a place,
please contact Global Meeting Services at info@global-conference.com.

CONFERENCE VENUE

    DoubleTree Hotel Atlanta Buckhead
    3342 Peachtree Road
    Atlanta, GA  30326

    Tel:  +1-404-231-1234
    Fax:  +1-404-231-3112

THE UNICODE CONSORTIUM

The Unicode Consortium is a non-profit organization dedicated to the
development, maintenance and promotion of The Unicode Standard, a worldwide
character encoding.  The Unicode Standard encodes the characters of the
world's principal scripts and languages, and is code-for-code identical to
the international standard ISO/IEC 10646.  The Consortium also defines
character properties and algorithms for use in implementations.  The
membership base of the Unicode Consortium includes major computer
corporations, software producers, database vendors, research institutions,
international agencies and various user groups.

For further information on the Unicode Standard, visit the Unicode Web site
at http://www.unicode.org or e-mail <info@unicode.org>

                            *  *  *  *  *

Unicode(r) and the Unicode logo are registered trademarks of Unicode, Inc.
Used with permission. 


From barry@python.org  Fri Apr 11 18:51:56 2003
From: barry@python.org (Barry Warsaw)
Date: 11 Apr 2003 13:51:56 -0400
Subject: [I18n-sig] Changes to gettext.py for Python 2.3
Message-ID: <1050083516.11172.40.camel@barry>

Hi I18n-ers,

I plan on checking in the following changes to the gettext.py module for
Python 2.3, based on feedback from the Zope and Mailman i18n work. 
Here's a summary of the changes; hopefully there aren't too many
controversies <wink>.  I'll update the tests and the docs at the same
time.

- Expose NullTranslations and GNUTranslations to __all__

- Set the default charset to iso-8859-1.  It used to be None, which
would cause problems with .ugettext() if the file had no charset
parameter.  Arguably, the po/mo file would be broken, but I still think
iso-8859-1 is a reasonable default.

- Add a "coerce" default argument to GNUTranslations's constructor.  The
reason for this is that in Zope, we want all msgids and msgstrs to be
Unicode.  For the latter, we could use .ugettext() but there isn't
currently a mechanism for Unicode-ifying msgids.

The plan then is that the charset parameter specifies the encoding for
both the msgids and msgstrs, and both are decoded to Unicode when read. 
For example, we might encode po files with utf-8. I think the GNU
gettext tools don't care.

Since this could potentially break code [*] that wants to use the
encoded interface .gettext(), the constructor flag is added, defaulting
to False.  Most code I suspect will want to set this to True and use
.ugettext().

- A few other minor changes from the Zope project, including asserting
that a zero-length msgid must have a Project-ID-Version header for it to
be counted as the metadata record.

-Barry

[*] I've come to the opinion that using anything other than Unicode
msgids and msgstrs just won't work well for Python, and thus you really
should be using the .ugettext() method everywhere.  It's also insane to
mix .gettext() and .ugettext(). In Zope, all human readable messages
will be Unicode strings internally, so we definitely want Unicode
msgids.


From martin@v.loewis.de  Fri Apr 11 20:54:50 2003
From: martin@v.loewis.de ("Martin v. Löwis")
Date: Fri, 11 Apr 2003 21:54:50 +0200
Subject: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <1050083516.11172.40.camel@barry>
References: <1050083516.11172.40.camel@barry>
Message-ID: <3E971D8A.5020006@v.loewis.de>

Barry Warsaw wrote:

> - Set the default charset to iso-8859-1.  It used to be None, which
> would cause problems with .ugettext() if the file had no charset
> parameter.  Arguably, the po/mo file would be broken, but I still think
> iso-8859-1 is a reasonable default.

I'm -1 here. Why do you think it is a reasonable default?

Errors should never pass silently.
Unless explicitly silenced.

While iso-8859-1 might be a reasonable default in other application
domains, in the context of non-English text (which it typically is),
assuming Latin-1 is bound to create mojibake.

If your application can accept creating mojibake, I suggest a method
setdefaultencoding on the catalog, which has no effect if an encoding
was found in the catalog.
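
A minimal Python 3 sketch of that suggestion, under stated assumptions: the Catalog class below is hypothetical (it is not gettext.GNUTranslations), and setdefaultencoding is only the method name proposed above, not an existing API. The default applies only when the catalog itself declared no charset; without either, the lookup fails loudly instead of guessing.

```python
class Catalog:
    """Hypothetical stand-in for a message catalog holding raw bytes."""

    def __init__(self, messages, charset=None):
        self._messages = messages  # {bytes msgid: bytes msgstr}
        self._charset = charset    # None if the file declared no charset

    def setdefaultencoding(self, encoding):
        # No effect if an encoding was found in the catalog.
        if self._charset is None:
            self._charset = encoding

    def ugettext(self, message):
        if self._charset is None:
            # Errors should never pass silently.
            raise LookupError("catalog declares no charset")
        raw = self._messages.get(message.encode(self._charset))
        return message if raw is None else raw.decode(self._charset)


undeclared = Catalog({b"hi": b"gr\xfc\xdf"})
undeclared.setdefaultencoding("iso-8859-1")   # explicit opt-in
declared = Catalog({b"hi": b"gr\xc3\xbc"}, charset="utf-8")
declared.setdefaultencoding("iso-8859-1")     # ignored: charset was declared
```

With this shape, an application that can live with possible mojibake opts in explicitly, while everyone else gets an error for a catalog that lacks a charset.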

> - Add a "coerce" default argument to GNUTranslations's constructor.  The
> reason for this is that in Zope, we want all msgids and msgstrs to be
> Unicode.  For the latter, we could use .ugettext() but there isn't
> currently a mechanism for Unicode-ifying msgids.

Could you please explain in what context this is needed? msgids are ASCII, and
you can pass a Unicode string to ugettext just fine.

> The plan then is that the charset parameter specifies the encoding for
> both the msgids and msgstrs, and both are decoded to Unicode when read. 
> For example, we might encode po files with utf-8. I think the GNU
> gettext tools don't care.

They complain loudly if they find bytes > 127 in the msgid.

> Since this could potentially break code [*] that wants to use the
> encoded interface .gettext(), the constructor flag is added, defaulting
> to False.  Most code I suspect will want to set this to True and use
> .ugettext().

To avoid breakage, you could define ugettext as

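   def ugettext(self, message):
       if isinstance(message, unicode):
           # Unicode msgid: encode it to the catalog charset for the lookup
           tmsg = self._catalog.get(message.encode(self._charset))
           if tmsg is None:
               return message
       else:
           # Byte-string msgid: look it up as-is, falling back to itself
           tmsg = self._catalog.get(message, message)
       return unicode(tmsg, self._charset)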

> - A few other minor changes from the Zope project, including asserting
> that a zero-length msgid must have a Project-ID-Version header for it to
> be counted as the metadata record.

That test was there, and removed on request of Bruno Haible, the GNU
gettext maintainer, as he points out that Project-ID-Version is not
mandatory for the metadata (see Patch #700839).

Regards,
Martin


From barry@python.org  Fri Apr 11 21:26:59 2003
From: barry@python.org (Barry Warsaw)
Date: 11 Apr 2003 16:26:59 -0400
Subject: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <3E971D8A.5020006@v.loewis.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
Message-ID: <1050092819.11172.89.camel@barry>

On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote:
> Barry Warsaw wrote:
> 
> > - Set the default charset to iso-8859-1.  It used to be None, which
> > would cause problems with .ugettext() if the file had no charset
> > parameter.  Arguably, the po/mo file would be broken, but I still think
> > iso-8859-1 is a reasonable default.
> 
> I'm -1 here. Why do you think it is a reasonable default?
> 
> Errors should never pass silently.
> Unless explicitly silenced.
> 
> While iso-8859-1 might be a reasonable default in other application
> domains, in the context of non-English text (which it typically is),
> assuming Latin-1 is bound to create mojibake.

Okay, never mind, I'll back this one out.  The problem was caused by my
other patch to unicode-ify on read (see below) without first having a
charset.  I have a different fix for this.

> > - Add a "coerce" default argument to GNUTranslations's constructor.  The
> > reason for this is that in Zope, we want all msgids and msgstrs to be
> > Unicode.  For the latter, we could use .ugettext() but there isn't
> > currently a mechanism for Unicode-ifying msgids.
> 
> Could you please explain in what context this is needed? msgids are ASCII, and
> you can pass a Unicode string to ugettext just fine.

In Zope, all strings are Unicode and the catalog may include messages
that are extracted from places other than Python source code, e.g.
XML-based files.  Message ids can contain non-ASCII characters if they
are written by a non-English coder.  I think in that case, we'd want to
do something like encode the strings possibly with utf-8 for the .po/.mo
files, but we want them decoded in time to look the Unicode strings up
in the catalog.

Similarly, what happens if a non-English coder writes an i18n'd Python
module with native strings, possibly using a Python 2.3 coding cookie. 
We'd want their message ids to be extracted into the .mo/.po files,
right?

> > The plan then is that the charset parameter specifies the encoding for
> > both the msgids and msgstrs, and both are decoded to Unicode when read. 
> > For example, we might encode po files with utf-8. I think the GNU
> > gettext tools don't care.
> 
> They complain loudly if they find bytes > 127 in the msgid.

Really?  Ok, I'm still confused because I tried the following example:

I wrote a .po file (charset=utf-8) with the following record:

#: nofile:0
msgid "ab\xc3\x9e"
msgstr "\xc2\xa4yz"

I used standard msgfmt to turn that into a .mo file.  Then created a
GNUTranslations(fp, coerce=True) and called

>>> t.ugettext(u'ab\xde')
u'\xa4yz'

This is what I should expect, right? ;)
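
The byte arithmetic behind that result can be checked in isolation. This is a small Python 3 sketch, assuming the escapes in the record denote exactly those raw bytes and that both fields are decoded with the declared utf-8 charset; it does not use gettext itself, and the variable names are illustrative.

```python
# The two fields of the record, as raw bytes.
raw_msgid = b"ab\xc3\x9e"
raw_msgstr = b"\xc2\xa4yz"

# Decode both sides with the catalog charset, as a coercing reader would.
msgid = raw_msgid.decode("utf-8")    # 'ab' followed by U+00DE
msgstr = raw_msgstr.decode("utf-8")  # U+00A4 followed by 'yz'

# A catalog keyed by Unicode msgids can then be queried with Unicode.
catalog = {msgid: msgstr}
translated = catalog["ab\xde"]
```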

> > - A few other minor changes from the Zope project, including asserting
> > that a zero-length msgid must have a Project-ID-Version header for it to
> > be counted as the metadata record.
> 
> That test was there, and removed on request of Bruno Haible, the GNU
> gettext maintainer, as he points out that Project-ID-Version is not
> mandatory for the metadata (see Patch #700839).

Ah, I read the diff backwards in this case.  I'll back this one out too.

-Barry


From barry@python.org  Fri Apr 11 21:37:56 2003
From: barry@python.org (Barry Warsaw)
Date: 11 Apr 2003 16:37:56 -0400
Subject: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <3E971D8A.5020006@v.loewis.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
Message-ID: <1050093475.11200.96.camel@barry>

On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote:

> To avoid breakage, you could define ugettext as
> 
>    def ugettext(self, message):
>        if isinstance(message, unicode):
>           tmsg = self._catalog.get(message.encode(self._charset))
>           if tmsg is None:
>              return message
>        else:
>           tmsg = self._catalog.get(message, message)
>        return unicode(tmsg, self._charset)

I suppose we could cache the conversion to make the next lookup more
efficient.  Alternatively, if we always convert internally to Unicode we
could encode on .gettext().  Then we could just pick One Way and do away
with the coerce flag.
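
A minimal Python 3 sketch of that second alternative, where the catalog is decoded once on load and .gettext() encodes on the way out. The UnicodeCatalog class is hypothetical and only illustrates the idea; it is not the gettext.py implementation.

```python
class UnicodeCatalog:
    """Hypothetical catalog that stores everything as Unicode internally."""

    def __init__(self, raw_messages, charset):
        self._charset = charset
        # Decode msgids and msgstrs once, when the catalog is read.
        self._catalog = {
            key.decode(charset): value.decode(charset)
            for key, value in raw_messages.items()
        }

    def ugettext(self, message):
        # Unicode in, Unicode out: a plain dictionary lookup.
        return self._catalog.get(message, message)

    def gettext(self, message):
        # Encoded interface: bytes in, bytes out, converting at the edges.
        text = message.decode(self._charset)
        return self._catalog.get(text, text).encode(self._charset)


cat = UnicodeCatalog({b"ab\xc3\x9e": b"\xc2\xa4yz"}, "utf-8")
as_text = cat.ugettext("ab\xde")
as_bytes = cat.gettext(b"ab\xc3\x9e")
```

The cost is one decoding pass at load time; after that neither method needs a flag to decide how the catalog is keyed.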

-Barry
    

From martin@v.loewis.de  Sat Apr 12 11:34:05 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 12 Apr 2003 12:34:05 +0200
Subject: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <1050093475.11200.96.camel@barry>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050093475.11200.96.camel@barry>
Message-ID: <m38yug57j6.fsf@mira.informatik.hu-berlin.de>

Barry Warsaw <barry@python.org> writes:

> I suppose we could cache the conversion to make the next lookup more
> efficient.  Alternatively, if we always convert internally to Unicode we
> could encode on .gettext().  Then we could just pick One Way and do away
> with the coerce flag.

If you are concerned about efficiency, I guess there is no way to
avoid converting the file to Unicode on loading. I would then
encourage a change where this flag is available, but has an effect
only on performance, not on the behaviour.

Alternatively, you could subclass GNUTranslations.

Regards,
Martin


From martin@v.loewis.de  Sat Apr 12 12:43:28 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 12 Apr 2003 13:43:28 +0200
Subject: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <1050092819.11172.89.camel@barry>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050092819.11172.89.camel@barry>
Message-ID: <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>

Barry Warsaw <barry@python.org> writes:

> I used standard msgfmt to turn that into a .mo file.  Then created a
> GNUTranslations(fp, coerce=True) and called
> 
> >>> t.ugettext(u'ab\xde')
> u'\xa4yz'
> 
> This is what I should expect, right? ;)

More or less, yes. Now, what happens if you put "real" non-ASCII
(i.e. bytes above 127) into the message id, like so:

msgid "abö"
msgstr "\xc2\xa4yz"

msgfmt will still accept that, but msgunfmt will complain:

msgunfmt: warning: The following msgid contains non-ASCII characters.
                   This will cause problems to translators who use a
                   character encoding different from yours. Consider
                   using a pure ASCII msgid instead.

If you think about this, this is really bad: If you mean to apply the
charset= to both msgid and msgstr, then translators using a different
charset from yours are in big trouble.

They are faced with three problems:
1. They don't know what the charset of the msgids is. The PO files do
   have a charset declaration, the POT files typically don't.
2. They need to convert the msgids from the POT encoding to their
   native encoding. There are no tools available to support that readily;
   tools like iconv might correctly convert the msgids, but won't update
   the charset= in the POT file (if the charset was filled out).
3. By converting the msgids, they are also changing them. That means
   the msgids are not really suitable as keys anymore.
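
The third point can be made concrete with a few lines of Python 3. This is only an illustrative sketch: the catalog dict stands in for a compiled catalog keyed by raw bytes, and the names are hypothetical.

```python
# A msgid containing a non-ASCII character, such as "abö".
msgid = "ab\xf6"

# The same text yields different bytes in different charsets.
latin1_key = msgid.encode("iso-8859-1")  # b'ab\xf6'
utf8_key = msgid.encode("utf-8")         # b'ab\xc3\xb6'

# A catalog keyed by the author's Latin-1 bytes ...
catalog = {latin1_key: b"translation"}

# ... no longer matches once a translator has converted the file to UTF-8.
found_after_conversion = utf8_key in catalog

# A pure ASCII msgid is byte-for-byte identical in both charsets.
ascii_is_stable = "abo".encode("iso-8859-1") == "abo".encode("utf-8")
```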

Regards,
Martin


From bh@intevation.de  Wed Apr 16 17:27:26 2003
From: bh@intevation.de (Bernhard Herzog)
Date: 16 Apr 2003 18:27:26 +0200
Subject: [I18n-sig] pygettext and msgfmt support for distutils
Message-ID: <6qisteh0gh.fsf@salmakis.intevation.de>

Is someone working on integrating the gettext utilities with distutils?

Some background:

We've just added some simple gettext support to our geographic data
viewer Thuban[1] but the setup we currently use only works on Unix-like
systems because it's just a makefile that can be used to call xgettext
(0.11 which supports python :)) and msgmerge and msgfmt as needed.

I've already adapted our setup.py file to include any formatted .mo
files and any po files in the source distribution and to install the mo
files together with other data files which works well so far.

What I'd still like to have is a way to at least get the functionality
of xgettext (or pygettext) and msgfmt into the distutils setup.py in
such a way that it works on windows as well as on Unix.

Is someone working on this kind of thing? 

If not, what would be needed to get it into the standard distutils?
AFAICT, for a start it might be good to move pygettext.py and msgfmt.py
into the standard library so that distutils can easily call them to do
the actual work.

   Bernhard


[1] http://thuban.intevation.org/

-- 
Intevation GmbH                                 http://intevation.de/
Sketch                                 http://sketch.sourceforge.net/
MapIt!                                           http://www.mapit.de/


From barry@python.org  Wed Apr 16 17:52:06 2003
From: barry@python.org (Barry Warsaw)
Date: 16 Apr 2003 12:52:06 -0400
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python
 2.3
In-Reply-To: <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050092819.11172.89.camel@barry>
 <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
Message-ID: <1050511925.9818.78.camel@barry>

On Sat, 2003-04-12 at 07:43, Martin v. Löwis wrote:

> More or less, yes. Now, what happens if you put "real" non-ASCII
> (i.e. bytes above 127) into the message id, like so:

But I don't think you'd ever want to do that.  In fact, I think in
general you're probably talking about ascii msgids or utf-8 encoded
Unicode msgids.  I'm not sure what else would make sense.

> msgfmt will still accept that, but msgunfmt will complain:

Didn't even know about msgunfmt. :)

> msgunfmt: warning: The following msgid contains non-ASCII characters.
>                    This will cause problems to translators who use a
>                    character encoding different from yours. Consider
>                    using a pure ASCII msgid instead.
> 
> If you think about this, this is really bad: If you mean to apply the
> charset= to both msgid and msgstr, then translators using a different
> charset from yours are in big trouble.

Right, but see above.  E.g. if your string literals are all Spanish and
you want a Turkish translation, then utf-8 is the only common encoding
you could possibly use in a .po file, right?

> They are faced with three problems:
> 1. They don't know what the charset of the msgids is. The PO files do
>    have a charset declaration, the POT files typically don't.

Yep, although it would be easy for the extractor to add a charset=utf-8
to the pot file.

> 2. They need to convert the msgids from the POT encoding to their
>    native encoding. There are no tools available to support that readily;
>    tools like iconv might correctly convert the msgids, but won't update
>    the charset= in the POT file (if the charset was filled out).
> 3. By converting the msgids, they are also changing them. That means
>    the msgids are not really suitable as keys anymore.

Is this still a problem when charset=utf-8?

-Barry


From barry@python.org  Wed Apr 16 17:53:53 2003
From: barry@python.org (Barry Warsaw)
Date: 16 Apr 2003 12:53:53 -0400
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python
 2.3
In-Reply-To: <m38yug57j6.fsf@mira.informatik.hu-berlin.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050093475.11200.96.camel@barry>
 <m38yug57j6.fsf@mira.informatik.hu-berlin.de>
Message-ID: <1050512032.9818.81.camel@barry>

On Sat, 2003-04-12 at 06:34, Martin v. Löwis wrote:
> Barry Warsaw <barry@python.org> writes:
> 
> > I suppose we could cache the conversion to make the next lookup more
> > efficient.  Alternatively, if we always convert internally to Unicode we
> > could encode on .gettext().  Then we could just pick One Way and do away
> > with the coerce flag.
> 
> If you are concerned about efficiency, I guess there is no way to
> avoid converting the file to Unicode on loading. I would then
> encourage a change where this flag is available, but has an effect
> only on performance, not on the behaviour.
> 
> Alternatively, you could subclass GNUTranslations.

It would take some refactoring, unless you implemented a second pass
over the catalog.  I'd rather not do either, so I'm happy to include
this right in GNUTranslations.

-Barry


From barry@python.org  Wed Apr 16 19:10:40 2003
From: barry@python.org (Barry Warsaw)
Date: 16 Apr 2003 14:10:40 -0400
Subject: [I18n-sig] pygettext and msgfmt support for distutils
In-Reply-To: <6qisteh0gh.fsf@salmakis.intevation.de>
References: <6qisteh0gh.fsf@salmakis.intevation.de>
Message-ID: <1050516640.9818.150.camel@barry>

On Wed, 2003-04-16 at 12:27, Bernhard Herzog wrote:
> Is someone working on integrating the gettext utilities with distutils?
> 
> Some background:
> 
> We've just added some simple gettext support to our geographic data
> viewer Thuban[1] but the setup we currently use only works on Unix-like
> systems because it's just a makefile that can be used to call xgettext
> (0.11 which supports python :)) and msgmerge and msgfmt as needed.

I haven't had time to look at the latest xgettext, but do you know if it
supports all the extra features that pygettext supports?  Of primary
importance to me is the -D/--docstrings and -X/--no-docstrings options.

> I've already adapted our setup.py file to include any formatted .mo
> files and any po files in the source distribution and to install the mo
> files together with other data files which works well so far.
> 
> What I'd still like to have is a way to at least get the functionality
> of xgettext (or pygettext) and msgfmt into the distutils setup.py in
> such a way that it works on windows as well as on Unix.

msgfmt I can see, but I'm not so sure about {x,py}gettext.  IME, I don't
want to do message extraction at either build time or tar-it-up time.  I
usually want to do extraction at defined boundaries in the project's
development.  So that seems to me a separate process.  I'm interested in
getting your ideas here.

Hooking msgfmt up in some way would definitely be useful.  That way you
wouldn't need to include .mo files in your distro (nor in cvs).

> Is someone working on this kind of thing? 
> 
> If not, what would be needed to get it into the standard distutils?

Let's start with a patch! :)

> AFAICT, for a start it might be good to move pygettext.py and msgfmt.py
> into the standard library so that distutils can easily call them to do
> the actual work.

Hmm, possibly.  They may need to be rewritten or refactored to make them
more appropriate as library modules.  I can see an i18n package being
added to Python's stdlib someday which might contain the raw materials,
with the Tools/i18n scripts being mostly just __main__ and getargs
wrappers.

-Barry


From martin@v.loewis.de  Wed Apr 16 20:20:34 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 16 Apr 2003 21:20:34 +0200
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <1050511925.9818.78.camel@barry>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050092819.11172.89.camel@barry>
 <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
 <1050511925.9818.78.camel@barry>
Message-ID: <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de>

Barry Warsaw <barry@python.org> writes:

> Right, but see above.  E.g. if your string literals are all Spanish and
> you want a Turkish translation, then utf-8 is the only common encoding
> you could possibly use in a .po file, right?

That's why your string literals should never be all Spanish. If you
have Spanish string literals and use escape codes in the msgid,
reading the Spanish msgid becomes difficult, anyway.

> > 3. By converting the msgids, they are also changing them. That means
> >    the msgids are not really suitable as keys anymore.
> 
> Is this still a problem when charset=utf-8?

If the msgids are UTF-8, with non-ASCII characters C-escaped,
translators will *still* put non-UTF-8 encodings into the catalogs.
This will then be a problem: The catalog encoding won't be UTF-8,
and you can't process the msgids.

Regards,
Martin


From martin@v.loewis.de  Wed Apr 16 20:24:43 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 16 Apr 2003 21:24:43 +0200
Subject: [I18n-sig] pygettext and msgfmt support for distutils
In-Reply-To: <6qisteh0gh.fsf@salmakis.intevation.de>
References: <6qisteh0gh.fsf@salmakis.intevation.de>
Message-ID: <m3k7du9res.fsf@mira.informatik.hu-berlin.de>

Bernhard Herzog <bh@intevation.de> writes:

> What I'd still like to have is a way to at least get the functionality
> of xgettext (or pygettext) and msgfmt into the distutils setup.py in
> such a way that it works on windows as well as on Unix.
> 
> Is someone working on this kind of thing? 

I'm with Barry here: You shouldn't have xgettext as part of the build
or install commands. Providing a different command would be fine. For
msgfmt, having that as a build step would be useful.

However, more important, to me, seems to define a mechanism to
smoothly install .mo files, in a location where gettext would find
them.

Regards,
Martin


From barry@python.org  Wed Apr 16 20:36:08 2003
From: barry@python.org (Barry Warsaw)
Date: 16 Apr 2003 15:36:08 -0400
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python
 2.3
In-Reply-To: <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050092819.11172.89.camel@barry>
 <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
 <1050511925.9818.78.camel@barry>
 <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de>
Message-ID: <1050521768.14112.15.camel@barry>

On Wed, 2003-04-16 at 15:20, Martin v. Löwis wrote:
> Barry Warsaw <barry@python.org> writes:
> 
> > Right, but see above.  E.g. if your string literals are all Spanish and
> > you want a Turkish translation, then utf-8 is the only common encoding
> > you could possibly use in a .po file, right?
> 
> That's why your string literals should never be all Spanish. If you
> have Spanish string literals and use escape codes in the msgid,
> reading the Spanish msgid becomes difficult, anyway.

So why isn't the English/US-ASCII bias for msgids considered a liability
for gettext?  Do non-English programmers not want to use native literals
in their source code?

If we adhere to this limitation instead of extending gettext then it
seems like Zope will be forced to use something else, and that seems
like a waste.  Its msgids come from sources other than program source
code and such sources may indeed be written in non-English.  It seems
like gettext is so close and all the machinery is almost there, that
this small enhancement should be harmless and helpful.

BTW, I believe that if all your msgids /are/ us-ascii, you should be
able to ignore this change and have it work backwards compatibly.

Also, this change ought to visibly only affect .ugettext() which isn't
part of the traditional gettext API anyway.

> > > 3. By converting the msgids, they are also changing them. That means
> > >    the msgids are not really suitable as keys anymore.
> > 
> > Is this still a problem when charset=utf-8?
> 
> If the msgids are UTF-8, with non-ASCII characters C-escaped,
> translators will *still* put non-UTF-8 encodings into the catalogs.
> This will then be a problem: The catalog encoding won't be UTF-8,
> and you can't process the msgids.

Isn't this just another validation step to run on the .po files?  There
are already several ways translators can (and do!) make mistakes, so we
already have to validate the files anyway.
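
Such a validation step could be as small as the following Python 3 sketch. The function name find_undecodable_lines is hypothetical, and the check is deliberately narrow: it only reports lines whose bytes do not decode in the expected charset, which is the mistake described above.

```python
def find_undecodable_lines(po_bytes, charset="utf-8"):
    """Return the 1-based numbers of lines that are not valid in charset."""
    bad = []
    for number, line in enumerate(po_bytes.splitlines(), start=1):
        try:
            line.decode(charset)
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# A catalog that really is UTF-8 passes.
good = 'msgid "ab\xf6"\nmsgstr "x"\n'.encode("utf-8")
good_report = find_undecodable_lines(good)

# A translator who saved the msgstr as Latin-1 is caught on line 2.
mixed = b'msgid "ok"\nmsgstr "gr\xfc\xdf"\n'
mixed_report = find_undecodable_lines(mixed)
```

Note that this cannot detect every wrong encoding: some byte sequences written in another charset happen to be valid UTF-8, so a clean report is necessary but not sufficient.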

-Barry


From barry@python.org  Wed Apr 16 20:59:44 2003
From: barry@python.org (Barry Warsaw)
Date: 16 Apr 2003 15:59:44 -0400
Subject: [I18n-sig] pygettext and msgfmt support for distutils
In-Reply-To: <m3k7du9res.fsf@mira.informatik.hu-berlin.de>
References: <6qisteh0gh.fsf@salmakis.intevation.de>
 <m3k7du9res.fsf@mira.informatik.hu-berlin.de>
Message-ID: <1050523183.14115.41.camel@barry>

On Wed, 2003-04-16 at 15:24, Martin v. Löwis wrote:

> However, more important, to me, seems to define a mechanism to
> smoothly install .mo files, in a location where gettext would find
> them.

Excellent point!

setup.py has some provisions for installing data files, so maybe that
can be piggybacked?  I don't have time to look into this right now, but
it would be nice to do something like

- specify the domain in setup
- have setup drop the files in <--install-data>/xx/LC_MESSAGES
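
A rough sketch of what that piggybacking might look like, assuming the
compiled catalogs already sit in a build tree laid out as
<build_dir>/<lang>/LC_MESSAGES/<domain>.mo.  The helper name
mo_data_files, the build_dir default and the "share/locale" prefix are
made up for illustration; nothing like this exists in distutils itself.

```python
import posixpath


def mo_data_files(domain, languages, build_dir="build/locale",
                  prefix="share/locale"):
    """Build a data_files-style list of (target_dir, [source_files]).

    Each entry places <domain>.mo for one language into
    <prefix>/<lang>/LC_MESSAGES.  Relative target directories are
    resolved by the installer against the install-data location.
    """
    entries = []
    for lang in languages:
        # Setup scripts conventionally use forward slashes in paths.
        target_dir = posixpath.join(prefix, lang, "LC_MESSAGES")
        source = posixpath.join(build_dir, lang, "LC_MESSAGES",
                                domain + ".mo")
        entries.append((target_dir, [source]))
    return entries


# Hypothetical use inside a setup script:
#   setup(name="myapp", ..., data_files=mo_data_files("myapp", ["de", "tr"]))
```

Passing prefix="" gives the bare xx/LC_MESSAGES layout from the list
above.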

-Barry


From martin@v.loewis.de  Wed Apr 16 23:07:15 2003
From: martin@v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 17 Apr 2003 00:07:15 +0200
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python
 2.3
In-Reply-To: <1050521768.14112.15.camel@barry>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>	 <1050092819.11172.89.camel@barry>	 <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>	 <1050511925.9818.78.camel@barry>	 <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry>
Message-ID: <3E9DD413.8030002@v.loewis.de>

Barry Warsaw wrote:

> So why isn't the English/US-ASCII bias for msgids considered a liability
> for gettext?  Do non-English programmers not want to use native literals
> in their source code?

Using English for msgids is about the only way to get translation. 
Finding a Turkish speaker who can translate from Spanish is 
*significantly* more difficult than starting from English; if you were 
starting from, say, Chinese, going to Hebrew might just be impossible.

So any programmer who seriously wants to have his software translated 
will put English texts into the source code. Non-English literals are 
only used if l10n is not an issue.

> If we adhere to this limitation instead of extending gettext then it
> seems like Zope will be forced to use something else, and that seems
> like a waste.  

It's not a limitation of gettext, but a usage guideline: gettext can map 
arbitrary byte strings to arbitrary other byte strings.

> BTW, I believe that if all your msgids /are/ us-ascii, you should be
> able to ignore this change and have it work backwards compatibly.

"This" change being addition of the "coerce" argument? If you think
you will need it, we can leave it in.

>>If the msgids are UTF-8, with non-ASCII characters C-escaped,
>>translators will *still* put non-UTF-8 encodings into the catalogs.
>>This will then be a problem: The catalog encoding won't be UTF-8,
>>and you can't process the msgids.
> 
> 
> Isn't this just another validation step to run on the .po files?  There
> are already several ways translators can (and do!) make mistakes, so we
> already have to validate the files anyway.

I'm not sure how exactly a validation step would be executed. Would that
step simply verify that the encoding of a catalog is UTF-8? That 
validation step would fail for catalogs that legally use other charsets.

Regards,
Martin


From bh@intevation.de  Thu Apr 17 11:23:52 2003
From: bh@intevation.de (Bernhard Herzog)
Date: 17 Apr 2003 12:23:52 +0200
Subject: [I18n-sig] pygettext and msgfmt support for distutils
In-Reply-To: <m3k7du9res.fsf@mira.informatik.hu-berlin.de>
References: <6qisteh0gh.fsf@salmakis.intevation.de>
 <m3k7du9res.fsf@mira.informatik.hu-berlin.de>
Message-ID: <6qel41e81z.fsf@salmakis.intevation.de>

martin@v.loewis.de (Martin v. Löwis) writes:

> I'm with Barry here: You shouldn't have xgettext as part of the build
> or install commands.

That wasn't quite my intention. For the xgettext step I was thinking
more of a separate command that is not automatically called when doing a
build. It seems to me that maybe such a command should be run as part of
the sdist command to make sure that a .pot file shipped with the sources
is up to date.

Another reason I thought it might be good to have it as part of
distutils is that it could make use of information the distutils have
to, say, extract the translatable strings from all python source files
in the distribution.

> Providing a different command would be fine. For msgfmt, having that
> as a build step would be useful.
> 
> However, more important, to me, seems to define a mechanism to
> smoothly install .mo files, in a location where gettext would find
> them.

That wasn't much of a problem in our case, but thuban is more an
application than a library so we don't install under site-packages and
so have more control over where to look for mo files. 

I simply put the mo files into a directory right next to another
directory containing data files (icons). I use this scheme in Sketch too
(although without distutils so far) and so far nobody has complained
about this :).
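
Roughly, the lookup side of such a scheme could be sketched as below.
The directory name "locale" and the helper load_translations are
assumptions for illustration, not what Thuban or Sketch actually use;
gettext.translation() is the standard library call.

```python
import gettext
import os


def load_translations(domain, app_dir, languages=None):
    """Load a catalog from <app_dir>/locale/<lang>/LC_MESSAGES/<domain>.mo.

    With fallback=True a missing catalog yields a NullTranslations
    object, which hands every message back untranslated instead of
    raising an error.
    """
    locale_dir = os.path.join(app_dir, "locale")
    return gettext.translation(domain, localedir=locale_dir,
                               languages=languages, fallback=True)


# Hypothetical use: look next to the application's own files.
#   translations = load_translations("thuban", os.path.dirname(__file__))
#   _ = translations.gettext
```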

   Bernhard

-- 
Intevation GmbH                                 http://intevation.de/
Sketch                                 http://sketch.sourceforge.net/
MapIt!                                           http://www.mapit.de/


From bh@intevation.de  Sat Apr 19 19:20:44 2003
From: bh@intevation.de (Bernhard Herzog)
Date: 19 Apr 2003 20:20:44 +0200
Subject: [I18n-sig] pygettext and msgfmt support for distutils
References: <6qisteh0gh.fsf@salmakis.intevation.de>
 <1050516640.9818.150.camel@barry>
Message-ID: <6q65paic1v.fsf@salmakis.intevation.de>

Barry Warsaw <barry@python.org> writes:

> I haven't had time to look at the latest xgettext, but do you know if it
> supports all the extra features that pygettext supports?

AFAICT python support means mostly that it understands the python syntax
enough to recognize all string literals correctly.

> Of primary
> importance to me is the -D/--docstrings and -X/--no-docstrings options.

There doesn't seem to be support for this. At least there's nothing in
the docs about this.

   Bernhard

-- 
Intevation GmbH                                 http://intevation.de/
Sketch                                 http://sketch.sourceforge.net/
MapIt!                                           http://www.mapit.de/


From tex@I18nGuy.com  Sun Apr 20 00:01:37 2003
From: tex@I18nGuy.com (Tex Texin)
Date: Sat, 19 Apr 2003 19:01:37 -0400
Subject: [I18n-sig] IUC24 Call for Papers INTERNATIONAL COMPUTING SOLUTIONS FOR GLOBAL
 BUSINESS
Message-ID: <3EA1D551.80C8B684@I18nGuy.com>

Join us in Atlanta this September!
>>>>>>>>>>>>>>>>>>>>>>>>>>  Call for Papers!  <<<<<<<<<<<<<<<<<<<<<<<<<

    Twenty-fourth Internationalization and Unicode Conference (IUC24)
     Unicode, Internationalization, the Web: Powering Global Business

                         See Call for Papers at:
               http://www.unicode.org/iuc/iuc24/call.html

                          September 3-5, 2003
                            Atlanta, Georgia

>>>>>>>>>>>>>>>>>>>>  Send in your submission now!  <<<<<<<<<<<<<<<<<<<

                    Submissions due: May 2, 2003
                  Notification date: May 23, 2003
                Completed papers due: June 13, 2003
           (in electronic form and camera-ready paper form)

>>>>>>>>>>>>>>>>>>>>>>>>  Just 2 weeks to go!  <<<<<<<<<<<<<<<<<<<<<<<<

WHAT'S NEW

Each conference's theme is different, allowing key subject areas
to be explored in depth. This conference will explore global
business needs and solutions and the impact of new technologies.

Go to the conference web site for a graphical version of this message,
and submit your proposal via our new web-based form!
 http://www.unicode.org/iuc/iuc24/call.html

THEME: INTERNATIONAL COMPUTING SOLUTIONS FOR GLOBAL BUSINESS

"International Computing Solutions for Global Business" is the overall
theme of the Conference.

In today's tight economy, companies are looking for productivity
improvements and increased international sales. One of many challenges is 
to maximize use of existing resources while accomplishing these achievements. 
Another is to incorporate new standards and technologies to gain competitive 
features.

In support of the theme and these challenges, papers on GLOBAL BUSINESS and
NEW TECHNOLOGIES are requested. More details on the theme and other topics
of interest can be found at our web site:
http://www.unicode.org/iuc/iuc24/call.html

INVITATION TO SUBMIT PAPERS

We invite you to submit papers which define tomorrow's computing, demonstrate
best practices in computing today, or articulate problems that must be
solved before further advances can occur. Presentations should be geared
towards a technical audience.

The Internationalization & Unicode Conference is the premier technical
conference worldwide for both software and Web internationalization.
The conference features tutorials, lectures, and panel discussions that
provide coverage of standards, best practices, and recent advances in the
globalization of software and the Internet. The conference continues to
provide a forum for identifying and discussing new issues in this field.

New technologies, innovative Internet applications, and the evolving
Unicode Standard bring new challenges along with their new capabilities. 
This technical conference will explore the opportunities created by the 
latest advances, how to leverage them, and the potential pitfalls. Their 
impact on business and the problem areas that need further research will 
also be identified. Best practices for designing applications that can 
accommodate any language will be demonstrated.

Attendees benefit from the wide range of basic to advanced topics and the
opportunities for dialog and idea exchange with experts and peers.
We invite you to submit papers that relate to Unicode or any aspect of
software and Web Internationalization, with special emphasis on the themes
discussed below. You can view the programs of previous conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

CONFERENCE ATTENDEES

Conference attendees are generally involved in either the development and
deployment of Unicode software, or the globalization of software and the
Internet. They include managers, software engineers, testers, systems
analysts, program managers, font designers, graphic designers, content
developers, web designers, web administrators, site coordinators, technical
writers, and product marketing personnel.


EXHIBIT OPPORTUNITIES

The Conference SHOWCASE area is for corporations and individuals who wish
to display and promote their products, technology and/or services. Every
effort will be made to provide maximum exposure, advertising and traffic.

Exhibit space is limited.  For further information or to reserve a place,
please contact Global Meeting Services at info@global-conference.com.

CONFERENCE VENUE

   DoubleTree Hotel Atlanta Buckhead
   3342 Peachtree Road
   Atlanta, GA  30326

   Tel:  +1-404-231-1234
   Fax:  +1-404-231-3112

THE UNICODE CONSORTIUM

The Unicode Consortium is a non-profit organization dedicated to the
development, maintenance and promotion of The Unicode Standard, a worldwide
character encoding.  The Unicode Standard encodes the characters of the
world's principal scripts and languages, and is code-for-code identical to
the international standard ISO/IEC 10646.  The Consortium also defines
character properties and algorithms for use in implementations.  The
membership base of the Unicode Consortium includes major computer
corporations, software producers, database vendors, research institutions, 
international agencies and various user groups.

For further information on the Unicode Standard, visit the Unicode Web site
at http://www.unicode.org or e-mail <info@unicode.org>

                           *  *  *  *  *

Unicode(r) and the Unicode logo are registered trademarks of Unicode, Inc.
Used with permission.


From perky@fallin.lv  Mon Apr 21 00:01:03 2003
From: perky@fallin.lv (Hye-Shik Chang)
Date: Mon, 21 Apr 2003 08:01:03 +0900
Subject: [I18n-sig] ANN: iconvcodec 1.0 is released
Message-ID: <20030420230103.GA20594@fallin.lv>

Hi, i18n guys!

 I just released iconvcodec 1.0.
 The iconvcodec is a universal Unicode codec module for Python
 using POSIX iconv(3). It supports various iconv
 implementations including GNU libiconv, GNU libc, FreeBSD iconv,
 Solaris iconv, etc. It also supports the following features:

  * PEP293 Error Callbacks (for Python 2.3 only)
  * Reentrant-safe encoder and decoder
  * Adaptive multiple unicode encodings: UCS, swapped UCS, UTF-8
  * Stateful/context-aware StreamReader and StreamWriter

 You can download the source and binary packages for FreeBSD,
 RedHat and/or Windows from SourceForge:
 
   http://sourceforge.net/project/showfiles.php?group_id=46747


 Thank you!


Regards,

    Hye-Shik =)


From barry@python.org  Tue Apr 22 20:19:47 2003
From: barry@python.org (Barry Warsaw)
Date: 22 Apr 2003 15:19:47 -0400
Subject: [I18n-sig] pygettext and msgfmt support for distutils
In-Reply-To: <6qel41e81z.fsf@salmakis.intevation.de>
References: <6qisteh0gh.fsf@salmakis.intevation.de>
 <m3k7du9res.fsf@mira.informatik.hu-berlin.de>
 <6qel41e81z.fsf@salmakis.intevation.de>
Message-ID: <1051039187.32583.37.camel@barry>

On Thu, 2003-04-17 at 06:23, Bernhard Herzog wrote:
> martin@v.loewis.de (Martin v. Löwis) writes:
> 
> > I'm with Barry here: You shouldn't have xgettext as part of the build
> > or install commands.
> 
> That wasn't quite my intention. For the xgettext step I was thinking
> more of a separate command that is not automatically called when doing a
> build. It seems to me that maybe such a command should be run as part of
> the sdist command to make sure that a .pot file shipped with the sources
> is up to date.

I tend to think about updating the .pot file on a much different
schedule than creating source distributions.  Actually, sdist-time would
be too late since I usually like to give my translators a little
heads-up before a release.

> Another reason I thought it might be good to have it as part of
> distutils is that it could make use of information the distutils have
> to, say, extract the translatable strings from all python source files
> in the distribution.

In my experience, this isn't too much of a problem.  It's usually pretty
easy to write a find script to calculate the files for extraction.  The
hard part (for me) is figuring out which files you also want to extract
docstrings for, and such a distinction isn't built into distutils
(although possibly could be -- they're usually command line scripts).
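
As a sketch of such a find script in Python (the function name
python_sources and the default list of skipped directories are
illustrative assumptions):

```python
import os


def python_sources(root, skip_dirs=("build", "dist", ".git")):
    """Return the .py files under root, skipping the named directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped dirs;
        # sorting keeps the result order stable between runs.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            if name.endswith(".py"):
                found.append(os.path.join(dirpath, name))
    return found
```

The resulting list can be handed to the extraction tool.  Deciding
which of those files should also have their docstrings extracted still
has to be done separately.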

-Barry


From barry@python.org  Tue Apr 22 20:22:41 2003
From: barry@python.org (Barry Warsaw)
Date: 22 Apr 2003 15:22:41 -0400
Subject: [I18n-sig] pygettext and msgfmt support for distutils
In-Reply-To: <6q65paic1v.fsf@salmakis.intevation.de>
References: <6qisteh0gh.fsf@salmakis.intevation.de>
 <1050516640.9818.150.camel@barry>  <6q65paic1v.fsf@salmakis.intevation.de>
Message-ID: <1051039360.32583.41.camel@barry>

On Sat, 2003-04-19 at 14:20, Bernhard Herzog wrote:
> Barry Warsaw <barry@python.org> writes:
> 
> > I haven't had time to look at the latest xgettext, but do you know if it
> > supports all the extra features that pygettext supports?
> 
> AFAICT python support means mostly that it understands the python syntax
> enough to recognize all string literals correctly.

That's a good start, for sure.

> > Of primary
> > importance to me is the -D/--docstrings and -X/--no-docstrings options.
> 
> There doesn't seem to be support for this. At least there's nothing in
> the docs about this.

Ok, IBWNI (it would be nice if it did).  BTW, the reason I want this is mostly because I put usage
information in module docstrings for command line scripts.  I really
don't want to use something like:

__doc__ = _("""mailmanctl -- start and stop the qrunner daemons
...
""")

Instead I keep a plain module docstring and use it as

def usage(...):
   print _(__doc__)

And I just want to be able to say, okay, extract the docstring for
bin/mailmanctl, but not for certain other files.

-Barry


From barry@python.org  Tue Apr 22 20:53:25 2003
From: barry@python.org (Barry Warsaw)
Date: 22 Apr 2003 15:53:25 -0400
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python
 2.3
In-Reply-To: <3E9DD413.8030002@v.loewis.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050092819.11172.89.camel@barry>
 <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
 <1050511925.9818.78.camel@barry>
 <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de>
 <1050521768.14112.15.camel@barry>  <3E9DD413.8030002@v.loewis.de>
Message-ID: <1051041205.32490.51.camel@barry>

On Wed, 2003-04-16 at 18:07, "Martin v. Löwis" wrote:

> > So why isn't the English/US-ASCII bias for msgids considered a liability
> > for gettext?  Do non-English programmers not want to use native literals
> > in their source code?
> 
> Using English for msgids is about the only way to get translation. 
> Finding a Turkish speaker who can translate from Spanish is 
> *significantly* more difficult than starting from English; if you were 
> starting from, say, Chinese, going to Hebrew might just be impossible.
> 
> So any programmer who seriously wants to have his software translated 
> will put English texts into the source code. Non-English literals are 
> only used if l10n is not an issue.

That's probably true.  I'm just not sure Zope wants to make that a
requirement.

> > BTW, I believe that if all your msgids /are/ us-ascii, you should be
> > able to ignore this change and have it work backwards compatibly.
> 
> "This" change being addition of the "coerce" argument? If you think
> you will need it, we can leave it in.

Actually, thinking about this more, we probably don't even need the
coerce flag.  If all your msgids are us-ascii, you don't care whether
they've been coerced to Unicode or not because they'll still compare
equal.

So I propose to remove the coerce flag, but still Unicode-ify both
msgids and msgstrs.  Then .ugettext() will just return the Unicode
msgstr in the catalog, while .gettext() will encode it to an 8-bit
string based on the charset.  Personally, I think most i18n Python apps
are going to want to use .ugettext() anyway, so for the average program
this will just work as expected.

I have the tests passing for this change.  Any objections?
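
To make the proposed behavior concrete, here is a toy model of it.  It
is only a sketch, not the gettext.py code: the class name SketchCatalog
is made up, and it is written for a Python where str is the Unicode
type and bytes is the 8-bit string type.

```python
class SketchCatalog:
    """Toy catalog that Unicode-ifies msgids and msgstrs on load."""

    def __init__(self, raw_entries, charset):
        # raw_entries maps 8-bit msgids to 8-bit msgstrs, as read from
        # a catalog whose header declares the given charset.
        self.charset = charset
        self._catalog = {
            msgid.decode(charset): msgstr.decode(charset)
            for msgid, msgstr in raw_entries.items()
        }

    def ugettext(self, message):
        # Return the Unicode msgstr, or the message itself if unknown.
        return self._catalog.get(message, message)

    def gettext(self, message):
        # Same lookup, then encode to 8 bits using the catalog charset.
        return self.ugettext(message).encode(self.charset)


catalog = SketchCatalog({b"Hello": b"Gr\xfc\xdf Gott"}, "iso-8859-1")
unicode_result = catalog.ugettext("Hello")   # Unicode string
encoded_result = catalog.gettext("Hello")    # 8-bit string in iso-8859-1
```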

> >>If the msgids are UTF-8, with non-ASCII characters C-escaped,
> >>translators will *still* put non-UTF-8 encodings into the catalogs.
> >>This will then be a problem: The catalog encoding won't be UTF-8,
> >>and you can't process the msgids.
> > 
> > Isn't this just another validation step to run on the .po files?  There
> > are already several ways translators can (and do!) make mistakes, so we
> > already have to validate the files anyway.
> 
> I'm not sure how exactly a validation step would be executed. Would that
> step simply verify that the encoding of a catalog is UTF-8? That 
> validation step would fail for catalogs that legally use other charsets.

The validation step would make sure that all the msgids and msgstrs
could be decoded using the encoding claimed in the headers.  If msgids
are us-ascii then (just about) any other encoding for msgstrs should
work just fine.  If there are non-ascii characters in both msgids and
msgstrs, then some common encoding would have to be used (what other than
utf-8?). 
It's a choice left up to the application and its translators.
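
A minimal sketch of that validation step, assuming the catalog has
already been parsed into a dictionary of raw 8-bit msgid/msgstr pairs
and that the charset comes from the catalog header.  The function name
undecodable_entries is hypothetical, and an unknown charset name
(LookupError) is deliberately not handled here.

```python
def undecodable_entries(raw_entries, charset):
    """Return (field, msgid) pairs that fail to decode with charset."""
    problems = []
    for msgid, msgstr in raw_entries.items():
        for field, raw in (("msgid", msgid), ("msgstr", msgstr)):
            try:
                raw.decode(charset)
            except UnicodeDecodeError:
                # Record which field broke, keyed by the raw msgid.
                problems.append((field, msgid))
    return problems


entries = {b"Hello": b"Gr\xfc\xdf Gott"}   # msgstr is Latin-1 encoded
as_utf8 = undecodable_entries(entries, "utf-8")
as_latin1 = undecodable_entries(entries, "iso-8859-1")
```

An empty result means every msgid and msgstr is consistent with the
declared encoding.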

-Barry


From martin@v.loewis.de  Tue Apr 22 23:15:08 2003
From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: 23 Apr 2003 00:15:08 +0200
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <1051041205.32490.51.camel@barry>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050092819.11172.89.camel@barry>
 <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
 <1050511925.9818.78.camel@barry>
 <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de>
 <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de>
 <1051041205.32490.51.camel@barry>
Message-ID: <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de>

Barry Warsaw <barry@python.org> writes:

> So I propose to remove the coerce flag, but still Unicode-ify both
> msgids and msgstrs.  Then .ugettext() will just return the Unicode
> msgstr in the catalog, while .gettext() will encode it to an 8-bit
> string based on the charset.  Personally, I think most i18n Python apps
> are going to want to use .ugettext() anyway, so for the average program
> this will just work as expected.
> 
> I have the tests passing for this change.  Any objections?

For safety, I'd recommend that you use byte string msgids if
conversion to Unicode fails. Otherwise, I'm fine with automatically
coercing everything to Unicode.

I do know about catalogs that use Latin-1 in msgids (to represent
accented characters in the names of authors). That should not cause
failures.
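
The fallback I have in mind could look roughly like this (the name
decode_msgid is only for illustration, and the sketch assumes a Python
where str is Unicode and bytes is the 8-bit string type):

```python
def decode_msgid(raw, charset):
    """Decode a raw msgid, keeping the byte string if decoding fails."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        # Leave the msgid as an 8-bit string rather than failing the
        # whole catalog load.
        return raw


ok = decode_msgid(b"Hello", "ascii")        # decodes to Unicode
kept = decode_msgid(b"L\xf6wis", "utf-8")   # Latin-1 bytes, kept as-is
```

Entries kept as byte strings can then only be looked up by the same
byte string, which is the price of not failing outright.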

Regards,
Martin


From barry@python.org  Thu Apr 24 15:58:36 2003
From: barry@python.org (Barry Warsaw)
Date: 24 Apr 2003 10:58:36 -0400
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python
 2.3
In-Reply-To: <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de>
 <1050092819.11172.89.camel@barry>
 <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
 <1050511925.9818.78.camel@barry>
 <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de>
 <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de>
 <1051041205.32490.51.camel@barry>
 <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de>
Message-ID: <1051196316.22909.13.camel@barry>

On Tue, 2003-04-22 at 18:15, Martin v. Löwis wrote:

> For safety, I'd recommend that you use byte string msgids if
> conversion to Unicode fails. Otherwise, I'm fine with automatically
> coercing everything to Unicode.

For now, I'll add a comment to the code at the point of conversion since
I'm not sure whether it's better to throw an exception or attempt to
carry on with 8-bit strings.  I'll update the docs too.

> I do know about catalogs that use Latin-1 in msgids (to represent
> accented characters in the names of authors). That should not cause
> failures.

Cool, thanks for the feedback Martin!
-Barry