From diedrich@xmission.com Tue Jan 1 19:28:39 2002
From: diedrich@xmission.com (Karl T. Diedrich)
Date: Tue, 1 Jan 2002 13:28:39 -0600
Subject: [I18n-sig] python gettext example?
Message-ID: <200201011928.g01JSdm02291@hanguk.homenet.org>

Hello,

Are there any examples of Python programs using gettext? I am trying to
work up a simple example so I can internationalize an open source Python
project I have. I think I know how to prepare the source code, but I
don't understand how it all works together.

I put this at the top of files:

from gettext import gettext as _
from gettext import bindtextdomain, textdomain
from os import sep
from locale import setlocale, LC_ALL

LOCALE_PREFIX = "%susr" % (sep)
LOCALE_DIR = "%s%sshare%slocale" % ( LOCALE_PREFIX, sep, sep )
PACKAGE = "deodas"

setlocale( LC_ALL )
bindtextdomain( PACKAGE, LOCALE_DIR )
textdomain( PACKAGE )

# in the code
string = _( u'A sentence to translate.' )

The source code looks like

module/__init__.py
module/etc...
po/POTFILES.in
po/es.po
po/es.mo

I used pygettext filename & msgfmt.py es.po to pull out the strings and
make the translation catalog.

How do I run the program using a translation file?

-- 
Karl Diedrich
http://deodas.sourceforge.net/

From martin@v.loewis.de Tue Jan 1 21:29:30 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: Tue, 1 Jan 2002 22:29:30 +0100
Subject: [I18n-sig] python gettext example?
In-Reply-To: <200201011928.g01JSdm02291@hanguk.homenet.org>
    (diedrich@xmission.com)
References: <200201011928.g01JSdm02291@hanguk.homenet.org>
Message-ID: <200201012129.g01LTUJ13439@mira.informatik.hu-berlin.de>

> Are there any examples of Python programs using gettext?

mailman does.
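
For a smaller starting point, here is a minimal sketch of how a catalog
is located and loaded. It assumes a current Python 3 interpreter rather
than the release current at the time of this thread. gettext looks for
<localedir>/<language>/LC_MESSAGES/<domain>.mo; the helper names
catalog_path and load_translator are made up for illustration, and
fallback=True is used so that a missing catalog yields the untranslated
string instead of an error.

```python
import gettext
import os
import tempfile

DOMAIN = "deodas"  # text domain taken from the question above


def catalog_path(localedir, language, domain=DOMAIN):
    """Return the path where gettext expects the compiled catalog."""
    return os.path.join(localedir, language, "LC_MESSAGES", domain + ".mo")


def load_translator(localedir, language, domain=DOMAIN):
    """Return a gettext-style function for the given language.

    With fallback=True, gettext.translation() returns a NullTranslations
    object when no catalog is found, so the returned function simply
    hands back the original message.
    """
    trans = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return trans.gettext


# Usage: an empty temporary directory stands in for the locale directory,
# so no catalog exists and the message comes back untranslated.
with tempfile.TemporaryDirectory() as localedir:
    print(catalog_path(localedir, "es"))
    _ = load_translator(localedir, "es")
    print(_("A sentence to translate."))
```

Once a compiled es.mo is copied to the path printed first, the same call
should return the Spanish text, provided the msgid in the catalog matches
the string in the source exactly.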

> I put this at the top of files:
>
> from gettext import gettext as _
> from gettext import bindtextdomain, textdomain
> from os import sep
> from locale import setlocale, LC_ALL
>
> LOCALE_PREFIX = "%susr" % (sep)
> LOCALE_DIR = "%s%sshare%slocale" % ( LOCALE_PREFIX, sep, sep )

If you install into the system locale dir (i.e. the one of the python
prefix), you don't need to bind the text domain; the default search
path should be fine.

> PACKAGE = "deodas"
>
> setlocale( LC_ALL )

This has no effect: with a single argument, setlocale only queries the
current locale. To set the locale, use

setlocale(LC_ALL, "")

OTOH, gettext.py will work even without a setlocale call.

> The source code looks like
> module/__init__.py
> module/etc...
> po/POTFILES.in
> po/es.po
> po/es.mo
[...]
> How do I run the program using a translation file?

You should install the file as
/usr/share/locale/es/LC_MESSAGES/deodas.mo.

Alternatively, you could install it anywhere else, e.g. as
/tmp/es/LC_MESSAGES/deodas.mo; then you should set LOCALE_DIR to /tmp.

If you want to use the locale files right in their source location, you
should do

trans = gettext.GNUTranslations(open("po/es.mo", "rb"))
_ = trans.gettext

HTH,
Martin

From jdavid@nuxeo.com Thu Jan 3 17:41:23 2002
From: jdavid@nuxeo.com (Juan David Ibáñez Palomar)
Date: Thu, 03 Jan 2002 18:41:23 +0100
Subject: [I18n-sig] Normal and unicode strings
Message-ID: <3C3497C3.2040905@nuxeo.com>

Hi all,

I've started to look at Unicode.

There are two types of strings in Python, 'str' and 'unicode'.
I guess there are technical reasons to have two different
classes. Could somebody please explain these reasons to me
(or tell me where this is documented)?

Please, keep in mind that I've never looked at the Python sources
and I'm still quite ignorant about Unicode.

I think that for the user (the Python programmer) it would
be better to have only one class of strings, if possible of
course.

Is there any chance that this will be addressed in future versions
of Python?
Something similar to the unification of integers and long
integers. I haven't found anything in the index of PEPs.

Many thanks for your time,
jdavid

From martin@v.loewis.de Thu Jan 3 22:10:18 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: Thu, 3 Jan 2002 23:10:18 +0100
Subject: [I18n-sig] Normal and unicode strings
In-Reply-To: <3C3497C3.2040905@nuxeo.com> (message from Juan David
    Ibáñez Palomar on Thu, 03 Jan 2002 18:41:23 +0100)
References: <3C3497C3.2040905@nuxeo.com>
Message-ID: <200201032210.g03MAIp01523@mira.informatik.hu-berlin.de>

From martin@v.loewis.de Thu Jan 3 22:20:12 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: Thu, 3 Jan 2002 23:20:12 +0100
Subject: [I18n-sig] Normal and unicode strings
In-Reply-To: <3C3497C3.2040905@nuxeo.com> (message from Juan David
    Ibáñez Palomar on Thu, 03 Jan 2002 18:41:23 +0100)
References: <3C3497C3.2040905@nuxeo.com>
Message-ID: <200201032220.g03MKCr01551@mira.informatik.hu-berlin.de>

> I've started to look at Unicode.
>
> There are two types of strings in Python, 'str' and 'unicode'.
> I guess there are technical reasons to have two different
> classes. Could somebody please explain these reasons to me?

Strings, traditionally, have been used for two things:

- byte strings, as you get them when reading from a file or a network
  connection, or interacting with the operating system in a variety of
  other ways, and

- character strings, to represent text - typically intended for the
  eventual display to the user using glyphs in some font.

Notice that both uses of strings are equally important. If you
disagree, just consider how you would do things like bitmaps (GIF
files, JPEG files, video streams) or networking protocols (like HTTP
or NFS) without byte strings.

It turns out that there is no meaningful way to support both
simultaneously in a single type.
To support bytes properly (including the C API), you really need the
property that each element takes one of 256 values and that the
elements form a contiguous block in your computer's memory.

To support character strings properly, you need much more than 256
values. Unicode is an international standard that associates
well-defined meanings with more than 100,000 of these values, so that
all languages can represent all characters in a single character set.

> Please, keep in mind that I've never looked at the Python sources
> and I'm still quite ignorant about Unicode.

If you really want to get familiar with Unicode, the Python
documentation alone is the wrong place. Please refer to
www.unicode.org; they recommend buying their book, but have a lot of
introductory material also.

> I think that for the user (the Python programmer) it would
> be better to have only one class of strings, if possible of
> course.

No. The user should always be aware whether what he has is a byte
string or a character string. For byte strings, the type name 'str'
should be used; for character strings, the type named 'unicode' is
good.

> Is there any chance that this will be addressed in future versions
> of Python?

Perhaps, but it is unclear how this could work. Most likely, string
literals would mean "character string", but then people who want to
have byte string literals will complain - even the standard library
uses both byte string literals and character string literals, without
distinguishing between them.

There is a patch on SF proposing a migration strategy: first introduce
the notion of byte string literals (b'HTTP/1.0'), then, years later,
consider changing the meaning of plain strings to mean Unicode.

Regards,
Martin

From mal@lemburg.com Fri Jan 4 09:25:28 2002
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Fri, 04 Jan 2002 10:25:28 +0100
Subject: [I18n-sig] Normal and unicode strings
References: <3C3497C3.2040905@nuxeo.com> <200201032220.g03MKCr01551@mira.informatik.hu-berlin.de>
Message-ID: <3C357508.6073EB94@lemburg.com>

"Martin v. Loewis" wrote:
>
> > Please, keep in mind that I've never looked at the Python sources
> > and I'm still quite ignorant about Unicode.
>
> If you really want to get familiar with Unicode, the Python
> documentation alone is the wrong place. Please refer to
> www.unicode.org; they recommend buying their book, but have a lot of
> introductory material also.

You might also want to take a look at the slides I have on
the Python Software pages (see link in sig): I gave a talk
about Unicode and Python at the Bordeaux conference last year.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/

From jdavid@nuxeo.com Fri Jan 4 18:58:21 2002
From: jdavid@nuxeo.com (Juan David Ibáñez Palomar)
Date: Fri, 04 Jan 2002 19:58:21 +0100
Subject: [I18n-sig] Normal and unicode strings
References: <3C3497C3.2040905@nuxeo.com> <200201032220.g03MKCr01551@mira.informatik.hu-berlin.de> <3C357508.6073EB94@lemburg.com>
Message-ID: <3C35FB4D.2060108@nuxeo.com>

Many thanks Martin and Marc-Andre for helping me with this,
now I understand it better. That's all for now.

Cheers.

M.-A. Lemburg wrote:

>"Martin v. Loewis" wrote:
>
>>>Please, keep in mind that I've never looked at the Python sources
>>>and I'm still quite ignorant about Unicode.
>>>
>>If you really want to get familiar with Unicode, the Python
>>documentation alone is the wrong place. Please refer to
>>www.unicode.org; they recommend buying their book, but have a lot of
>>introductory material also.
>>
>
>You might also want to take a look at the slides I have on
>the Python Software pages (see link in sig): I gave a talk
>about Unicode and Python at the Bordeaux conference last year.
>
>

-- 
J. David Ibáñez, Nuxeo.com
Python programmer (http://www.python.org)

From Misha.Wolf@reuters.com Fri Jan 4 22:06:52 2002
From: Misha.Wolf@reuters.com (Misha.Wolf@reuters.com)
Date: Fri, 04 Jan 2002 22:06:52 +0000
Subject: [I18n-sig] Last Call for Papers - 21st Unicode Conference - May 2002 - Dublin, Ireland
Message-ID:

>>>>>>>>>>>>>>>>>>>>>>> Last Call for Papers! <<<<<<<<<<<<<<<<<<<<<<<

       Twenty-First International Unicode Conference (IUC21)
     Unicode, Localization and the Web: The Global Connection
                 http://www.unicode.org/iuc/iuc21
                         May 14-17, 2002
                         Dublin, Ireland

>>>>>>>>>>>>>>>>>>>>>>>>> Just 1 week to go! <<<<<<<<<<<<<<<<<<<<<<<<

Submissions due:       January 11, 2002
Notification date:     February 1, 2002
Completed papers due:  February 22, 2002
(in electronic form and camera-ready paper form)

>>>>>>>>>>>>>>>>>>> Send in your submission now! <<<<<<<<<<<<<<<<<<<<

The Unicode Standard has become the foundation for all modern text
processing. It is used on large machines, tiny portable devices, and
for distributed processing across the Internet. The standard brings
cost-reducing efficiency to international applications and enables the
exchange of text in an ever increasing list of natural languages.

New technologies and innovative Internet applications, as well as the
evolving Unicode Standard, bring new challenges along with their new
capabilities. This technical conference will explore the opportunities
created by the latest advances and how to leverage them, as well as
potential pitfalls to be aware of, and problem areas that need further
research.

We invite you to submit papers which either define the software of
tomorrow, demonstrate best practice with today's software, or
articulate problems that must be solved before further advances can
occur.
Papers should discuss subjects in the context of Unicode,
internationalization or localization. You can view the programs of
previous conferences at:

http://www.unicode.org/unicode/conference/about-conf.html

Conference attendees are generally involved in either the development,
deployment or use of Unicode software or content, or the globalization
of software and the Internet. They include managers, software
engineers, systems analysts, font designers, graphic designers,
content developers, technical writers, and product marketing
personnel.

THEME & TOPICS

Computing with Unicode is the overall theme of the Conference.
Presentations should be geared towards a technical audience. Topics of
interest include, but are not limited to, the following (within the
context of Unicode, internationalization or localization):

- UTFs: Not enough or too many?
- Security concerns e.g. Avoiding the spoofing of UTF-8 data
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Portable devices
- Implementing new features of recent versions of Unicode
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- The World Wide Web (WWW)
- Search engines
- Library and archival concerns
- Operating systems
- Databases
- Large scale networks
- Government applications
- Evaluations (case studies, usability studies)
- Natural language processing
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Optimizing performance of systems and applications
- Testing applications
- XML and Web protocols
- Business models for software development (e.g. Open source)

SESSIONS

The Conference Program will provide a wide range of sessions
including:

- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will be of 40 minute
duration.
In some cases, two consecutive 40 minute program slots may be
devoted to a single session.

The Workshops/Tutorials will each last approximately three hours. They
should be designed to stimulate discussion and participation, using
slides and demonstrations.

PUBLICITY

If your paper is accepted, your details will be included in the
Conference brochure and Web pages and the paper itself will appear on
a Conference CD, with an optional printed book of Conference
Proceedings.

CONFERENCE LANGUAGE

The Conference language is English. All submissions, papers and
presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of statement of purpose,
   paper description, and your conclusions or final summary.

2. A brief biography.

3. The details listed below:

SESSION TITLE:             _________________________________________
                           _________________________________________
TITLE (eg Dr/Mr/Mrs/Ms):   _________________________________________
NAME:                      _________________________________________
JOB TITLE:                 _________________________________________
ORGANIZATION/AFFILIATION:  _________________________________________
ORGANIZATION'S WWW URL:    _________________________________________
OWN WWW URL:               _________________________________________
ADDRESS FOR PAPER MAIL:    _________________________________________
                           _________________________________________
                           _________________________________________
TELEPHONE:                 _________________________________________
FAX:                       _________________________________________
E-MAIL ADDRESS:            _________________________________________

TYPE OF SESSION:           [ ] Keynote presentation
                           [ ] Workshop/Tutorial
                           [ ] Technical presentation
                           [ ] Panel

PANELISTS (if Panel):      _________________________________________
                           _________________________________________
                           _________________________________________
                           _________________________________________
                           _________________________________________
                           _________________________________________
                           _________________________________________
                           _________________________________________

TARGET AUDIENCE (you may select more than one category):

[ ] Content Developers
[ ] Font Designers
[ ] Graphic Designers
[ ] Managers
[ ] Marketers
[ ] Software Engineers
[ ] Systems Analysts
[ ] Technical Writers
[ ] Others (please specify):
    _________________________________________
    _________________________________________

LEVEL OF SESSION (you may select more than one category):

[ ] Beginner
[ ] Intermediate
[ ] Advanced

Submissions should be sent by e-mail to either of the following
addresses:

papers@unicode.org
info@global-conference.com

They should use ASCII, non-compressed text and the following subject
line:

Proposal for IUC 21

If desired, a copy of the submission may also be sent by post to:

21st International Unicode Conference
c/o Global Meeting Services, Inc.
8949 Lombard Place #416
San Diego, CA 92122
USA

Tel: +1 858 638 0206
Fax: +1 858 638 0504

CONFERENCE PROCEEDINGS

All Conference papers will be published on CD. Printed proceedings
will be offered as an option.

EXHIBIT OPPORTUNITIES

The Conference will have an Exhibition area for corporations or
individuals who wish to display and promote their products, technology
and/or services. Every effort will be made to provide maximum exposure
and advertising.

Exhibit space is limited. For further information or to reserve a
place, please contact Global Meeting Services at the above location.

CONFERENCE VENUE

The Burlington Hotel
Upper Leeson Street
Dublin 4
Ireland

Tel: +353 1 660 5222
Fax: +353 1 660 8496

THE UNICODE CONSORTIUM

The Unicode Consortium was founded as a non-profit organization in
1991. It is dedicated to the development, maintenance and promotion of
The Unicode Standard, a worldwide character encoding. The Unicode
Standard encodes the characters of the world's principal scripts and
languages, and is code-for-code identical to the international
standard ISO/IEC 10646.
In addition to cooperating with ISO on the future
development of ISO/IEC 10646, the Consortium is responsible for
providing character properties and algorithms for use in
implementations. Today the membership base of the Unicode Consortium
includes major computer corporations, software producers, database
vendors, research institutions, international agencies and various
user groups.

For further information on the Unicode Standard, visit the Unicode Web
site at http://www.unicode.org or e-mail

                           * * * * *

Unicode(r) and the Unicode logo are registered trademarks of Unicode,
Inc. Used with permission.

----------------------------------------------------------------
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual
sender, except where the sender specifically states them to be
the views of Reuters Ltd.

From Misha.Wolf@reuters.com Mon Jan 7 22:34:44 2002
From: Misha.Wolf@reuters.com (Misha.Wolf@reuters.com)
Date: Mon, 07 Jan 2002 22:34:44 +0000
Subject: [I18n-sig] 20th Unicode Conference, Jan 2002, Washington DC -- Three weeks to go!
Message-ID:

>>>>>>>>>>>>>>>>>>>>>>>> Just 3 weeks to go! <<<<<<<<<<<<<<<<<<<<<<<<

        Twentieth International Unicode Conference (IUC20)
            Unicode and the Web: The Global Connection
                 http://www.unicode.org/iuc/iuc20
                       January 28-31, 2002
                       Washington, DC, USA

>>>>>>>>>>>>>>>>>>>>>>>>>>> Register now! <<<<<<<<<<<<<<<<<<<<<<<<<<<

NEWS

* Hotel guest room group rate extended to January 10!

* Early bird registration rate extended to January 18!

* Visit the Conference Web site ( http://www.unicode.org/iuc/iuc20 )
  to check the updated Conference program and register. To help you
  choose Conference sessions, we've included abstracts of talks and
  speakers' biographies.

* The World Wide Web Consortium (W3C) Internationalization Workshop
  is taking place in the same venue, on February 1 -- See the Call
  for Participation ( http://www.w3.org/2002/02/01-i18n-workshop/cfp )

CONFERENCE SPONSORS

Agfa Monotype Corporation
Basis Technology Corporation
Microsoft Corporation
Netscape Communications
Oracle Corporation
Progress Software Corporation
Reuters Ltd.
Sun Microsystems, Inc.
World Bank
World Wide Web Consortium (W3C)

CONFERENCE VENUE

Omni Shoreham Hotel
2500 Calvert Street, NW
Washington, DC 20008
USA

Tel: +1 202 234 0700
Fax: +1 202 265 7972

GLOBAL COMPUTING SHOWCASE

Visit the Showcase to find out more about products supporting the
Unicode Standard, and products and services that can help you
globalize/localize your software, documentation and Internet content.

For details, visit the Conference Web site:
http://www.unicode.org/iuc/iuc20

Exhibitors to date include:

* Agfa/Monotype Corporation
* Basis Technology Corporation
* InfoTech
* Language Technology Research Center
* Multilingual Computing, Inc.
* Rasmussen Software, Inc.
* SymbioSys, Inc.

CONFERENCE MANAGEMENT

Global Meeting Services Inc.
8949 Lombard Place #416
San Diego, CA 92122, USA

Tel:   +1 858 638 0206 (voice)
       +1 858 638 0504 (fax)
Email: info@global-conference.com
  or:  conference@unicode.org

                           * * * * *

Unicode(r) and the Unicode logo are registered trademarks of Unicode,
Inc. Used with permission.

----------------------------------------------------------------
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual
sender, except where the sender specifically states them to be
the views of Reuters Ltd.

From martin@v.loewis.de Tue Jan 8 21:32:18 2002
From: martin@v.loewis.de (Martin v.
Loewis)
Date: Tue, 8 Jan 2002 22:32:18 +0100
Subject: [I18n-sig] Bug #500595
Message-ID: <200201082132.g08LWIH02077@mira.informatik.hu-berlin.de>

In

http://sourceforge.net/tracker/index.php?func=detail&aid=500595&group_id=5470&atid=105470

the submitter reports that gettext.install fails if no catalog is
found. This is undesirable. Instead, it should fall back to installing
a _ function that is the identity mapping.

I propose the following strategy to fix that, both in 2.2.1 and 2.3:

- add a parameter fallback= to gettext.translation which, if set to
  true, returns a NullTranslations if no translation can be located,
  instead of raising an exception.

- in gettext.install, call translation() with fallback=1.

What do you think? Would it also be acceptable to reverse the
behaviour, making fallback=1 the default (so that you'll have to
explicitly request the exception, instead of requesting the null
translation)?

I'd also like to implement a per-message fallback mechanism, but that
is certainly for 2.3 (and out of scope of this report).

Regards,
Martin

From Misha.Wolf@reuters.com Fri Jan 18 19:51:29 2002
From: Misha.Wolf@reuters.com (Misha.Wolf@reuters.com)
Date: Fri, 18 Jan 2002 19:51:29 +0000
Subject: [I18n-sig] 20th Unicode Conference, Jan 2002, Washington DC -- Just 1 week to go!
Message-ID:

>>>>>>>>>>>>>>>>>>>>>>>>> Just 1 week to go! <<<<<<<<<<<<<<<<<<<<<<<<

        Twentieth International Unicode Conference (IUC20)
            Unicode and the Web: The Global Connection
                 http://www.unicode.org/iuc/iuc20
                       January 28-31, 2002
                       Washington, DC, USA

>>>>>>>>>>>>>>>>>>>>>>>>>>> Register now! <<<<<<<<<<<<<<<<<<<<<<<<<<<

NEWS

* Hotel guest rooms still available at the group rate.

* Visit the Conference Web site ( http://www.unicode.org/iuc/iuc20 )
  to check the updated Conference program and register. To help you
  choose Conference sessions, we've included abstracts of talks and
  speakers' biographies.

* The World Wide Web Consortium (W3C) Internationalization Workshop
  is taking place in the same venue, on February 1 -- See the Call
  for Participation ( http://www.w3.org/2002/02/01-i18n-workshop/cfp )

CONFERENCE SPONSORS

Agfa Monotype Corporation
Basis Technology Corporation
Microsoft Corporation
Netscape Communications
Oracle Corporation
Progress Software Corporation
Reuters Ltd.
Sun Microsystems, Inc.
World Bank
World Wide Web Consortium (W3C)

CONFERENCE VENUE

Omni Shoreham Hotel
2500 Calvert Street, NW
Washington, DC 20008
USA

Tel: +1 202 234 0700
Fax: +1 202 265 7972

GLOBAL COMPUTING SHOWCASE

Visit the Showcase to find out more about products supporting the
Unicode Standard, and products and services that can help you
globalize/localize your software, documentation and Internet content.

For details, visit the Conference Web site:
http://www.unicode.org/iuc/iuc20

Exhibitors to date include:

* Agfa/Monotype Corporation
* Basis Technology Corporation
* Everlasting Systems Ltd.
* InfoTech
* Language Technology Research Center
* Multilingual Computing, Inc.
* Rasmussen Software, Inc.
* SymbioSys, Inc.
* TRADOS Corporation

CONFERENCE MANAGEMENT

Global Meeting Services Inc.
8949 Lombard Place #416
San Diego, CA 92122, USA

Tel:   +1 858 638 0206 (voice)
       +1 858 638 0504 (fax)
Email: info@global-conference.com
  or:  conference@unicode.org

                           * * * * *

Unicode(r) and the Unicode logo are registered trademarks of Unicode,
Inc. Used with permission.

----------------------------------------------------------------
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual
sender, except where the sender specifically states them to be
the views of Reuters Ltd.