internationalization problems

Thu May 2 09:40:48 EDT 2002

Johann <programisci at NOSPAM.murator.com.pl> writes:

> So I need to implement codec for 3 different charsets because
> Windows uses window-1250, Linux uses iso-8859-2 and Mac uses its own
> different coding for Polish characters (I assume, my application
> will be in Polish characters native to each platform).

It wouldn't be good to rely on Linux using iso-8859-2; instead, I
recommend using locale.nl_langinfo(locale.CODESET), which has been
available since Python 2.2.
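
A minimal sketch of that lookup follows. The helper name
platform_charset and the iso-8859-2 fallback are assumptions made for
illustration only. nl_langinfo is not provided on every platform, so
the sketch falls back to a default when it is missing.

```python
import locale


def platform_charset(default="iso-8859-2"):
    """Return the charset name of the user's locale, or a fallback.

    The fallback value is an assumption; pick whatever suits the
    application.
    """
    try:
        # Adopt the user's locale settings; without this call the
        # process stays in the default "C" locale.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale the system does not know;
        # carry on with whatever locale is currently active.
        pass
    try:
        return locale.nl_langinfo(locale.CODESET) or default
    except AttributeError:
        # nl_langinfo/CODESET are missing on some platforms.
        return default


charset = platform_charset()
```

Note that setlocale changes process-wide state, so it is best called
once, early in the program.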

> I thought it would be good to use codec library, but I have almost no
> examples in manual how to use it. I have also no experience with this
> library. Is it good idea to choose it? 

That should work fine for this application.

> I have also problem with converting from utf-8 into window-1250 or
> iso-8859-2 (latin-2). E.g.
> 
> x = u'some text'
> x.encode('latin-1')
> # it works but.... I need latin-2...
> x.encode('latin-2')
> #It does not work :-( 
> #err: LookupError: unknown encoding

Use 'iso-8859-2' instead; an alias is available only for latin-1,
not for latin-2.
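
Since the set of aliases depends on the Python version, one way to
avoid the LookupError is to ask the codec registry which name it
knows. This is only a sketch; the helper name first_known_encoding is
made up for illustration.

```python
import codecs


def first_known_encoding(candidates):
    """Return the first name the codec registry recognizes, else None."""
    for name in candidates:
        try:
            codecs.lookup(name)
        except LookupError:
            # Unknown name or alias in this Python version; try the next.
            continue
        return name
    return None


latin2_name = first_known_encoding(["latin-2", "iso-8859-2"])
```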

In any case, stay away from Unicode literals (unless they contain only
ASCII characters): the bytes in the literal are interpreted as Latin-1,
whatever encoding the source file actually uses.
So the typical conversion is something like

utf8_string = get_data_from_somewhere()
target_string = unicode(utf8_string, 'utf-8').encode('iso-8859-2')
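
The same conversion can be wrapped in a small helper so that one
function serves all target charsets. This is a sketch under
assumptions: the name recode and its parameters are hypothetical, and
the sample uses the b'' literal of current Python versions (in the
Python of this post, drop the b prefix). The sample bytes are the
UTF-8 form of a short Polish word, written as escapes to keep the
source ASCII-only.

```python
def recode(data, source="utf-8", target="iso-8859-2", errors="strict"):
    """Decode a byte string from `source` and re-encode it as `target`.

    `errors` is handed to encode(); "replace" substitutes '?' for
    characters that the target charset cannot represent.
    """
    return data.decode(source).encode(target, errors)


utf8_string = b"\xc5\x82\xc3\xb3d\xc5\xba"      # UTF-8 input
latin2_string = recode(utf8_string)              # for Linux-style locales
cp1250_string = recode(utf8_string, target="cp1250")  # for Windows
```

With the default errors="strict", a character missing from the target
charset raises UnicodeError, which is usually preferable to silently
losing data.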

HTH,
Martin