internationalization problem...

Werner Schiendl ws-news at gmx.at
Thu May 2 10:55:43 EDT 2002


Hi,

You must first convert the data you read from the file to Unicode.
As Martin already suggested:

utf8_string = get_data_from_somewhere()
target_string = unicode(utf8_string, 'utf-8').encode('iso-8859-2')

or, in your example, use

x = unicode(buf[1], 'utf-8')

instead of x = buf[1]
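
To make the two steps explicit, here is a minimal sketch of the same decode-then-encode round trip. The sample bytes are made up for illustration (they are not taken from the file in question), and the sketch assumes a Python version that accepts b'...' byte literals:

```python
# Hypothetical sample data: a line holding the UTF-8 bytes for "ł" and "ń".
utf8_bytes = b'<main>\xc5\x82\xc5\x84</main>'

# Step 1: decode the UTF-8 bytes into a Unicode string.
text = utf8_bytes.decode('utf-8')

# Step 2: encode the Unicode string into the target 8-bit encoding.
latin2_bytes = text.encode('iso-8859-2')

# Each two-byte UTF-8 sequence has become a single ISO-8859-2 byte.
print(len(utf8_bytes), len(latin2_bytes))
```

The key point is that encode() is only meaningful on data that has already been decoded to Unicode.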

hth
Werner


"Johann" <programisci at NOSPAM.murator.com.pl> wrote in message
news:28e2du8f1jacggesbq6vdqqctn5424tq7l at 4ax.com...
> On Thu, 02 May 2002 14:33:50 +0200, Johann
> <programisci at NOSPAM.murator.com.pl> wrote:
>
> >>use the 'right' names for the encodings:
> >>
> >>>>> x=u'some text'
> >>>>> x.encode('iso-8859-1')
> >>'some text'
> >>>>> x.encode('iso-8859-2')
> >>'some text'
> >
> >THANX! It solves everything. :-)
>
> Not exactly. I checked it carefully. It does not work as I wanted. :-(
> Let me show an example. With XMLSpy I created a file encoded in UTF-8:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <main>aÄ?cÄ?lÅ?nÅ?oósÅ?zÅ¥xź</main>
>
> In the interpreter I wrote:
>
> fh = open(path + r'\utf8.xml', 'r')
> buf = fh.readlines()
> fh.close()
> x = buf[1]
> print x
>
> <main>aÃ?â??cÃ?â?·lıâ?¹nıâ?¾oÄ?Å?sıâ?ºzıĽxıż</main>
>
> for c in x.encode('ISO-8859-2'): print c,
>
> < m a i n > a Ã? â?? c Ã? â?· l ı â?¹ n ı â?¾ o Ä? Å? s ı â?º z ı
> Ľ x ı ż < / m a i n >
>
> Each 2-byte UTF-8 character is split into two independent
> 8-bit characters. :-( It is not a conversion from UTF-8 to ISO-8859-2 at
> all. I found it works correctly, but only for... 7-bit u'strings'. I
>
> --
> Johann
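
The splitting described in the quoted message can be reproduced with a short sketch. The single character used here is a made-up sample, and the code assumes a Python version that accepts b'...' byte literals:

```python
# Hypothetical sample: the letter "ł" (U+0142) stored as UTF-8.
utf8_bytes = b'\xc5\x82'

# Undecoded, the data is two separate 8-bit items.
byte_count = len(utf8_bytes)

# Decoded, it is one Unicode character.
text = utf8_bytes.decode('utf-8')
char_count = len(text)

print(byte_count, char_count)
```

Looping over the undecoded data therefore visits each byte of a multi-byte character separately, which is why the characters appear split in two.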




