internationalization problem...

Johann programisci at NOSPAM.murator.com.pl
Thu May 2 09:48:39 EDT 2002


On Thu, 02 May 2002 14:33:50 +0200, Johann
<programisci at NOSPAM.murator.com.pl> wrote:

>>use the 'right' names for the encodings:
>>
>>>>> x=u'some text'
>>>>> x.encode('iso-8859-1')
>>'some text'
>>>>> x.encode('iso-8859-2')
>>'some text'
>
>THANX! It solves everything. :-)

Not exactly. I checked it carefully. It does not work as I wanted. :-(
Let me show an example. With XMLSpy I created a file encoded as UTF-8:

<?xml version="1.0" encoding="UTF-8"?>
<main>aącćlłnńoósśzżxź</main>

In the interpreter I wrote:

fh = open(path + r'\utf8.xml', 'r')
buf = fh.readlines()
fh.close()
x = buf[1]
print x

<main>aÄ…cÄ‡lĹ‚nĹ„oĂłsĹ›zĹĽxĹş</main>
 
for c in x.encode('ISO-8859-2'): print c,

< m a i n > a Ä … c Ä ‡ l Ĺ ‚ n Ĺ „ o Ă ł s Ĺ › z Ĺ
Ľ x Ĺ ş < / m a i n >

Each 2-byte UTF-8 character is split into two independent 8-bit
characters. :-( This is not a conversion from UTF-8 to ISO-8859-2 at
all. I found it works correctly, but only for... 7-bit u'strings'. I

--
Johann


