file - codecs - unicode ???

Sébastien Libert sebastien.libert at comexis.com
Tue Mar 13 05:11:30 EST 2001


Hi,

Thanks for your advice!
It works (well, sometimes!).
But I have a lot of problems with the trailing \012.
If I keep it:

>>> fline=unicode(fline,'UTF-16').encode('LATIN-1')
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
UnicodeError: UTF-16 decoding error: truncated data

If I delete it:

>>> fline=fline[:-1]
>>> fline=unicode(fline,'UTF-16').encode('LATIN-1')
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
UnicodeError: Latin-1 encoding error: ordinal not in range(256)

Another clue:
I converted the file to UTF-8 (with Notepad):

>>> gline
'0.00\011info\011Log_Version\0111.2\012'
>>> gline=unicode(gline,'UTF-8').encode('LATIN-1')
>>> gline
'0.00\011info\011Log_Version\0111.2\012'
;-)

So the file was in UTF-16 and is now in UTF-8, and that result is good for me!
But when I try the same conversion with Python:

>>> f=open("c:\\mylog.log")
>>> fline=f.readline()
>>> fline=unicode(fline,'UTF-16').encode('utf-8')
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
UnicodeError: UTF-16 decoding error: truncated data
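
One possible reading of the "truncated data" error, sketched here in present-day Python 3 rather than the Python 2.0 session above (where unicode(s, enc) corresponds roughly to bytes.decode(enc)): UTF-16 stores each of these characters in two bytes, so a byte-oriented readline() stops right after the \012 byte and leaves the \000 that completes the newline for the next read. The sample data and variable names below are hypothetical.

```python
import io

# Hypothetical sample: two log lines stored as little-endian UTF-16
# with a byte order mark, similar in shape to the file discussed here.
sample_text = "0.00\tinfo\tLog_Version\t1.2\r\n0.01\tinfo\tstart\r\n"
raw = b"\xff\xfe" + sample_text.encode("utf-16-le")

# A byte-oriented readline() returns everything up to and including the
# first 0x0a byte, so the second byte of the UTF-16 newline is cut off.
first = io.BytesIO(raw).readline()

try:
    first.decode("utf-16")
    failed = False
except UnicodeDecodeError:
    # An odd number of bytes cannot be a complete UTF-16 string.
    failed = True

# Decoding the whole byte string first, then splitting the resulting
# text into lines, avoids cutting a two-byte unit in half.
lines = raw.decode("utf-16").splitlines()
```

If this reading is right, it might also explain the Latin-1 failure: a later line would begin with the leftover \000, so every following byte pair is shifted by one and decodes to characters far outside the Latin-1 range.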


If you have any other advice for me, feel free to answer!
Thanks
"Marcin 'Qrczak' Kowalczyk" <qrczak at knm.org.pl> wrote in message
news:slrn9aqka1.3vk.qrczak at qrnik.zagroda...
> Mon, 12 Mar 2001 16:29:42 +0100, Sébastien Libert <sebastien.libert at comexis.com> wrote:
>
> > >>> line
> > '\377\3760\000.\0000\0000\000\011\000i\000n\000f\000o\000\011\000L\000o\000g\000_\000S\000t\000a\000n\000d\000a\000r\000d\000\011\000n\000g\000L\000o\000g\000\015\000\012'
> >
> > What can i do with this kind of thing ?????
>
> It's encoded in UTF-16, so:
>     unicode(line, 'UTF-16').encode('ASCII')
> except that the '\012' at the end is bogus. It shouldn't be there.
>
> You may want to use a different encoding than ASCII, because ASCII
> is only able to encode Latin letters without accents. If the string
> contains other characters, you will get an exception.
>
> --
>  __("<  Marcin Kowalczyk * qrczak at knm.org.pl http://qrczak.ids.net.pl/
>  \__/
>   ^^                      SYGNATURA ZASTĘPCZA
> QRCZAK
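
The point about the target encoding can be sketched as follows, again in Python 3 as an assumption (the thread itself uses the Python 2.0 unicode() call). The sample strings and the helper name try_encode are hypothetical; the sketch only shows that ASCII rejects accented letters, that Latin-1 accepts them, and that characters outside Latin-1 need an explicit error handler such as "replace".

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the text does not fit."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# Hypothetical sample containing an accented Latin letter.
name = "S\u00e9bastien"

ascii_result = try_encode(name, "ascii")      # does not fit in ASCII
latin1_result = try_encode(name, "latin-1")   # fits in Latin-1

# A character outside Latin-1 (the euro sign) fails in strict mode,
# but the "replace" error handler substitutes a question mark instead.
strict_result = try_encode("1.2 \u20ac", "latin-1")
lossy_result = "1.2 \u20ac".encode("latin-1", errors="replace")
```

Whether a lossy replacement is acceptable depends on what the log is used for; encoding to UTF-8 instead never fails for valid text.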




