read(1) returns string of length 2

wolfgang haefelinger wh2005 at web.de
Wed Nov 24 07:03:41 EST 2004


Greetings,

I'm trying to read (Japanese) characters from a file. While doing so,
I find that f.read(1) sometimes returns a string of length 2. Is this
expected, or is something wrong?

Basically, this is what I'm doing:

import codecs
f = codecs.open("ident.in",'rb','Shift-JIS')   ##  Japanese codecs installed

c = f.read(1)
while c:
   if len(c)==1:
      print hex(ord(c)),
   else:
      print "{",
      for x in c: print hex(ord(x)),
      print "}",
   c = f.read(1)

This is my input (file is also attached):

$ od -tx1 ident.in
0000000 8d 87 8c 76 8e 9e 8a d4 3b 0d 0a
0000013

This is what I'm getting:

$ python ident.py                          ## Python 2.3.4 on Windows
0x5408 0x8a08 0x6642 0x9593 { 0x3b 0xd } 0xa

Python reports 6 characters on the stream, while there are actually 7:
the semicolon and the carriage return (0x3b, 0xd) come back together
from a single f.read(1) call.
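As a cross-check of the count, here is a minimal sketch in present-day Python 3 (not the Python 2.3.4 used above) that feeds the bytes from the `od` dump one at a time into the standard library's Shift-JIS incremental decoder, obtained with `codecs.getincrementaldecoder("shift_jis")`. The names `DATA` and `decode_bytewise` are hypothetical, chosen for illustration. A lead byte such as 0x8d produces no output on its own; the decoder buffers it until the trail byte arrives, so the 11 bytes should come out as 7 characters with the same code points as the Java run quoted further down.

```python
import codecs

# The 11 bytes shown by `od -tx1 ident.in` above (hypothetical name).
DATA = bytes.fromhex("8d 87 8c 76 8e 9e 8a d4 3b 0d 0a")

def decode_bytewise(data, encoding="shift_jis"):
    """Feed *data* to an incremental decoder one byte at a time and
    return the decoded characters in the order they complete."""
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = []
    for i in range(len(data)):
        # A lone lead byte yields "" here; the decoder keeps it buffered
        # until the trail byte completes the two-byte character.
        chars.extend(decoder.decode(data[i:i + 1]))
    # final=True flushes the decoder and raises if a multibyte
    # sequence was left incomplete at the end of the input.
    chars.extend(decoder.decode(b"", final=True))
    return chars

if __name__ == "__main__":
    chars = decode_bytewise(DATA)
    print(" ".join("c=%d" % ord(c) for c in chars), ":", len(chars), "char(s)")
```

Each element of the returned list is a single-character string, so the length of the list is the character count.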

My naive assumption was that f.read(1) always returns a string of
length 1 (or length 0 at end of file).
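For comparison, a minimal sketch of that expectation in present-day Python 3 (again, not Python 2.3.4): `open()` with an `encoding` argument returns an `io.TextIOWrapper`, whose `read(n)` counts decoded characters rather than bytes. Passing `newline=""` keeps "\r\n" as two characters instead of translating it to "\n". The helper names `read_chars` and `demo` are hypothetical; `demo` just writes the bytes from the `od` dump to a temporary file so the sketch is self-contained.

```python
import os
import tempfile

# The 11 bytes shown by `od -tx1 ident.in` above (hypothetical name).
DATA = bytes.fromhex("8d 87 8c 76 8e 9e 8a d4 3b 0d 0a")

def read_chars(path, encoding="shift_jis"):
    """Read *path* one character at a time and return the characters."""
    chars = []
    # newline="" disables universal-newline translation, so "\r\n"
    # is returned as two separate characters.
    with open(path, "r", encoding=encoding, newline="") as f:
        while True:
            c = f.read(1)  # io.TextIOWrapper.read(n): n counts characters
            if not c:      # empty string signals end of file
                break
            chars.append(c)
    return chars

def demo():
    """Write DATA to a temporary file and read it back character-wise."""
    fd, path = tempfile.mkstemp(suffix=".in")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(DATA)
        return read_chars(path)
    finally:
        os.remove(path)

if __name__ == "__main__":
    chars = demo()
    print(" ".join(hex(ord(c)) for c in chars), ":", len(chars), "char(s)")
```

With this reader every `read(1)` result has length 1 until end of file.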

Remark:
The input is believed to be "SJIS", but I haven't found a Python codec
under that name, so I'm using Shift-JIS instead. Of course, this could be
the problem. Note that when I feed my input to Java using SJIS, the
"correct" characters are printed:

  c=21512 c=35336 c=26178 c=38291 c=59 c=13 c=10 : 7 char(s)
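On the codec name: in present-day Python 3 (not the 2.3.4 setup above), `codecs.lookup()` normalizes case and punctuation and resolves aliases, and "sjis" is registered as an alias of the built-in `shift_jis` codec. A minimal sketch, where the helper `canonical` and the list `NAMES` are hypothetical:

```python
import codecs

# Spellings used in the post, plus Python's canonical codec name.
NAMES = ["SJIS", "Shift-JIS", "shift_jis"]

def canonical(name):
    """Return the canonical codec name that Python resolves *name* to."""
    return codecs.lookup(name).name

if __name__ == "__main__":
    for n in NAMES:
        print(n, "->", canonical(n))
```

All three spellings should resolve to the same codec there.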

References:
I downloaded the Japanese codecs (version 1.4.10) from here:
  http://www.asahi-net.or.jp/~rd6t-kjym/python/

Thanks for any hints,
Wolfgang.



begin 666 ident.in
+C8>,=HZ>BM0[#0H`
`
end



