unicode to ascii converting

Fri Aug 6 14:39:35 EDT 2004

I tried the function, but it does not work as I expected.

What happens is that the character encoding seems to change in the
following way: something equivalent to a return character is placed
after each character, or at least, when I view the file in Excel, there
is a blank row between each row.

It's very strange.
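
One possible explanation, assuming the input file is UTF-16 (this is an assumption: the byte 0xff at position 0 in the traceback quoted below would fit a UTF-16 little-endian byte order mark, but the thread never confirms the encoding). Dropping every byte above 127 removes the byte order mark but keeps the zero byte that UTF-16 stores next to each ASCII character, including one between the carriage return and the line feed. The sketch below uses Python 3 syntax rather than the Python 2.3 of this thread, `drop_high_bytes` is a hypothetical stand-in for a byte-level filter of that kind, and the sample data is made up.

```python
import codecs


def drop_high_bytes(data: bytes) -> bytes:
    """Hypothetical byte-level filter: keep only bytes in range(128)."""
    return bytes(b for b in data if b <= 127)


# Made-up sample: two short lines stored as UTF-16 LE with a byte order mark.
raw = codecs.BOM_UTF16_LE + "ab\r\ncd\r\n".encode("utf-16-le")

filtered = drop_high_bytes(raw)

# The BOM bytes (0xff, 0xfe) are gone, but a NUL byte still follows every
# character, and CR and LF are no longer adjacent.
print(filtered)
```

If this is what is happening, a viewer that treats the stray zero bytes as characters, or that treats the separated CR and LF as two line breaks, could plausibly show the extra characters and blank rows described above.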

Back to the drawing board.
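
One direction worth trying is to decode the raw bytes to text first and only then encode to ASCII, instead of filtering bytes. The following is a minimal sketch under stated assumptions, not a definitive fix: it is written in Python 3 rather than the Python 2.3 used in this thread, it assumes the file starts with a byte order mark when it is UTF-16, and the helper names `guess_encoding` and `to_ascii_bytes` are hypothetical.

```python
import codecs

# Byte order marks checked by this sketch and the codec each one suggests.
# UTF-32 is deliberately left out to keep the example short.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: pick a codec from a leading BOM, else use default."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def to_ascii_bytes(raw: bytes, errors: str = "replace") -> bytes:
    """Decode raw bytes to text, then encode the text as ASCII.

    With errors="replace", characters outside ASCII become "?";
    with errors="ignore", they are dropped.
    """
    text = raw.decode(guess_encoding(raw))
    return text.encode("ascii", errors)


# Made-up sample: UTF-16 LE data with a byte order mark.
sample = codecs.BOM_UTF16_LE + "ab\r\ncd\r\n".encode("utf-16-le")
print(to_ascii_bytes(sample))
```

The input would need to be read in binary mode (for example `open(path, "rb").read()`) so that the bytes reach the decoder untouched. The resulting ASCII bytes can then be written to a plain file opened in binary mode or passed to `bz2.compress`.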

At 02:17 PM 8/6/2004, Peter Wilkinson wrote:
>Thanks Tom B.,
>
>I will try that for now ....
>
>It would be good to find out _why_ this happens in the first place. I will 
>keep searching on this for a few days.
>
>
>Peter W.
>
>
>At 02:04 PM 8/6/2004, Tom B. wrote:
>
>>"Peter Wilkinson" <pwilkinson at videotron.ca> wrote in message
>>news:mailman.1296.1091813051.5135.python-list at python.org...
>> > Hello list members,
>> >
>> > I am using the encoding function to convert unicode to ascii. At one point
>> > this code was working just fine, however, now it has broken.
>> >
>> > I am reading a text file that is in unicode (I am unsure of which
>> > flavour or bit depth). As I read in the file one line at a time
>> > (readlines()) it converts to ascii. Simple enough. At the same time I am
>> > compressing to bz2 with the bz2 module but that works just fine.  The code
>> > and the error reported appear below. I am unsure what to do.
>> >
>> > I assume that, because it is reporting that the ordinal is not in range,
>> > this has something to do with the character width that I am reading?
>> >
>> > Peter W.
>> >
>> > def encode_file(file_path, encode_type, compress='N'):
>> >      """
>> >      Changes encoding of file
>> >      """
>> >      new_encode = encode_type
>> >      old_file_path = file_path + '.old'
>> >      new_file_path = file_path
>> >      os.rename(file_path,old_file_path)
>> >      file_in  = file(old_file_path,'r')
>> >
>> >      if compress == 'Y' or compress == 'y':
>> >          bz_file_path = file_path + '.bz2'
>> >          bz_file_out  = bz2.BZ2File(bz_file_path, 'w')
>> >          for line in file_in.readlines():
>> >              bz_file_out.write(line.encode(new_encode))
>> >          bz_file_out.close()
>> >
>> >      else:
>> >          file_out = file(file_path,'w')
>> >          for line in file_in.readlines():
>> >              file_out.write(line.encode(new_encode))
>> >          file_out.close()
>> >
>> >      file_in.close()
>> >      os.remove(old_file_path)
>> >
>> > ERROR Reported:
>> >
>> > Parsing
>> > X:\GenomeQuebec_repository\microarray\HIS\M15K\Step_1_repository\HISH0224.txt
>> > Traceback (most recent call last):
>> >    File "C:\Program Files\ActiveState Komodo 2.5\callkomodo\kdb.py", line
>> > 433, in _do_start
>> >      self.kdb.run(code_ob, locals, locals)
>> >    File "C:\Python23\lib\bdb.py", line 350, in run
>> >      exec cmd in globals, locals
>> >    File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py",
>> > line 158, in ?
>> >      main()
>> >    File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py",
>> > line 75, in main
>> >      encode_file(fileToProcess, options.encode,  'Y')
>> >    File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py",
>> > line 144, in encode_file
>> >      bz_file_out.write(line.encode(new_encode))
>> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0:
>> > ordinal not in range(128)
>> >
>>I've encountered this problem before, and I've come up with a
>>fix that works but is probably not the best:
>>
>>def is_ord (strng):
>>     # Keep only characters in the 7-bit ASCII range; drop everything else.
>>     new_text = ''
>>     for i in strng:
>>         if ord(i) <= 127:
>>             new_text = new_text + i
>>     return new_text
>>
>>#Then just,
>>
>>text_from_file = is_ord(text_from_file)
>>
>>Tom
>>
>>
>>--
>>http://mail.python.org/mailman/listinfo/python-list