unicode to ascii converting

Tom B. sbabbitt at commspeed.net
Fri Aug 6 14:04:22 EDT 2004


"Peter Wilkinson" <pwilkinson at videotron.ca> wrote in message
news:mailman.1296.1091813051.5135.python-list at python.org...
> Hello list members,
>
> I am using the encode() method to convert Unicode to ASCII. At one point
> this code was working just fine; now it has broken.
>
> I am reading a text file that is in Unicode (I am unsure of which
> flavour or bit depth). As I read in the file one line at a time
> (readlines()), each line is converted to ASCII. Simple enough. At the same
> time I am compressing to bz2 with the bz2 module, but that works just fine.
> The code and the error reported appear below. I am unsure what to do.
>
> Because it reports that the ordinal is not in range, I assume the problem
> has something to do with the character width that I am reading?
>
> Peter W.
>
> import os
> import bz2
>
> def encode_file(file_path, encode_type, compress='N'):
>      """
>      Changes encoding of file
>      """
>      new_encode = encode_type
>      old_file_path = file_path + '.old'
>      new_file_path = file_path
>      os.rename(file_path,old_file_path)
>      file_in  = file(old_file_path,'r')
>
>      if compress == 'Y' or compress == 'y':
>          bz_file_path = file_path + '.bz2'
>          bz_file_out  = bz2.BZ2File(bz_file_path, 'w')
>          for line in file_in.readlines():
>              bz_file_out.write(line.encode(new_encode))
>          bz_file_out.close()
>
>      else:
>          file_out = file(file_path,'w')
>          for line in file_in.readlines():
>              file_out.write(line.encode(new_encode))
>          file_out.close()
>
>      file_in.close()
>      os.remove(old_file_path)
>
> ERROR Reported:
>
> Parsing
> X:\GenomeQuebec_repository\microarray\HIS\M15K\Step_1_repository\HISH0224.txt
> Traceback (most recent call last):
>    File "C:\Program Files\ActiveState Komodo 2.5\callkomodo\kdb.py", line 433, in _do_start
>      self.kdb.run(code_ob, locals, locals)
>    File "C:\Python23\lib\bdb.py", line 350, in run
>      exec cmd in globals, locals
>    File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 158, in ?
>      main()
>    File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 75, in main
>      encode_file(fileToProcess, options.encode,  'Y')
>    File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 144, in encode_file
>      bz_file_out.write(line.encode(new_encode))
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
>
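Reading the traceback: in Python 2, `file(path, 'r')` yields byte strings, and calling `.encode()` on a byte string makes Python decode it first with the default ASCII codec. That implicit decode fails on the first byte above 127 (here 0xff at position 0), so the "ordinal" in the message is a byte value, not a character width. A file that starts with 0xff 0xfe carries a UTF-16 little-endian byte order mark, so UTF-16 is a plausible but unconfirmed guess for the input. The sketch below is written for Python 3, where decoding is explicit: it opens the input with an assumed `source_encoding` and writes the target encoding with an `errors` handler. Both parameters, and the boolean `compress` flag, are hypothetical additions, not part of the original script.

```python
import bz2
import os


def encode_file(file_path, encode_type, compress=False,
                source_encoding='utf-16', errors='replace'):
    """Re-encode file_path from source_encoding to encode_type.

    source_encoding and errors are assumptions for illustration;
    the original post does not say how the input was encoded.
    """
    old_file_path = file_path + '.old'
    os.rename(file_path, old_file_path)

    # Decode explicitly with the file's real encoding instead of
    # letting the interpreter fall back to ASCII.
    with open(old_file_path, 'r', encoding=source_encoding) as file_in:
        if compress:
            # bz2.open in text mode encodes each line as it is written.
            file_out = bz2.open(file_path + '.bz2', 'wt',
                                encoding=encode_type, errors=errors)
        else:
            file_out = open(file_path, 'w', encoding=encode_type,
                            errors=errors)
        with file_out:
            for line in file_in:
                file_out.write(line)

    os.remove(old_file_path)
```

With `errors='replace'`, characters that do not fit in ASCII become `?`; passing `errors='ignore'` drops them instead, which matches the approach taken in the reply.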
I've run into this problem before. The fix I came up with works, but it is
probably not the best one:

def is_ord(strng):
    """Return strng with every character above ordinal 127 removed."""
    new_text = ''
    for i in strng:
        # Keep only 7-bit ASCII characters; anything else is dropped.
        if ord(i) < 128:
            new_text = new_text + i
    return new_text

#Then just,

text_from_file = is_ord(text_from_file)
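The same filtering, sketched in Python 3 with hypothetical helper names: a list comprehension mirrors `is_ord` directly, and the codec error handler `'ignore'` drops the same characters without an explicit loop. Both assume the text has already been decoded to Unicode. If the input is UTF-16, filtering the raw bytes instead keeps the zero high bytes, leaving a NUL after each ASCII character, as the last lines show.

```python
def strip_non_ascii(text):
    """Drop every character whose code point is above 127."""
    return ''.join([ch for ch in text if ord(ch) < 128])


def strip_non_ascii_codec(text):
    """Same result via the codec: 'ignore' skips unencodable characters."""
    return text.encode('ascii', 'ignore').decode('ascii')


sample = 'na\u00efve caf\u00e9'
print(strip_non_ascii(sample))        # nave caf
print(strip_non_ascii_codec(sample))  # nave caf

# Filtering raw UTF-16-LE bytes (with BOM) instead of decoded text:
raw = b'\xff\xfe' + 'abc'.encode('utf-16-le')
leftover = bytes(b for b in raw if b < 128)
print(leftover)                       # b'a\x00b\x00c\x00'
```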

Tom




