unicode to ascii converting
Tom B.
sbabbitt at commspeed.net
Fri Aug 6 14:04:22 EDT 2004
"Peter Wilkinson" <pwilkinson at videotron.ca> wrote in message
news:mailman.1296.1091813051.5135.python-list at python.org...
> Hello list members,
>
> I am using the encoding function to convert Unicode to ASCII. At one point
> this code was working just fine; now, however, it has broken.
>
> I am reading a text file that is in Unicode (I am unsure of which
> flavour or bit depth). As I read in the file one line at a time
> (readlines()), I convert each line to ASCII. Simple enough. At the same
> time I am compressing to bz2 with the bz2 module, but that works just
> fine. The code and the error reported appear below. I am unsure what to do.
>
> Since it reports that the ordinal is not in range, I assume the problem
> has something to do with the character width of what I am reading?
>
> Peter W.
>
> def encode_file(file_path, encode_type, compress='N'):
>     """
>     Changes encoding of file
>     """
>     new_encode = encode_type
>     old_file_path = file_path + '.old'
>     new_file_path = file_path
>     os.rename(file_path,old_file_path)
>     file_in = file(old_file_path,'r')
>
>     if compress == 'Y' or compress == 'y':
>         bz_file_path = file_path + '.bz2'
>         bz_file_out = bz2.BZ2File(bz_file_path, 'w')
>         for line in file_in.readlines():
>             bz_file_out.write(line.encode(new_encode))
>         bz_file_out.close()
>
>     else:
>         file_out = file(file_path,'w')
>         for line in file_in.readlines():
>             file_out.write(line.encode(new_encode))
>         file_out.close()
>
>     file_in.close()
>     os.remove(old_file_path)
>
> ERROR Reported:
>
> Parsing X:\GenomeQuebec_repository\microarray\HIS\M15K\Step_1_repository\HISH0224.txt
> Traceback (most recent call last):
>   File "C:\Program Files\ActiveState Komodo 2.5\callkomodo\kdb.py", line 433, in _do_start
>     self.kdb.run(code_ob, locals, locals)
>   File "C:\Python23\lib\bdb.py", line 350, in run
>     exec cmd in globals, locals
>   File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 158, in ?
>     main()
>   File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 75, in main
>     encode_file(fileToProcess, options.encode, 'Y')
>   File "C:\Python23\Lib\site-packages\xBio\Scripts\unicodeToAscii.py", line 144, in encode_file
>     bz_file_out.write(line.encode(new_encode))
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
>
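For what it's worth, a byte 0xff at position 0 is consistent with a UTF-16 byte order mark (FF FE for little-endian), which would suggest the lines coming out of readlines() are still raw bytes rather than unicode objects. In Python 2, calling .encode() on a byte string first decodes it implicitly with the default 'ascii' codec, which would explain why the error is a UnicodeDecodeError. A small sketch that reproduces the same message, written for a current Python; the sample bytes and the helper name try_ascii_decode are made up for illustration, and the real encoding of the file is not known:

```python
import codecs

# Hypothetical sample: a UTF-16 little-endian BOM followed by "Hi".
# The real file's encoding is unknown; this only illustrates the error.
sample = codecs.BOM_UTF16_LE + u"Hi".encode("utf-16-le")


def try_ascii_decode(raw):
    """Return the error text from an ASCII decode, or None if it succeeds."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = try_ascii_decode(sample)
print(message)
```

If the printed message matches the one in the traceback, the file most likely needs to be decoded from its real encoding before it is encoded to ASCII.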
I've encountered this problem before. The fix I've come up with works, but
is probably not the best. It simply drops every character whose ordinal is
above 127:
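def is_ord(strng):
    # Build a copy of strng without any character whose ordinal is above 127.
    new_text = ''
    for i in strng:
        if ord(i) > 127:
            new_text = new_text + ''  # non-ASCII character: add nothing
        else:
            new_text = new_text + i
    return new_text

# Then just,
text_from_file = is_ord(text_from_file)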
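
Another option, if the source encoding is known, is to decode the bytes first and only then encode to ASCII, letting the codec's error handler drop or replace whatever doesn't fit. A rough sketch only: the name to_ascii_bytes and the default of 'utf-16' are my assumptions, not anything taken from Peter's script, and the sample data is made up:

```python
def to_ascii_bytes(raw, source_encoding="utf-16", errors="ignore"):
    """Decode raw bytes from source_encoding, then re-encode as ASCII.

    errors="ignore" drops characters outside ASCII (much like is_ord);
    errors="replace" substitutes "?" for them instead.
    """
    text = raw.decode(source_encoding)
    return text.encode("ascii", errors)


# Made-up sample data: "caf" plus e-acute, stored as UTF-16 with a BOM.
sample = u"caf\xe9\n".encode("utf-16")
print(to_ascii_bytes(sample))                    # non-ASCII char dropped
print(to_ascii_bytes(sample, errors="replace"))  # non-ASCII char becomes "?"
```

This is probably best applied to the whole file contents read in binary mode ('rb') rather than line by line, since a UTF-16 newline is two bytes and splitting the raw bytes on newlines can cut a character in half.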
Tom