[Tutor] converting EBCIDIC to ASCII

Steven D'Aprano steve at pearwood.info
Sat Jul 14 03:41:53 CEST 2012


Prinn, Craig wrote:
> I am trying to convert an EBCIDIC file to ASCII, when the records are fixed
> length I can convert it fine, I have some files that are coming in as
> variable length records, is there a way to convert the file in Python? I
> tried using no length but then it just reads in to a fixed buffer size and
> I can't seem to break the records properly


I'm afraid that I have no idea what you mean here. What are you actually 
doing? What does "tried using no length" mean?

Converting from one encoding to another should have nothing to do with whether 
they are fixed-length records, variable-length records, or free-form text. 
First you read the file as bytes, then use the encoding to convert to text, 
then process the file however you like.
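
As a rough sketch of those three steps, assuming the data is in the cp500 
flavour of EBCDIC. The byte string is a short hand-made sample standing in 
for the contents of a real file, and the variable names are only for 
illustration:

```python
# Step 1: raw bytes, as they would come from open(path, 'rb').read().
# This literal is a short hand-made sample, assumed to be cp500-encoded.
raw = b'\xe3\x88\x89\xa2@\x89\xa2@\\\xa2\x96\x94\x85\\@\xe3'

# Step 2: decode the EBCDIC bytes into text.
text = raw.decode('cp500')

# Step 3: process the text however you like; here, re-encode it as ASCII.
# errors='replace' substitutes '?' for anything ASCII cannot represent.
ascii_bytes = text.encode('ascii', errors='replace')

print(text)
```

Nothing in this depends on how the records are laid out; the decoding step 
only cares about which encoding the bytes are in.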

Using Python 3, I prepared an EBCDIC file. If you open it in binary mode, you 
get the raw bytes, which are a mess:

py> raw = open('/home/steve/ebcidic.text', 'rb').read()
py> print(raw)
b'\xe3\x88\x89\xa2@\x89\xa2@\\\xa2\x96\x94\x85\\@\xe3 ...

For brevity, I truncated the output.

But if you open it in text mode, and set the encoding correctly, Python 
automatically converts the bytes into text according to the rules of EBCDIC:


py> text = open('/home/steve/ebcidic.text', 'r', encoding='cp500').read()
py> print(text)
This is *some* Text containing "punctuation" & other things(!) which
may{?} NOT be the +++same+++ when encoded into ASCII|EBCIDIC.


This is especially useful if you need to process the file line by line. Simply 
open the file with the right encoding, then loop over the file as normal.


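# the with statement closes the file when the loop is done
with open('/home/steve/ebcidic.text', 'r', encoding='cp500') as f:
     for line in f:
          print(line, end='')  # line keeps its own newline, so don't add one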


In this case, I used IBM's standard EBCDIC encoding for Western Europe. 
Python knows about some others; see the documentation for the codecs module 
for the list.

http://docs.python.org/library/codecs.html
http://docs.python.org/py3k/library/codecs.html
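
Picking the right code page matters. As a small illustration, here is a 
comparison of two of the EBCDIC codecs that ship with Python, cp037 and 
cp500. The sample string is made up; which code page your own files use is 
something you will have to find out from whoever produces them:

```python
# Two of the EBCDIC code pages that ship with Python.
# They agree on letters and digits but not on all punctuation.
sample = 'Hello!'

as_cp037 = sample.encode('cp037')   # US/Canada EBCDIC
as_cp500 = sample.encode('cp500')   # international EBCDIC

print(as_cp037 == as_cp500)         # are the encoded bytes identical?

# Decoding with the wrong code page does not raise an error,
# it quietly gives you the wrong characters.
print(as_cp037.decode('cp500'))
```

So if letters and digits look fine but some punctuation comes out wrong, the 
encoding name is the first thing to check.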

Once you have the text, you can then treat it as fixed width, variable width, 
or whatever else you might have.
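
For instance, fixed-width records can be cut out of the decoded text with 
plain slicing. This is only a sketch: the record width of 8 and the sample 
data are invented for the example, so substitute whatever your real layout 
is:

```python
# Hypothetical layout: every record is exactly 8 characters wide.
RECORD_WIDTH = 8

# Stand-in for text that has already been decoded from the file.
text = 'ALICE   BOB     CAROL   '

# Step through the text one record at a time.
records = [text[i:i + RECORD_WIDTH]
           for i in range(0, len(text), RECORD_WIDTH)]

for record in records:
    print(record.rstrip())   # drop the padding spaces for display
```

Since cp500 is a single-byte encoding, one character of decoded text 
corresponds to one byte of the original file, so widths given in bytes carry 
over unchanged.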


Python 2 is a little trickier. You can manually decode the bytes:

# not tested
text = open('/home/steve/ebcidic.text', 'rb').read().decode('cp500')

or you can use the codecs module to get very close to the same functionality 
as Python 3:

# also untested
import codecs
f = codecs.open('/home/steve/ebcidic.text', 'r', encoding='cp500')
for line in f:
     print line



-- 
Steven


