[Tutor] Reading characters from file in binary mode

Kent Johnson kent37 at tds.net
Tue Jun 6 18:21:51 CEST 2006


Kermit Rose wrote:
> Hello.
>  
> I wish to translate a SAS data file to text, and do not have the
> professional version of SAS to do so.
>  
> I have the student version of SAS, and have translated the shortest of 4 SAS
> data sets given.
>  
> For the other 3, I wish to construct a python program to read the
> characters in, one at a time, translate them to hexadecimal, then
> figure out how the data matches the data dictionary that I have.
>  
> I experimented with writing code in C++ to do this.
>  
> My first experiment, in C++  is
>  
> #include <stdio.h>
> #include <iostream>
> #define TRUE    1                       /* Define some handy constants  */
> #define FALSE   0                       /* Define some handy constants  */
> ifstream f("CMT_MCAID",ios_base::binary);
> ofstream G("mcaid.txt",ios_base::app);
> char ch
> int k
> int kh,kl
> int limit
> limit = 1000
>  
> for (int I=1;I<= limit;I++)
>  
> {
> f >> ch;
> k = ch;
> kl = k%16;
> kh = (k -kl)/16;
> G << kh," ",kl," ";
> }
>  
>  
> How can I begin to experiment using python?  What would be python code
> equivalent to the above C++ code?

Hmm, my C++ is remarkably rusty but I think you want something like this:

inp = open("CMT_MCAID", "rb")
out = open("mcaid.txt", "w")

for i in range(1000):
   ch = inp.read(1)
   if not ch: break # EOF
   k = ord(ch) # convert to integer
   kl = k % 16   # low nibble
   kh = k // 16  # high nibble; // keeps the result an integer
   out.write('%x %x ' % (kh, kl))
inp.close()
out.close()
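
For anyone running this under Python 3, here is a minimal sketch of the
same byte-at-a-time loop. The function name dump_nibbles and its
parameters are made up for illustration. It assumes only that the input
is an ordinary binary file, and it uses "with" so both files are closed
even if an error occurs.

```python
def dump_nibbles(in_path, out_path, limit=1000):
    """Write the high and low hex nibble of each of the first `limit` bytes.

    Hypothetical helper for illustration; returns the number of bytes read.
    """
    count = 0
    with open(in_path, "rb") as inp, open(out_path, "w") as out:
        for _ in range(limit):
            ch = inp.read(1)
            if not ch:  # an empty bytes object means end of file
                break
            k = ch[0]   # indexing a bytes object gives an int in Python 3
            kh, kl = divmod(k, 16)  # high nibble, low nibble
            out.write("%x %x " % (kh, kl))
            count += 1
    return count
```

Calling dump_nibbles("CMT_MCAID", "mcaid.txt") would then mirror the
loop above.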

If your input file will fit in memory, there is no need to read a byte
at a time. You could change the for / read / test lines to this:
for ch in inp.read()[:1000]:

If you can live without the space between the two digits, you could use:
out.write('%02x' % k)

With these two changes, the entire loop becomes:
for ch in inp.read()[:1000]:
   out.write('%02x' % ord(ch))
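
One caveat if you try this under Python 3: iterating over a bytes object
yields integers rather than one-character strings, so ord() is not
needed there. A rough sketch of the compact version in that style
follows. The name hex_dump and the sep parameter are assumptions added
for illustration.

```python
def hex_dump(data, limit=1000, sep=""):
    """Return the first `limit` bytes of `data` as two-digit hex values.

    Hypothetical helper; `data` is a bytes object, `sep` is placed
    between the hex pairs.
    """
    # Each b is already an int in Python 3, so no ord() call is required.
    return sep.join("%02x" % b for b in data[:limit])

# Possible usage (file names taken from the example above):
#     with open("CMT_MCAID", "rb") as inp:
#         text = hex_dump(inp.read(1000), sep=" ")
```

With the default empty separator this gives the same string as the
built-in data[:limit].hex() method on bytes objects.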


If your input files are in a well-understood format, you might be
interested in the struct module in the standard library, which unpacks
fixed-format binary data, or pyconstruct, which I think is a bit more
flexible:
http://pyconstruct.wikispaces.com/
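
To give a feel for struct, here is a small sketch that unpacks one
fixed-size record. The record layout (a little-endian 4-byte integer, an
8-byte float, and an 8-byte space-padded text field) is invented purely
for illustration and is not the actual SAS file layout. The names
RECORD_FORMAT and parse_record are likewise hypothetical. You would
replace the format string with whatever your data dictionary describes.

```python
import struct

# Hypothetical layout: "<" little-endian with no padding, "i" 4-byte int,
# "d" 8-byte float, "8s" 8-byte string field.
RECORD_FORMAT = "<id8s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # bytes per record

def parse_record(raw):
    """Unpack one record of RECORD_SIZE bytes into (id, amount, name)."""
    rec_id, amount, name = struct.unpack(RECORD_FORMAT, raw)
    # Strip the space padding from the text field.
    return rec_id, amount, name.decode("ascii").rstrip()

# Build a sample record in memory so the example runs without a data file.
sample = struct.pack(RECORD_FORMAT, 7, 12.5, b"ABC     ")
print(parse_record(sample))
```

With a real file you would read RECORD_SIZE bytes at a time and pass
each chunk to parse_record.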

Kent


