import and package confusion

Thu Apr 30 18:42:47 EDT 2009

Terry Reedy wrote:
> Dale Amon wrote:
> 
>> Now I can move on to parsing those pesky Fortran card
>> images... There wouldn't happen to be a way to take n
>> contiguous slices from a string (card image) where each slice may be 
>> a different length would there? Fortran you know. No spaces between 
>> input fields. :-)
>>
>> I know a way to do it, iterating over a list of slice sizes,
> 
> Yes.
> 
>> perhaps in a list comprehension, but some of the august python 
>> personages here no doubt know better ways.
> 
> No.  Off the top of my head, here is something like what I would do
> (untested)
> 
> def card_slice(card, sizes):
>   """Card is data input string. Sizes is an iterable of int field sizes,
>   where negatives are skipped fields.  Return list of strings."""
>   pos, ret = 0, []
>   for i in sizes:
>     if i > 0:
>       ret.append(card[pos:pos+i])
>     else:
>       i = -i
>     pos += i
>   return ret
> 
> To elaborate this, make sizes an iterable of (size, class) pairs, where 
> class is str, int, or float (for Fortran) or other for more general use. 
> Then
>   ...
>   for i,c in sizes:
>     if i > 0:
>       ret.append(c(card[pos:pos+i]))
> 
> Terry Jan Reedy
> 
------------------------------------------------

Terry is on the right track.
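
For reference, here is a minimal sketch of his (size, class) elaboration, filled in so it runs as written. The name card_fields and the sample card are made up for illustration, and the code is written for current Python rather than 2.5.

```python
def card_fields(card, layout):
    """Split a fixed-width card image into converted fields.

    layout is an iterable of (size, convert) pairs; a negative size
    means "skip this many columns" and its convert is ignored.
    """
    pos, ret = 0, []
    for size, convert in layout:
        if size > 0:
            ret.append(convert(card[pos:pos + size]))
        else:
            size = -size
        pos += size
    return ret


# Hypothetical card: a 3-column int, 2 skipped columns, a 6-column
# float and a 4-column string, with no separators between them.
card = " 42xx 12.50ABCD"
fields = card_fields(card, [(3, int), (-2, None), (6, float), (4, str)])
```

Note that int and float raise ValueError on an all-blank field, so a real reader would probably want a converter that treats blanks as zero or None.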

I use this:

*              1#
comment="""
      Structure for TYPE III file named  <tmp_att.dbf>
      Number of bytes per record : 129
      Number of fields in record : 25
      Number of records in file  : 0
      Date file was last updated : 11/20/ 8
      Field   Label       Type  Size/Dec.  Offset
         1    AREA         N      14  3        1
         2    PERIMITER    N      14  3       15
         3    DUMMY1       N      11  0       29
         4    DUMMY2       N      11  0       40
         5    BL_X         N      12  0       51
         6    BL_Y         N      12  0       63
         7    ACRES        C      13  0       75
         .
         .
"""

Then:
open file
skip any header bytes
while not eof:
   read full record (129 bytes in this case) into buffer (tstln)

#parse:
   PERIMITER= tstln[15:29]
   DUMMY1= tstln[29:40]
   DUMMY2= tstln[40:51]
   BL_X= tstln[51:63]
   BL_Y= tstln[63:75]
   ACRES= tstln[75:88]
   .
   .
   do stuff
#loop
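
The loop above might look like this as runnable Python. It is a sketch, not the original program: HEADER_SIZE, read_records and the in-memory sample are made up, and the slice bounds are taken from the Offset and Size columns of the structure listing (byte 0 of each dBASE record is the deletion flag, which is why AREA starts at 1).

```python
import io

RECORD_SIZE = 129   # "Number of bytes per record" in the listing
HEADER_SIZE = 0     # assumption: set this to the real header length

# (name, start, end) derived from the Offset and Size columns
FIELDS = [
    ("AREA", 1, 15),
    ("PERIMITER", 15, 29),
    ("DUMMY1", 29, 40),
    ("DUMMY2", 40, 51),
    ("BL_X", 51, 63),
    ("BL_Y", 63, 75),
    ("ACRES", 75, 88),
]


def read_records(fileobj):
    """Yield one dict of raw field bytes per full record."""
    fileobj.seek(HEADER_SIZE)              # skip any header bytes
    while True:
        tstln = fileobj.read(RECORD_SIZE)  # read full record into buffer
        if len(tstln) < RECORD_SIZE:       # eof (or a trailing partial record)
            break
        yield dict((name, tstln[a:b]) for name, a, b in FIELDS)


# Made-up sample: one record, blank-padded to 129 bytes.
text = " " + "%14.3f" % 12.5 + "%14.3f" % 7.25
sample = text.ljust(RECORD_SIZE).encode("ascii")
rows = list(read_records(io.BytesIO(sample)))
```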

The other method he mentions works too.  I create an Ashton-Tate dBASE 
III+ file from a template file, add the structure I want, and populate it 
from, well... a wide variety of sources. The 'header' has all the field 
names, sizes and such. I can then use the manual method above or let 
the program do the parsing.  The second method is much more flexible.

import StringIO
import struct
from binhex import hexbin

   source1= StringIO.StringIO("""(This file must be converted with BinHex 4.0)

:#h0[GA*MC6%ZC'*Q!$q3#!#3"!4L!*!%'H!$#!B$!*!%B36M!*!98e4"9%P26Pp
133"$#3$'44i!!!!"!*!,48a&63#3"d-R!-C&"!!!!!%!N!YC48&568m!N!9$+`$
'43B!!!!"!*!,4%&C-$%!-3#3"%-a!-C&"J!!!!%!N!Y%39N`-J!b!*!%3cF!aN8
'!!!!!3#3#d4"@6!c!$-!N!4$23$'43B!!!!"!*!,4%&C-$3!0!#3"%0$!-C&"J!
!!!%!N!Y%39N`03!e!*!%3dN!aN8'!!!!!3#3#d4"@6!f!$B!N!4$6`$'43B!!!!
"!*!,4%&C-$F!0`#3"%09!-C&"J!!!!%!N!Y%39N`1!!i!*!%3eX!aN8'!!!!!3#
3#d4"@6!j!$N!N!4$B3$'43B!!!!"!*!,4%&C-6!!-!#3"%0R!-C&"J!!!!%!N!Y
%39Na-3#3"N0Y!-C&"J!!!!%!N!Y%39Na-J!b!*!%3h-!aN8'!!!!!3#3#d4"@6%
c!$-!N!4$H3$'43B!!!!"!*!,4%&C-63!0!#3"%0r!-C&"J!!!!%!N!Y%39Na03!
e!*!%3i8!aN8'!!!!!3#3#d4"@6%f!$B!N!4$L`$'43B!!!!"!*!,4%&C-6F!0`#
3"%14!-C&"J!!!!%!N!Y%39Na1!!i!*!%3jF!aN8'!!!!!3#3#d4"@6%j!$N!N!4
$R3$'43B!!!!"!*!,4%&C-M!!-!#3"%1M!-C&"J!!!!%!N!Y%39Nb-3!a!*!%3kN
!aN8'!!!!!3#3#d4"@6)b!$)!N!4$V`$'43B!!!!"!*!,4%&C-M-!-`#3"%1e!-C
&"J!!!!%!N!Y%39Nb0!!d!*!%3lX!aN8'!!!!!3#3#d4"@6)e!$8!N!4$`3$'43B
!!!!"!*!,4%&C-MB!0J#3"%2(!-C&"J!!!!%!N!Y%39Nb0`!h!*!%3md!aN8'!!!
!!3#3#d4"@6)i!$J!N!4$d`$'43B!!!!"!*!,4%&C-MN!13#3"%2C!-C&"J!!!!%
!N!Y%39Nc-!!`!*!%3pm!aN8'!!!!!3#3#d4"@6-a!$%!N!4$j3$'43B!!!!"!*!
,$4TU9`!!:
""")
   hexbin(source1, 'source1.dbf')
   source1.close()
   del source1
######### above makes a structure

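def rdhdr(adbf):
   """Return (nrecs, hdrlen, reclen, (name, size, seq), ...) for an open .dbf."""
   # '<' forces little-endian standard sizes; a native 'L' is 8 bytes on
   # 64-bit Linux and would not match the 4 bytes read here.
   adbf.seek(4)                                     #skip ver,yr,mo,day
   hdr= struct.unpack('<L',adbf.read(4))            #number of records
   hdr= hdr + struct.unpack('<H',adbf.read(2))      #length of header
   hdr= hdr + struct.unpack('<H',adbf.read(2))      #length of records
   adbf.seek(32)
   fld= 1
   while adbf.tell() < (hdr[1] - 32):          #each field def is 32 bytes
     adbf.seek(fld*32)
     hdrn= struct.unpack('11s',adbf.read(11))[0].strip('\x00')  #NUL-padded name
     adbf.seek(5,1)                            #skip type and address bytes
     hdrs= struct.unpack('B',adbf.read(1))[0]  #field length
     hdr= hdr + ((hdrn,hdrs,fld),)             #name,size,seq.number
     fld= fld + 1
   return hdr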
################ above sets up for parsing  (reading or writing or both)
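
To let the program do the parsing, the (name, size, seq) entries from the header can be turned into slice offsets. This is a sketch under assumptions: field_slices and parse_record are hypothetical helpers, not part of the original code, and the header tuple below is typed in by hand in the shape rdhdr returns (record count, header length, record length, then one entry per field).

```python
def field_slices(hdr):
    """Map field name -> (start, end) within a record buffer.

    hdr is laid out like the tuple rdhdr returns:
    (nrecs, hdrlen, reclen, (name, size, seq), ...).
    Offsets start at 1 because byte 0 of a dBASE record is the
    deletion flag.
    """
    slices, pos = {}, 1
    for name, size, _seq in hdr[3:]:
        slices[name] = (pos, pos + size)
        pos += size
    return slices


def parse_record(buf, slices):
    """Cut one record buffer into a dict of stripped field strings."""
    return dict((name, buf[a:b].strip()) for name, (a, b) in slices.items())


# Hand-written two-field header, for illustration only.
hdr = (0, 97, 29, ("AREA", 14, 1), ("PERIMITER", 14, 2))
slices = field_slices(hdr)
record = " " + "%14.3f" % 12.5 + "%14.3f" % 7.25
row = parse_record(record, slices)
```

Because the offsets come from the header, the same two helpers work for any structure without editing slice numbers by hand.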

If the Fortran data is in fact from punch cards, then your record will be 
80 columns (IBM) or 90 (UniVac).  Green bar paper is 132.  In each 
case, offset zero is page control and the last 4 columns of the card are 
for sequencing the cards in case you (or someone) dropped the deck.  The 
last 6 columns are for sequence on green bar, as I recall (deck numbers 
are additive).

The advantage of using the .dbf is that it creates a user-friendly file 
that Excel, well, almost any spreadsheet or database program can open. 
Plus it is a dbf, so database operations are 'right there'. Microsoft 
Office, OpenOffice, and the list goes on, all read/write .dbf.

By the way - CSV (comma separated values) files were in use in the past, 
but due to memory (or the lack of it) were only for sequential use. You 
have to count the commas and compare to the number expected at EOL.
SDF (standard data format), the card image, was for random access:
good old  lseek(record_number * record_size, basepoint)
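
As a sketch of that random access in Python: with fixed-size records, record n starts at header_size + n * record_size, so a single seek gets there. read_record and the sample data are illustrative only.

```python
import io


def read_record(fileobj, record_number, record_size, header_size=0):
    """Return record `record_number` (0-based) without scanning the file."""
    fileobj.seek(header_size + record_number * record_size)
    buf = fileobj.read(record_size)
    if len(buf) < record_size:
        raise IndexError("record %d is past end of file" % record_number)
    return buf


# Three made-up 8-byte records held in memory; a real file opened
# in 'rb' mode is used the same way.
data = io.BytesIO(b"REC00000REC00001REC00002")
third = read_record(data, 2, 8)
```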

Snippets are from actual code dated Feb 2008.
using Python 2.5.2 on Linux Slackware 10.2
today: 20090430

Steve