text file reformatting

cbrown at cbrownsystems.com
Sun Oct 31 17:48:19 EDT 2010


On Oct 31, 12:48 pm, Tim Chase <python.l... at tim.thechases.com> wrote:
> > PRJ01001 4 00100END
> > PRJ01002 3 00110END
>
> > I would like to copy only some columns into a new file and put them in
> > certain places (to match previous data). A definition file (def.csv)
> > could be something like this:
>
> > VARIABLE   FIELDSTARTS     FIELD SIZE      NEW PLACE IN NEW DATA FILE
> > ProjID     ;       1       ;       5       ;       1
> > CaseID     ;       6       ;       3       ;       10
> > UselessV  ;        10      ;       1       ;
> > Zipcode    ;       12      ;       5       ;       15
>
> > So the new datafile should look like this:
>
> > PRJ01    001       00100END
> > PRJ01    002       00110END
>
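For the OP's def.csv as it stands, one way to honour the "new place" column is to build each output line character by character. A minimal sketch, assuming Python 3, 1-based columns as in def.csv, and that a blank new place (the UselessV row) means the field is dropped; the name `place_fields` and the (start, size, place) tuple layout are illustrative, not from the thread:

```python
def place_fields(row, fields):
    """Copy fixed-width fields from row into new absolute columns.

    fields holds (start, size, place) triples using 1-based columns,
    as in the OP's def.csv; place=None drops the field.
    """
    out = []  # output line as a list of single characters
    for start, size, place in fields:
        if place is None:
            continue  # no new place given: leave this field out
        value = row[start - 1:start - 1 + size]  # 1-based -> 0-based
        end = place - 1 + len(value)
        if len(out) < end:
            out.extend(' ' * (end - len(out)))  # pad the gap with spaces
        out[place - 1:end] = value  # drop the field into its new columns
    return ''.join(out)


# ProjID, CaseID, UselessV, Zipcode from the OP's def.csv
fields = [(1, 5, 1), (6, 3, 10), (10, 1, None), (12, 5, 15)]
print(repr(place_fields('PRJ01001 4 00100END', fields)))
# 'PRJ01    001  00100'
```

Gaps are filled with spaces, and a field placed over an earlier one simply overwrites it.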
> How flexible is the def.csv format?  The difficulty I see with
> your def.csv format is that it leaves undefined gaps (presumably
> to be filled in with spaces) and that you also have a blank "new
> place in new file" value.  If instead, you could specify the
> width to which you want to pad it and omit variables you don't
> want in the output, ordering the variables in the same order you
> want them in the output:
>
>   Variable; Start; Size; Width
>   ProjID; 1; 5; 10
>   CaseID; 6; 3; 10
>   Zipcode; 12; 5; 5
>   End; 16; 3; 3
>
> (note that I lazily use the same method to copy the END from the
> source to the destination, rather than coding specially for it)
> you could do something like this (untested)
>
>    import csv
>    f = file('def.csv', 'rb')
>    f.next() # discard the header row
>    r = csv.reader(f, delimiter=';')
>    fields = [
>      (varname, slice(int(start), int(start)+int(size)), int(width))
>      for varname, start, size, width
>      in r
>      ]
>    f.close()
>    out = file('out.txt', 'w')
>    try:
>      for row in file('data.txt'):
>        for varname, slc, width in fields:
>          out.write(row[slc].ljust(width))
>        out.write('\n')
>    finally:
>      out.close()
>
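The loop above relies on Python 2 (`file()`, `f.next()`). A sketch of the same width-based approach for Python 3, assuming the Variable/Start/Size/Width layout proposed here with 1-based starts, which are shifted down by one for Python's slices; `reformat` is a hypothetical name, and in-memory `io.StringIO` objects stand in for def.csv, data.txt and out.txt. The End start is given as 17 because that is where END begins in the OP's sample rows when counting from 1:

```python
import csv
import io


def reformat(def_file, data_file, out_file):
    """Width-based column copying; def rows are 'Variable; Start; Size; Width'."""
    reader = csv.reader(def_file, delimiter=';', skipinitialspace=True)
    next(reader)  # discard the header row
    fields = [
        # csv returns strings, so convert; shift the 1-based start to 0-based
        (varname, slice(int(start) - 1, int(start) - 1 + int(size)), int(width))
        for varname, start, size, width in reader
    ]
    for row in data_file:
        row = row.rstrip('\n')  # keep the newline out of the last field
        for varname, slc, width in fields:
            out_file.write(row[slc].ljust(width))  # pad each field to its width
        out_file.write('\n')


defs = io.StringIO('Variable; Start; Size; Width\n'
                   'ProjID; 1; 5; 10\n'
                   'CaseID; 6; 3; 10\n'
                   'Zipcode; 12; 5; 5\n'
                   'End; 17; 3; 3\n')
data = io.StringIO('PRJ01001 4 00100END\nPRJ01002 3 00110END\n')
out = io.StringIO()
reformat(defs, data, out)
print(out.getvalue())
```

Passing real file objects from `open()` works the same way, since the function only iterates over its inputs and calls `write()`.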
> Hope that's fairly easy to follow and makes sense.  There might
> be some fence-posting errors (particularly your use of "1" as the
> initial offset, while Python uses "0" as the initial offset for
> strings).
>
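To make the fence-posting point concrete: taking def.csv's 1-based starts literally as Python slice offsets shifts every field one column to the right. A small illustration on the OP's second sample row, assuming Python 3:

```python
row = 'PRJ01002 3 00110END'  # second sample row from the OP

start, size = 1, 5  # ProjID as given in def.csv (1-based)

print(row[start:start + size])          # RJ010 (one column too far right)
print(row[start - 1:start - 1 + size])  # PRJ01 (what the OP means)
```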
> If you can't modify the def.csv format, then things are a bit
> more complex and I'd almost be tempted to write a script to try
> and convert your existing def.csv format into something simpler
> to process like what I describe.
>
> -tkc
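One possible conversion script, assuming Python 3 and that each kept field should be padded out to the next field's new place (the last one keeps its own size), with rows whose new place is blank dropped. `convert_defs` is a hypothetical helper; it takes a list of lines (for example from `readlines()`) and expects every row after the header to have all four ';'-separated columns, even if the last one is empty:

```python
import re

# split on ';' and swallow the whitespace around it
SEP = re.compile(r'\s*;\s*')


def convert_defs(lines):
    """Turn 'Variable; Start; Size; New place' rows into
    'Variable; Start; Size; Width' rows in output order."""
    kept = []
    for line in lines[1:]:  # skip the header row
        if not line.strip():
            continue  # ignore blank lines
        varname, start, size, place = SEP.split(line.strip())
        if place:  # a blank new place means: leave the field out
            kept.append((int(place), varname, int(start), int(size)))
    kept.sort()  # order by position in the new file
    rows = ['Variable; Start; Size; Width']
    for i, (place, varname, start, size) in enumerate(kept):
        if i + 1 < len(kept):
            width = kept[i + 1][0] - place  # pad up to the next field
        else:
            width = size  # last field: no padding needed
        rows.append('%s; %d; %d; %d' % (varname, start, size, width))
    return rows


op_defs = [
    'VARIABLE   FIELDSTARTS     FIELD SIZE      NEW PLACE IN NEW DATA FILE',
    'ProjID     ;       1       ;       5       ;       1',
    'CaseID     ;       6       ;       3       ;       10',
    'UselessV  ;        10      ;       1       ;',
    'Zipcode    ;       12      ;       5       ;       15',
]
for row in convert_defs(op_defs):
    print(row)
```

The resulting rows still use 1-based starts, so the fence-posting caveat applies when feeding them to a slicing loop.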

To your point about the non-standard CSV in the def.csv file, you
could use a regular expression instead of the csv module to handle that:

    import re

    # split on ';', swallowing any whitespace around it
    parse_columns = re.compile(r'\s*;\s*')

    f = open('def.csv')
    f.readline()  # discard the header row
    # skip blank lines, then split each row into its columns
    r = (parse_columns.split(line.strip()) for line in f if line.strip())
    fields = [
        # a blank last column (as on the UselessV row) becomes 0
        (varname, slice(int(start), int(start) + int(size)),
         int(width) if width else 0)
        for varname, start, size, width in r
    ]
    f.close()

which, given the OP's def.csv, produces for fields:

[('ProjID', slice(1, 6, None), 1), ('CaseID', slice(6, 9, None), 10),
 ('UselessV', slice(10, 11, None), 0), ('Zipcode', slice(12, 17, None), 15)]

and that should work with the remainder of your original code,
although perhaps the OP wants something else to happen when the width
is omitted from def.csv...
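The reason a regular expression helps here is whitespace: with `skipinitialspace=True` the csv module drops only the spaces after each delimiter, while `\s*;\s*` consumes them on both sides. A quick comparison on one of the OP's padded rows, assuming Python 3:

```python
import csv
import re

line = 'ProjID     ;       1       ;       5       ;       1'

# csv only drops whitespace *after* a delimiter, even with skipinitialspace
print(next(csv.reader([line], delimiter=';', skipinitialspace=True)))

# the regex consumes whitespace on both sides of each ';'
print(re.split(r'\s*;\s*', line.strip()))  # ['ProjID', '1', '5', '1']
```

The csv version leaves trailing spaces on every field but the last; `int()` tolerates them, but the variable names would need an extra `strip()`.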

Cheers - Chas



