text file reformatting

cbrown at cbrownsystems.com
Mon Nov 1 03:59:10 EDT 2010


On Oct 31, 11:46 pm, iwawi <iwawi... at gmail.com> wrote:
> On 31 Oct, 21:48, Tim Chase <python.l... at tim.thechases.com> wrote:
>
>
>
> > > PRJ01001 4 00100END
> > > PRJ01002 3 00110END
>
> > > I would like to pick only some columns to a new file and put them to a
> > > certain places (to match previous data) - definition file (def.csv)
> > > could be something like this:
>
> > > VARIABLE   FIELDSTARTS     FIELD SIZE      NEW PLACE IN NEW DATA FILE
> > > ProjID     ;       1       ;       5       ;       1
> > > CaseID     ;       6       ;       3       ;       10
> > > UselessV  ;        10      ;       1       ;
> > > Zipcode    ;       12      ;       5       ;       15
>
> > > So the new datafile should look like this:
>
> > > PRJ01    001       00100END
> > > PRJ01    002       00110END
>
> > How flexible is the def.csv format?  The difficulty I see with
> > your def.csv format is that it leaves undefined gaps (presumably
> > to be filled in with spaces) and that you also have a blank "new
> > place in new file" value.  If instead, you could specify the
> > width to which you want to pad it and omit variables you don't
> > want in the output, ordering the variables in the same order you
> > want them in the output:
>
> >   Variable; Start; Size; Width
> >   ProjID; 1; 5; 10
> >   CaseID; 6; 3; 10
> >   Zipcode; 12; 5; 5
> >   End; 16; 3; 3
>
> > (note that I lazily use the same method to copy the END from the
> > source to the destination, rather than coding specially for it)
> > you could do something like this (untested)
>
> >    import csv
> >    f = file('def.csv', 'rb')
> >    f.next() # discard the header row
> >    r = csv.reader(f, delimiter=';')
> >    fields = [
> >      (varname, slice(int(start), int(start)+int(size)), width)
> >      for varname, start, size, width
> >      in r
> >      ]
> >    f.close()
> >    out = file('out.txt', 'w')
> >    try:
> >      for row in file('data.txt'):
> >        for varname, slc, width in fields:
> >          out.write(row[slc].ljust(width))
> >        out.write('\n')
> >    finally:
> >      out.close()
>
> > Hope that's fairly easy to follow and makes sense.  There might
> > be some fence-posting errors (particularly your use of "1" as the
> > initial offset, while python uses "0" as the initial offset for
> > strings)
>
> > If you can't modify the def.csv format, then things are a bit
> > more complex and I'd almost be tempted to write a script to try
> > and convert your existing def.csv format into something simpler
> > to process like what I describe.
>
> > -tkc
>
> Hi,
>
> Thanks for your reply.
>
> Def.csv could be modified so that every line has the same structure:
> variable name, field start, field size and new place and would be
> separated with semicolons as you mentioned.
>
> I tried your script (which seems quite logical) but I get this
>
> Traceback (most recent call last):
>   File "testing.py", line 16, in <module>
>     out.write (row[slc].ljust(width))
> TypeError: an integer is required
>
> Yes - you said it was untested, but I can't figure out how to
> proceed...

The line

    (varname, slice(int(start), int(start)+int(size)), width)

should instead be

    (varname, slice(int(start), int(start)+int(size)), int(width))

although you give an example where the last column is empty (the
UselessV row) - what does that imply? With the line above, int() will
raise a ValueError on that empty field.
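
To make both failures concrete: csv.reader hands back every field as a
string, str.ljust() wants an integer (hence the TypeError), and int('')
raises ValueError. Here is a minimal sketch of one way to parse the
definitions. It uses Python 3 syntax, which is my assumption (the
script in this thread is Python 2). parse_field and DEF_TEXT are
hypothetical names, and skipping rows with a blank last column is only
one possible answer to the question above.

```python
import csv
import io

# Hypothetical definition rows in the "name; start; size; width" layout.
DEF_TEXT = """Variable; Start; Size; Width
ProjID; 1; 5; 10
UselessV; 10; 1;
Zipcode; 12; 5; 5
"""

def parse_field(row):
    """Return (name, slice, width), or None if the row has no width."""
    name, start, size, width = [cell.strip() for cell in row]
    if not width:
        # int('') would raise ValueError, so treat a blank as "omit".
        return None
    start = int(start)
    # Same slice arithmetic as the corrected line above; the 1-based
    # versus 0-based offset question is deliberately not addressed here.
    return name, slice(start, start + int(size)), int(width)

reader = csv.reader(io.StringIO(DEF_TEXT), delimiter=';')
next(reader)  # discard the header row
fields = [f for f in map(parse_field, reader) if f is not None]

for name, slc, width in fields:
    print(name, slc, width)
```

The UselessV row is silently dropped in this sketch; raising an error
with the offending row in the message would be just as reasonable.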

Anyway, I think you'll find there's something a bit off in the output
loop with the parameter passed to ljust() as well. The value given in
your csv seems to be an absolute position (the column of the output
line where the field should start), but as it's implemented by Tim, it
acts as a relative one (the width that each field is padded to).
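
A quick way to see the difference, as a sketch in Python 3 (my
assumption, not the version used in the thread). The pieces list is
hypothetical: it pairs already-extracted field text with the "new
place" numbers from the original def.csv.

```python
# Hypothetical (text, number) pairs: the number is the "new place" value.
pieces = [("PRJ01", 1), ("001", 10), ("00100", 15)]

# Relative: each piece is padded out to `number` characters of its own.
relative = "".join(text.ljust(n) for text, n in pieces)

# Absolute: each piece starts at 1-based column `number` of the output line.
absolute = ""
for text, n in pieces:
    absolute = absolute.ljust(n - 1) + text

print(repr(relative))
print(repr(absolute))
```

In the relative version the zipcode ends up wherever the preceding
padding pushes it; in the absolute version it starts at column 15.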

Given Tim's parsing into the list fields, I have a feeling that what
you really want instead of

    for varname, slc, width in fields:
        out.write(row[slc].ljust(width))
    out.write('\n')

is to have

    s = ''
    for varname, slc, width in fields:
        s += " "*(width - len(s)) + row[slc]
    out.write(s+'\n')

And if that is what you want, then you will surely want to globally
replace the name 'width' with for example 'start_column', because then
it all makes sense :).
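
Putting it together, here is a sketch of the whole placement step in
Python 3 (my assumption; the thread uses Python 2). build_line and
FIELDS are hypothetical names. I am also assuming that both the source
offsets and the target columns in def.csv are 1-based, which is how I
read the original post, and I have added a row for END so that the
marker gets copied too.

```python
def build_line(row, fields):
    """Place each source slice at its 1-based start column in the output."""
    s = ""
    for name, src_start, size, start_column in fields:
        # Convert the 1-based source offset to a 0-based Python slice.
        piece = row[src_start - 1 : src_start - 1 + size]
        # Pad with spaces up to the target column, then append the piece.
        # If s is already longer than that, ljust() leaves it unchanged.
        s = s.ljust(start_column - 1) + piece
    return s

# Hypothetical definitions: (name, source start, size, output start column),
# all 1-based.
FIELDS = [
    ("ProjID", 1, 5, 1),
    ("CaseID", 6, 3, 10),
    ("Zipcode", 12, 5, 15),
    ("End", 17, 3, 20),
]

for row in ["PRJ01001 4 00100END", "PRJ01002 3 00110END"]:
    print(build_line(row, FIELDS))
```

Note that overlapping definitions are not detected: a field whose
start column is already taken simply gets appended after the previous
one, so you may want a check for that if def.csv is hand-edited.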

Cheers - Chas



