[TriZPUG] More Fun With Text Processing

Chris Rossi chris at christophermrossi.com
Fri Apr 3 18:03:20 CEST 2009


Or maybe it's already outputting tab characters?
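
A quick way to check, as a minimal sketch: look for a literal tab in the captured output and, if one is there, split on it. This assumes the program really does separate fields with single tab characters, which nobody in this thread has confirmed yet. The names `has_tabs` and `parse_tab_delimited` are made up for illustration.

```python
import csv


def has_tabs(text):
    """Return True if the captured output contains any tab character."""
    return "\t" in text


def parse_tab_delimited(text):
    """Parse tab-separated output into a list of dicts keyed by header name.

    Assumption: the first non-blank line is the header row and every
    field is separated by exactly one tab.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    reader = csv.reader(lines, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        fields = [field.strip() for field in fields]
        # Pad short rows so a missing trailing value becomes an empty string.
        fields += [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return rows
```

If `has_tabs` comes back False, the output is space-padded and splitting on tabs won't help; the column widths would have to come from somewhere else, such as the header row.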

Chris


On Fri, Apr 3, 2009 at 11:51 AM, Stephan Altmueller <stephan_altmueller at unc.edu> wrote:

> Josh,
>
> I think the first thing you should do is nail down the exact file format.
> If you have missing values and spaces in your format, you have no
> unambiguous way to decide what column an entry belongs to.
>
> Can you make the command line program insert some sort of delimiter,
> like commas?
>
>    -- Stephan
>
> Josh Johnson wrote:
> > Ok all,
> > Since we've got a brain trust of pythonistas that know how to deal
> > with strings, here's a problem I'm facing right now that I'd like some
> > input on:
> >
> > I've got a tabular list (the output from a command-line program),
> > and I need to parse it into some sort of structure.
> >
> > Here's an example of the data (the headings and column width will vary):
> > TARGET         VOLUME GROUP        LENGTH     AVAILABLE         NPE  MIRROR
> > 1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1
> > 1.3                  BACKUP    5001.023GB    4250.759GB     1192337
> > 1.4                  BACKUP    3000.613GB    3000.353GB      715402
> > 2.2               HIGHAVAIL    5001.023GB    5001.015GB     1192337  1.2
> > 2.3                  BACKUP    5001.023GB    5000.763GB     1192337
> > 2.4                  BACKUP    3000.613GB    3000.353GB      715402
> >
> > I'd like a structure I can work with, like say, a list of hashes.
> >
> > My initial approach involves treating the header row as the guide for
> > the field lengths, and then extracting substrings for each field in
> > each row.
> >
> > I also thought about just doing a split on spaces, but some of the
> > fields could have spaces in their data.
> >
> > What do you guys think?
> >
> > JJ
> > _______________________________________________
> > TriZPUG mailing list
> > TriZPUG at python.org
> > http://mail.python.org/mailman/listinfo/trizpug
> > http://trizpug.org is the Triangle Zope and Python Users Group
>
>
> --
> -------------------------------------------------
> Stephan Altmueller
> Applications Analyst, Enterprise Applications
> Office of Arts and Sciences Information Services
> University of North Carolina at Chapel Hill
> CB 3056, 06 Howell Hall
> Chapel Hill, NC 27599-3056
> 919.448.5936 (direct line)
> stephan_altmueller at unc.edu
> AIM: oasisaltmuell
> http://oasis.unc.edu
>