[TriZPUG] More Fun With Text Processing

Josh Johnson jj at email.unc.edu
Fri Apr 3 20:07:30 CEST 2009


It might be; I'll check. It's coming through pexpect, so the output might
be weird (I know the tty module messes with the line endings).
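
A quick way to check, as a rough sketch: repr() shows tabs and carriage
returns explicitly. The sample string below is made up for illustration and
is not real output from the tool. If tabs do turn up, stripping the line
ending and splitting on "\t" keeps empty fields in their column positions.

```python
# Hypothetical captured line; real output would come from the pexpect session.
sample = "1.3\tBACKUP\t5001.023GB\t4250.759GB\t1192337\t\r\n"

def has_tabs(line):
    """Return True if the line contains at least one tab character."""
    return "\t" in line

def split_tabbed(line):
    """Strip the trailing CR/LF and split on tabs, keeping empty fields."""
    return line.rstrip("\r\n").split("\t")

print(repr(sample))          # repr() makes \t and \r\n visible
print(has_tabs(sample))
print(split_tabbed(sample))
```

An empty string at the end of the split result would correspond to a blank
last column (MIRROR in the listing below).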

JJ

Chris Rossi wrote:
> Or maybe it's already outputting tab characters?
>
> Chris
>
>
> On Fri, Apr 3, 2009 at 11:51 AM, Stephan Altmueller
> <stephan_altmueller at unc.edu> wrote:
>
>     Josh,
>
>     I think the first thing you should do is nail down the exact file
>     format. If you have missing values and spaces in your format, you
>     have no unambiguous way to decide what column an entry belongs to.
>
>     Can you make the command line program insert some sort of
>     delimiter, like commas?
>
>        -- Stephan
>
>     Josh Johnson wrote:
>     > Ok all,
>     > Since we've got a brain trust of pythonistas that know how to deal
>     > with strings, here's a problem I'm facing right now that I'd like
>     > some input on:
>     >
>     > I've got a tabular list, it's the output from a command-line
>     > program, and I need to parse it into some sort of structure.
>     >
>     > Here's an example of the data (the headings and column width will
>     > vary):
>     > TARGET         VOLUME GROUP        LENGTH     AVAILABLE         NPE  MIRROR
>     > 1.1               HIGHAVAIL    5001.023GB    4501.008GB     1192337  2.1
>     > 1.3                  BACKUP    5001.023GB    4250.759GB     1192337
>     > 1.4                  BACKUP    3000.613GB    3000.353GB      715402
>     > 2.2               HIGHAVAIL    5001.023GB    5001.015GB     1192337  1.2
>     > 2.3                  BACKUP    5001.023GB    5000.763GB     1192337
>     > 2.4                  BACKUP    3000.613GB    3000.353GB      715402
>     >
>     > I'd like a structure I can work with, like say, a list of hashes.
>     >
>     > My initial approach involves treating the header row as the guide
>     > for the field lengths, and then extracting substrings for each
>     > field in each row.
>     >
>     > I also thought about just doing a split on spaces, but some of the
>     > fields could have spaces in their data.
>     >
>     > What do you guys think?
>     >
>     > JJ
>     > _______________________________________________
>     > TriZPUG mailing list
>     > TriZPUG at python.org
>     > http://mail.python.org/mailman/listinfo/trizpug
>     > http://trizpug.org is the Triangle Zope and Python Users Group
>
>
>     --
>     -------------------------------------------------
>     Stephan Altmueller
>     Applications Analyst, Enterprise Applications
>     Office of Arts and Sciences Information Services
>     University of North Carolina at Chapel Hill
>     CB 3056, 06 Howell Hall
>     Chapel Hill, NC 27599-3056
>     919.448.5936 (direct line)
>     stephan_altmueller at unc.edu
>     AIM: oasisaltmuell
>     http://oasis.unc.edu
>
>

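For reference, the header-as-guide idea from the original question quoted
above could be sketched roughly like this. It is only a sketch under some
assumptions that are not confirmed in the thread: headings are separated by
two or more spaces, each value ends at or before the right edge of its
heading (as the sample listing suggests), and the last column runs to the end
of the line. The sample is rebuilt from a format string so the spacing is
exact; ROW_FMT, ROWS and parse_table are names made up for this example.

```python
import re

# Assumed layout, modeled on the listing in the original question.
ROW_FMT = "%-6s%21s%14s%14s%12s  %s"
ROWS = [
    ("TARGET", "VOLUME GROUP", "LENGTH", "AVAILABLE", "NPE", "MIRROR"),
    ("1.1", "HIGHAVAIL", "5001.023GB", "4501.008GB", "1192337", "2.1"),
    ("1.3", "BACKUP", "5001.023GB", "4250.759GB", "1192337", ""),
]
SAMPLE = "\n".join((ROW_FMT % row).rstrip() for row in ROWS)

def parse_table(text):
    """Parse a fixed-width listing into a list of dicts keyed by heading."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Headings may contain single spaces ("VOLUME GROUP"), so a heading is
    # a run of words separated by exactly one space.
    matches = list(re.finditer(r"\S+(?: \S+)*", header))
    names = [m.group(0) for m in matches]
    # Each field runs from the previous heading's right edge to this
    # heading's right edge; the last field takes the rest of the line.
    edges = [m.end() for m in matches[:-1]]
    starts = [0] + edges
    ends = edges + [None]
    records = []
    for row in rows:
        record = {}
        for name, start, end in zip(names, starts, ends):
            # Slicing past the end of a short row just gives "".
            record[name] = row[start:end].strip()
        records.append(record)
    return records

for record in parse_table(SAMPLE):
    print(record)
```

Rows with no MIRROR value come back with an empty string for that key, so
every dict has the same set of keys. If the real tool left-aligns some
columns, the slice boundaries would need to change accordingly.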

