parsing question

Tim Chase python.list at tim.thechases.com
Mon May 31 11:07:02 EDT 2010


On 05/31/2010 08:42 AM, Mag Gam wrote:
> I have a file with bunch of nfsstat -c (on AIX) which has all the
> hostnames, for example
...
> Is there an easy way to parse this file according to each host?
>
> So,
> r1svr.Connectionless.calls=6553
> r1svr.Connectionless.badcalls=0
>
> and so on...
>
>
> I am currently using awk which I am able to get what I need, but
> curious if in python how people handle block data.

Since you already profess to having an awk solution, I felt it 
was okay to at least take a stab at my implementation (rather 
than doing your job for you :).  Without a complete spec for the 
output, it's a bit of guesswork, but I got something fairly close 
to what you want.  It uses nested dictionaries, which means the 
values have to be referenced by server, client, subtype and key, 
like

   servers["r1svr"][client]["connectionless"]["calls"]

where client is the name taken from the "Client ..." line.  The 
values are strings, not ints/floats/percentages/etc. (I'm not 
sure what you want in the case of the data that has both a value 
and a percentage).
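
If numbers are wanted instead of strings, a small helper could be 
applied to each value after parsing.  This is only a sketch: 
convert_value is a hypothetical name, and it assumes a field is 
either a bare integer such as "6553" or an integer followed by a 
percentage such as "12 5%".  Anything else is returned unchanged.

```python
import re

# Hypothetical helper, not part of the script in this post.
# Assumed formats: "6553" or "12 5%"; check against the real data.
value_re = re.compile(r'^(\d+)(?:\s+(\d+(?:\.\d+)?)%)?$')

def convert_value(text):
    m = value_re.match(text.strip())
    if not m:
        return text  # unknown format: leave the string alone
    count = int(m.group(1))
    if m.group(2) is None:
        return count  # bare integer
    return count, float(m.group(2))  # (count, percentage)
```

Returning a tuple for the value-plus-percentage case is just one 
choice; keeping only the count would work as well if the 
percentage is not needed.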

That said, this should get you fairly close to what you describe:

###########################################

import re

# a header word is any run of two or more word characters
header_finding_re = re.compile(r'\b\w{2,}')
# matches rows of the form "Version <n>: (...)"
version_re = re.compile(r'^Version (\d+):\s*\(.*\)$', re.I)
CLIENT_HEADER = 'Client '
CONNECTION_HEADER = 'Connection'
servers = {}
server = client = orig_client = subtype = None
with open('data.txt') as source:
    for line in source:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue
        if line.startswith('=' * 5) and line.endswith('=' * 5):
            # a line wrapped in "=====" names the server
            server = line.strip('=')
            client = orig_client = subtype = None
        elif line.startswith(CLIENT_HEADER):
            # keep the text after "Client ", minus the last character
            orig_client = client = line[len(CLIENT_HEADER):-1]
            subtype = 'all'
        elif line.startswith(CONNECTION_HEADER):
            subtype = line.replace(' ', '').lower()
        else:  # it's a version or header row
            m = version_re.match(line)
            if m:
                subtype = "v" + m.group(1)
            else:
                if None in (server, client, subtype):
                    print("Missing data",
                          repr((server, client, subtype)))
                    continue
                dest = servers.setdefault(server, {}
                    ).setdefault(client, {}
                    ).setdefault(subtype, {})
                # the row after a header row holds its values
                data = next(source)
                row = header_finding_re.finditer(line)
                prev = next(row)
                # each value sits between the start columns of its
                # header and of the next header
                for header in row:
                    key = prev.group(0)
                    value = data[prev.start():header.start()].strip()
                    prev = header
                    dest[key] = value
                # the last column runs to the end of the line
                key = prev.group(0)
                value = data[prev.start():].strip()
                dest[key] = value

# flatten the nested dictionaries into dotted "path = value" lines
for server, clients in servers.items():
    for client, subtypes in clients.items():
        for subtype, kv in subtypes.items():
            for key, value in kv.items():
                print(".".join([server, client, subtype, key]),
                      '=', value)

###########################################
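
The column slicing in the script can be tried on its own.  A 
minimal sketch, assuming the values are left-aligned under their 
headers; split_columns is a hypothetical name and the two rows 
below are made-up sample text, not taken from the original data 
file:

```python
import re

header_finding_re = re.compile(r'\b\w{2,}')

def split_columns(header_line, data_line):
    """Map each header word to the text beneath it in data_line."""
    starts = [(m.group(0), m.start())
              for m in header_finding_re.finditer(header_line)]
    result = {}
    for i, (key, start) in enumerate(starts):
        # a column runs up to the next header, or to the end of line
        end = starts[i + 1][1] if i + 1 < len(starts) else None
        result[key] = data_line[start:end].strip()
    return result

# Made-up sample rows, padded so the columns line up.
header = "calls".ljust(11) + "badcalls".ljust(11) + "retrans"
data = "6553".ljust(11) + "0".ljust(11) + "12 5%"
columns = split_columns(header, data)
print(columns)
```

Slicing by header position rather than splitting on whitespace is 
what keeps a field like "12 5%" together as one value.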

Have fun,

-tkc