pyparsing question

Neil Cerutti mr.cerutti at gmail.com
Tue Jan 1 18:54:54 EST 2008


On Jan 1, 2008 6:32 PM, hubritic <colinlandrum at gmail.com> wrote:

> I am trying to parse data that looks like this:
>
> IDENTIFIER    TIMESTAMP   T  C   RESOURCE_NAME   DESCRIPTION
> 2BFA76F6     1208230607   T   S   SYSPROC                    SYSTEM
> SHUTDOWN BY USER
> A6D1BD62   1215230807     I
> H                                            Firmware Event
>
> My problem is that sometimes there is a RESOURCE_NAME and sometimes
> not, so I wind up with "Firmware" as my RESOURCE_NAME and "Event" as
> my DESCRIPTION.  The formatting seems to use a set number of spaces.
>


> The data I have has a fixed number of characters per field, so I could
> split it up that way, but wouldn't that defeat the purpose of using a
> parser?  I am determined to become proficient with pyparsing so I am
> using it even when it could be considered overkill; thus, it has gone
> past mere utility now, this is a matter of principle!


If your data is really in fixed-size columns, then pyparsing is the wrong
tool.

There's no standard Python tool for reading and writing fixed-length field
"flatfile" data files, but it's pretty simple to use named slices to get at
the data.

identifier = slice(0, 8)
timestamp = slice(8, 18)
t = slice(18, 21)
c = slice(21, 24)
resource_name = slice(24, 35)
description = slice(35, None)  # slice(35) alone would mean [:35], not [35:]

for line in file:
   line = line.rstrip("\n")
   print "id:", line[identifier]
   print "timestamp:", line[timestamp]
   ...etc...
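
The same idea can be sketched as a small self-contained script (Python 3
syntax). This is only an illustration: the column offsets are copied from
the slices above and may not match the real file, and the names FIELDS,
parse_line and SAMPLE are made up for the example. The sample rows are
padded to those widths by hand. Because each field is taken by position, an
empty RESOURCE_NAME column stays empty instead of swallowing the first word
of the description.

```python
# Column layout: an assumption taken from the slices above; adjust to the real file.
FIELDS = {
    "identifier": slice(0, 8),
    "timestamp": slice(8, 18),
    "t": slice(18, 21),
    "c": slice(21, 24),
    "resource_name": slice(24, 35),
    "description": slice(35, None),  # open-ended: runs to the end of the line
}


def parse_line(line):
    """Split one fixed-width record into a dict of whitespace-stripped fields."""
    line = line.rstrip("\n")
    return {name: line[span].strip() for name, span in FIELDS.items()}


# Hypothetical sample rows, padded to the widths assumed above (8, 10, 3, 3, 11).
ROW_FORMAT = "%-8s%-10s%-3s%-3s%-11s%s\n"
SAMPLE = [
    ROW_FORMAT % ("2BFA76F6", "1208230607", "T", "S", "SYSPROC",
                  "SYSTEM SHUTDOWN BY USER"),
    ROW_FORMAT % ("A6D1BD62", "1215230807", "I", "H", "", "Firmware Event"),
]

for line in SAMPLE:
    record = parse_line(line)
    # An empty resource_name is reported as "(none)" rather than misparsed.
    print(record["resource_name"] or "(none)", "|", record["description"])
```

With a real file, the loop would iterate over the open file object instead
of SAMPLE.
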
-- 
Neil Cerutti