On text processing

Paul McGuire ptmcg at austin.rr.com
Fri Mar 23 22:31:53 EDT 2007


On Mar 23, 5:30 pm, "Daniel Nogradi" <nogr... at gmail.com> wrote:
> Hi list,
>
> I'm in the process of rewriting a bash/awk/sed script -- that grew too
> big -- in python. I can rewrite it in a simple line-by-line way but
> that results in ugly python code and I'm sure there is a simple
> pythonic way.
>
> The bash script processed text files of the form...
>
> Any elegant solution for this?

Is a parser overkill?  Here's how you might use pyparsing for this
problem.

I just wanted to show that pyparsing's returned results can be
structured as more than just lists of tokens.  Using pyparsing's Dict
class (or the dictOf helper that simplifies using Dict), you can
return results that can be accessed like a nested list, like a dict,
or like an instance with named attributes (see the last line of the
example).
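To make those three access styles concrete, here is a rough stdlib-only sketch (Python 3) of an object that supports all of them. The class name KeyedResults is hypothetical. It is not pyparsing's actual ParseResults implementation, only an illustration of the idea:

```python
class KeyedResults(object):
    """Hypothetical illustration only, NOT pyparsing's ParseResults.

    Supports positional access (like a list), keyed access (like a
    dict), and attribute access (like an instance with named fields).
    """

    def __init__(self, pairs):
        self._pairs = list(pairs)        # ordered (key, value) tuples
        self._index = dict(self._pairs)  # key -> value lookup

    def __getitem__(self, item):
        if isinstance(item, int):
            return self._pairs[item]     # list-style: results[0]
        return self._index[item]         # dict-style: results["key3"]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so unknown
        # attribute names become key lookups: results.key3
        index = self.__dict__.get("_index", {})
        try:
            return index[name]
        except KeyError:
            raise AttributeError(name)

    def keys(self):
        return [k for k, _ in self._pairs]

    def __len__(self):
        return len(self._pairs)
```

For example, with r = KeyedResults([("key1", "value1"), ("key3", "value3")]), the expressions r[1], r["key3"] and r.key3 reach the same entry by position, by key and by attribute.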

You can adjust the syntax definitions of keys and values to fit your
actual data.  For instance, if the matrix entries are actually integers,
define matrixRow as (adding nums to the pyparsing import list):

matrixRow = Group( OneOrMore( Word(nums) ) ) + eol
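Without pyparsing, a similar all-integer row check could be done with a regular expression. This is a minimal sketch (Python 3.4+ for re.fullmatch), and the names INT_ROW and parse_int_row are just for illustration. Unlike the pyparsing definition above, which leaves the matched digits as strings, this version also converts them to ints:

```python
import re

# One or more runs of ASCII digits separated by spaces/tabs, roughly
# mirroring OneOrMore(Word(nums)) up to the end of the line.
INT_ROW = re.compile(r"[ \t]*[0-9]+(?:[ \t]+[0-9]+)*[ \t]*")

def parse_int_row(line):
    """Return the row as a list of ints, or None if it is not an integer row."""
    if INT_ROW.fullmatch(line) is None:
        return None
    return [int(tok) for tok in line.split()]
```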


-- Paul


from pyparsing import ParserElement, LineEnd, Word, alphas, alphanums, \
        Group, ZeroOrMore, OneOrMore, Optional, dictOf

data = """key1    value1
key2    value2
key3    value3


key4    value4
spec11  spec12   spec13   spec14
spec21  spec22   spec23   spec24
spec31  spec32   spec33   spec34


key5    value5
key6    value6


key7    value7
more11   more12   more13
more21   more22   more23


key8    value8
"""

# retain significant newlines (by default pyparsing skips over all
# whitespace, including newlines)
ParserElement.setDefaultWhitespaceChars(" \t")

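eol = LineEnd().suppress()      # match a line end, but omit it from results
elem = Word(alphas,alphanums)   # a word starting with a letter
key = elem
# a matrix row has at least three words, so a two-word "key value" line
# is never mistaken for one
matrixRow = Group( elem + elem + OneOrMore(elem) ) + eol
matrix = Group( OneOrMore( matrixRow ) ) + eol   # rows, then a blank line
value = elem + eol + Optional( matrix ) + ZeroOrMore(eol)
parser = dictOf(key, value)     # each key maps to its value (and matrix)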

# parse the data
results = parser.parseString(data)

# access the results
# - like a dict
# - like a list
# - like an instance with keys for attributes
print results.keys()
print

for k in sorted(results.keys()):
    print k,
    if isinstance( results[k], basestring ):
        print results[k]
    else:
        print results[k][0]
        for row in results[k][1]:
            print "   "," ".join(row)
print

print results.key3


Prints out:
['key8', 'key3', 'key2', 'key1', 'key7', 'key6', 'key5', 'key4']

key1 value1
key2 value2
key3 value3
key4 value4
    spec11 spec12 spec13 spec14
    spec21 spec22 spec23 spec24
    spec31 spec32 spec33 spec34
key5 value5
key6 value6
key7 value7
    more11 more12 more13
    more21 more22 more23
key8 value8

value3
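For comparison with the line-by-line approach, here is roughly the same format handled in plain Python 3 without pyparsing. It is a sketch, not a drop-in replacement. It assumes that a line with exactly two words starts a new entry and that any line with three or more words is a matrix row for the most recent key. The function name parse_blocks is hypothetical:

```python
def parse_blocks(text):
    """Map each key to its value, or to (value, rows) if a matrix follows.

    Assumes a two-word line starts a new entry, a line of three or more
    words is a matrix row for the most recent entry, and blank lines only
    separate entries.
    """
    entries = {}
    current = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue                    # blank line: entry separator
        if len(words) == 2:
            key, value = words
            current = [value, []]       # value plus (possibly empty) rows
            entries[key] = current
        elif len(words) >= 3 and current is not None:
            current[1].append(words)    # matrix row for the last key
        else:
            raise ValueError("unexpected line: %r" % line)
    # Like the pyparsing version, entries without a matrix are bare strings
    return {k: (v if not rows else (v, rows))
            for k, (v, rows) in entries.items()}
```

On the sample data above, parse_blocks(data)["key3"] is "value3", while the "key4" entry is a ("value4", rows) tuple whose rows are the three spec lines split into words.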

