On text processing
Paul McGuire
ptmcg at austin.rr.com
Fri Mar 23 22:31:53 EDT 2007
On Mar 23, 5:30 pm, "Daniel Nogradi" <nogr... at gmail.com> wrote:
> Hi list,
>
> I'm in the process of rewriting a bash/awk/sed script -- that grew too
> big -- in python. I can rewrite it in a simple line-by-line way but
> that results in ugly python code and I'm sure there is a simple
> pythonic way.
>
> The bash script processed text files of the form...
>
> Any elegant solution for this?
Is a parser overkill? Here's how you might use pyparsing for this
problem.
I just wanted to show that pyparsing's returned results can be
structured as more than just lists of tokens. Using pyparsing's Dict
class (or the dictOf helper that simplifies using Dict), you can
return results that can be accessed like a nested list, like a dict,
or like an instance with named attributes (see the last line of the
example).
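
If the three access styles sound abstract, here is a toy stand-in written
with nothing but the standard library. It is not pyparsing's ParseResults
and does not try to mirror its internals; the class name TripleAccess and
the sample pairs are made up purely for illustration.

```python
class TripleAccess(object):
    """Toy stand-in for a parse result; NOT pyparsing's ParseResults."""

    def __init__(self, pairs):
        # pairs is a sequence of (key, value) tuples, kept in input order
        self._rows = [[k, v] for k, v in pairs]
        self._by_key = dict(pairs)

    def __getitem__(self, index):
        # integers index the nested list; anything else is a dict lookup
        if isinstance(index, int):
            return self._rows[index]
        return self._by_key[index]

    def __getattr__(self, name):
        # only called when normal attribute lookup has already failed
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._by_key[name]
        except KeyError:
            raise AttributeError(name)


demo = TripleAccess([("key1", "value1"), ("key2", "value2")])
first_row = demo[0]      # list-style access
by_key = demo["key2"]    # dict-style access
by_attr = demo.key1      # attribute-style access
```

The point is only that one object can answer all three kinds of lookup;
pyparsing gives you this for free once the grammar is wrapped in Dict.
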
You can adjust the syntax definition of keys and values to fit your
actual data. For instance, if the matrix entries are actually integers,
import nums from pyparsing as well and define matrixRow as:

    matrixRow = Group( OneOrMore( Word(nums) ) ) + eol
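
A rough sketch of that integer variant follows. The assumptions are mine:
the entries are unsigned decimal integers, the sample text is invented,
and I attach a parse action so the rows come back as ints rather than
strings. It is written to run on Python 3 with a pyparsing release that
still accepts the camelCase method names.

```python
from pyparsing import (ParserElement, LineEnd, Word, alphas, alphanums,
                       nums, Group, OneOrMore, ZeroOrMore, Optional, dictOf)

# keep newlines significant, as in the full example
ParserElement.setDefaultWhitespaceChars(" \t")

eol = LineEnd().suppress()
elem = Word(alphas, alphanums)
# assumption: matrix entries are unsigned decimal integers
integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
matrixRow = Group(OneOrMore(integer)) + eol
matrix = Group(OneOrMore(matrixRow)) + eol
value = elem + eol + Optional(matrix) + ZeroOrMore(eol)
parser = dictOf(elem, value)

# made-up sample; the blank line ends the matrix block
sample = "key1 value1\nkey2 value2\n1 2 3\n4 5 6\n\nkey3 value3\n"
results = parser.parseString(sample)

# rows are real ints now, so arithmetic works directly
row_sums = [sum(row) for row in results.key2[1]]
```

Because keys must start with a letter and matrix entries are all digits,
a row can no longer be confused with a key/value line, so the "at least
three elements" trick from the string version is not needed here.
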
-- Paul
from pyparsing import ParserElement, LineEnd, Word, alphas, alphanums, \
    Group, ZeroOrMore, OneOrMore, Optional, dictOf

# each matrix block is followed by a blank line, which the grammar requires
data = """key1 value1
key2 value2
key3 value3
key4 value4
spec11 spec12 spec13 spec14
spec21 spec22 spec23 spec24
spec31 spec32 spec33 spec34

key5 value5
key6 value6
key7 value7
more11 more12 more13
more21 more22 more23

key8 value8
"""

# retain significant newlines (pyparsing reads over whitespace by default)
ParserElement.setDefaultWhitespaceChars(" \t")
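eol = LineEnd().suppress()
elem = Word(alphas,alphanums)
key = elem
# a matrix row has at least three elements, so it cannot be mistaken
# for a two-element key/value line
matrixRow = Group( elem + elem + OneOrMore(elem) ) + eol
# a matrix is one or more rows, terminated by an empty line
matrix = Group( OneOrMore( matrixRow ) ) + eol
# a value may be followed by a matrix and by any number of blank lines
value = elem + eol + Optional( matrix ) + ZeroOrMore(eol)
parser = dictOf(key, value)
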
# parse the data
results = parser.parseString(data)

# access the results
# - like a dict
# - like a list
# - like an instance with keys for attributes
# (Python 2 syntax: print statement, basestring)
print results.keys()
print

for k in sorted(results.keys()):
    print k,
    if isinstance( results[k], basestring ):
        # plain key/value pair
        print results[k]
    else:
        # value followed by a matrix: [value, [row, row, ...]]
        print results[k][0]
        for row in results[k][1]:
            print " "," ".join(row)
print
print results.key3

Prints out:

['key8', 'key3', 'key2', 'key1', 'key7', 'key6', 'key5', 'key4']

key1 value1
key2 value2
key3 value3
key4 value4
  spec11 spec12 spec13 spec14
  spec21 spec22 spec23 spec24
  spec31 spec32 spec33 spec34
key5 value5
key6 value6
key7 value7
  more11 more12 more13
  more21 more22 more23
key8 value8

value3