Extracting values from text file
Paul McGuire
ptmcg at austin.rr._bogus_.com
Fri Jun 16 10:29:49 EDT 2006
"Preben Randhol" <randhol at bacchus.pvv.ntnu.no> wrote in message
news:slrne94rcg.2qnb.randhol at bacchus.pvv.ntnu.no...
> What I first thought of was whether it would be possible to make a filter such as:
>
> Apples (apples)
> (ducks) Ducks
> (butter) g butter
>
> The data can be put in a hash table.
>
> Or maybe there are better ways? I generally want something that is
> flexible, so one can easily change the filter settings if the text file
> format changes.
>
Here is a simple filter builder using pyparsing. It works in two passes:
first, pyparsing parses your filter patterns; then the grammar generated
from them is used to parse the incoming source string. In a pattern, a
parenthesized tag names the value to extract, and an optional suffix picks
its type: ":%" for an integer, ":#" for a float, ":$" for an alphabetic
word, and ":*" (the default) for any group of printable characters.
Pyparsing comes with a similar EBNF compiler, written by Seo Sanghyeon. I'm
sorry this is not really a newbie example, but it does allow you to easily
construct simple filters, and the implementation will give you something to
chew on... :) Pyparsing won't be as fast as re's, but I cobbled this filter
compiler together in about 3/4 of an hour, and it may serve as a decent
prototype for a more full-featured package.
-- Paul
Pyparsing's home Wiki is at http://pyparsing.wikispaces.com.
-----------------
# Python 2 syntax (print statements), as posted.
from pyparsing import *

sourceText = """
Apples 34
56 Ducks
Some more text.
0.5 g butter
"""

patterns = """\
Apples (apples)
(ducks:%) Ducks
(butter:#) g butter"""

def compilePatternList(patternList, openTagChar="(", closeTagChar=")",
                       greedy=True):
    # Map a type code to the pyparsing expression that matches that type.
    def compileType(s,l,t):
        return {
            "%" : Word(nums+"-",nums).setName("integer"),
            "#" : Combine(Optional("-")+Word(nums)+"."+
                          Optional(Word(nums))).setName("float"),
            "$" : Word(alphas).setName("alphabetic word"),
            "*" : Word(printables).setName("char-group")
            }[t[0]]

    # Plain words in a pattern become literals to be matched verbatim.
    backgroundWord = Word(alphanums).setParseAction(lambda s,l,t:Literal(t[0]))
    # Optional ":X" type suffix; defaults to "*" (any printable characters).
    matchType = Optional(Suppress(":") + oneOf("% # $ *"),
                         default="*").setParseAction(compileType)
    # A tag such as (ducks:%) becomes the typed expression, named "ducks".
    matchPattern = Combine(openTagChar +
                           Word(alphas,alphanums).setResultsName("nam") +
                           matchType.setResultsName("typ") +
                           closeTagChar)
    matchPattern.setParseAction(lambda s,l,t: (t.typ).setResultsName(t.nam) )
    # A whole pattern line becomes an And of its literals and tagged expressions.
    patternGrammar = OneOrMore( backgroundWord |
        matchPattern ).setParseAction(lambda s,l,t:And([expr for expr in t]))

    # Pass 1: compile each pattern line into a pyparsing expression.
    patterns = []
    for p in patternList:
        print p,
        pattExpr = patternGrammar.parseString(p)[0]
        print pattExpr
        patterns.append(pattExpr)
    # Or picks the longest matching alternative; MatchFirst picks the first.
    altern = (greedy and Or or MatchFirst)
    return altern( patterns )

grammar = compilePatternList( patterns.split("\n") )
print grammar

# Pass 2: scan the source text and accumulate the named results.
allResults = ParseResults([])
for t,s,e in grammar.scanString(sourceText):
    print t
    allResults += t
print

print allResults.keys()
for k in allResults.keys():
    print k,allResults[k]
-----------------
Prints:
Apples (apples) {"Apples" char-group}
(ducks:%) Ducks {integer "Ducks"}
(butter:#) g butter {float "g" "butter"}
{{"Apples" char-group} ^ {integer "Ducks"} ^ {float "g" "butter"}}
['Apples', '34']
['56', 'Ducks']
['0.5', 'g', 'butter']
['butter', 'apples', 'ducks']
butter 0.5
apples 34
ducks 56
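
For comparison, the same two-pass idea can be sketched with the standard re
module. This is only a rough sketch under a few assumptions, not part of the
pyparsing example above: the names TYPE_REGEX, TAG, compile_pattern and
extract are made up for illustration, the regexes only approximate the
pyparsing expressions, and it is written for Python 3 rather than the
Python 2 of the original code.

```python
import re

# Type codes mirror the ones in the pyparsing version; the regexes are
# approximations chosen for this sketch, not pyparsing's own definitions.
TYPE_REGEX = {
    "%": r"-?\d+",        # integer
    "#": r"-?\d+\.\d*",   # float: digits, a dot, optional digits
    "$": r"[A-Za-z]+",    # alphabetic word
    "*": r"\S+",          # any run of non-whitespace characters (default)
}

# A tag looks like (name) or (name:X), where X is one of the type codes.
TAG = re.compile(r"\((?P<name>[A-Za-z][A-Za-z0-9]*)(?::(?P<type>[%#$*]))?\)")


def compile_pattern(pattern):
    """Pass 1: turn one filter line such as '(ducks:%) Ducks' into a regex."""
    parts = []
    for token in pattern.split():
        tag = TAG.fullmatch(token)
        if tag:
            type_code = tag.group("type") or "*"
            parts.append("(?P<%s>%s)" % (tag.group("name"), TYPE_REGEX[type_code]))
        else:
            # Background word: matched literally.
            parts.append(re.escape(token))
    # Tokens may be separated by any amount of whitespace in the source text.
    return re.compile(r"\s+".join(parts))


def extract(filter_lines, text):
    """Pass 2: scan the text with each compiled filter and collect the
    named values into a plain dict (the "hash table" from the question)."""
    results = {}
    for line in filter_lines:
        if not line.strip():
            continue  # skip empty filter lines
        for match in compile_pattern(line).finditer(text):
            results.update(match.groupdict())
    return results


source_text = """
Apples 34
56 Ducks
Some more text.
0.5 g butter
"""
filters = ["Apples (apples)", "(ducks:%) Ducks", "(butter:#) g butter"]
print(extract(filters, source_text))
# prints {'apples': '34', 'ducks': '56', 'butter': '0.5'}
```

Unlike the pyparsing version, this sketch tries every filter independently
instead of choosing between alternatives at each position, and a later match
for the same name simply overwrites an earlier one.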