need simple parsing ability

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri Jul 16 14:53:53 EDT 2004


"george young" <gry at ll.mit.edu> wrote in message
news:20040716111324.09267883.gry at ll.mit.edu...
> [python 2.3.3, x86 linux]
> For each run of my app, I have a known set of (<100) wafer names.
> Names are sometimes simply integers, sometimes a short string, and
> sometimes a short string followed by an integer, e.g.:
>
>   5, 6, 7, 8, 9, bar, foo_6, foo_7, foo_8, foo_9, foo_10, foo_11
>
> I need to read user input of a subset of these.  The user will type a
> set of names separated by commas (with optional white space), but there
> may also be sequences indicated by a dash between two integers, e.g.:
>
>    "9-11"       meaning 9,10,11
>    "foo_11-13"  meaning foo_11, foo_12, and foo_13.
>    "foo_9-11"   meaning foo_9,foo_10,foo_11, or
>    "bar09-11"   meaning bar09,bar10,bar11
>
> (Yes, I have to deal with integers with and without leading zeros)
> [I'll proclaim inverse sequences like "foo_11-9" invalid]
> So a sample input might be:
>
>    9,foo7-9,2-4,xxx   meaning 9,foo7,foo8,foo9,2,3,4,xxx
>
> The order of the resultant list of names is not important; I have
> to sort them later anyway.
>
> Fancy error recovery is not needed; an invalid input string will be
> peremptorily wiped from the screen with an annoyed beep.
>
> Can anyone suggest a clean way of doing this?  I don't mind
> installing and importing some parsing package, as long as my code
> using it is clear and simple.  Performance is not an issue.
>
>
> -- George Young
> -- 
> "Are the gods not just?"  "Oh no, child.
> What would become of us if they were?" (CSL)

Here's a pyparsing solution.  The best way to read it is to look over the
grammar definitions first, then the parse actions attached to the
different bits of the grammar.  The most complicated part is the parse
action for integer ranges, which keeps leading zeroes if they were given
in the original string.
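
As a rough standalone sketch of that leading-zero rule (Python 3 syntax, no
pyparsing; the name expand_int_range and the ValueError for inverse ranges
are assumptions for illustration, not part of the posted code):

```python
def expand_int_range(start, end):
    """Expand two digit strings, e.g. ("09", "11") -> ["09", "10", "11"]."""
    first, last = int(start), int(end)
    if first > last:
        # Assumption: inverse ranges such as "11-9" are simply rejected.
        raise ValueError("inverse range: %s-%s" % (start, end))
    # Pad to the width of the start string only if it had a leading zero;
    # zfill(0) leaves the number unpadded.
    width = len(start) if start.startswith("0") else 0
    return [str(n).zfill(width) for n in range(first, last + 1)]


print(expand_int_range("09", "11"))
print(expand_int_range("9", "11"))
print(expand_int_range("099", "101"))
```

The padding width comes from the start value alone, so "099-101" stays three
digits wide while "9-11" is left unpadded.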

You said fancy error recovery is not needed, but error reporting is built
into pyparsing (a ParseException carries the location and a message), so
use as much or as little of it as you like.

-- Paul


# download pyparsing at http://pyparsing.sourceforge.net

from pyparsing import Word,delimitedList,alphas,alphanums,nums,Literal, \
    StringEnd,ParseException

# define basic grammar
integer = Word(nums)
integerRange = integer.setResultsName("start") + "-" + \
                integer.setResultsName("end")
word = Word(alphas+"_")
wordRange = word.setResultsName("base") + ( integerRange | integer )
waferList  = delimitedList( integerRange | integer | wordRange | word ) + \
                        StringEnd()

# define parse actions (to expand range references)
def expandIntRange(st,loc,toks):
    expandedNums = range( int(toks.start), int(toks.end)+1 )
    # make sure leading zeroes are retained
    if toks.start.startswith('0'):
        return [ "%0*d"%(len(toks.start),n) for n in expandedNums ]
    else:
        return [ str(n) for n in expandedNums ]

def expandWordRange(st,loc,toks):
    baseNumPairs = zip( [toks.base]*(len(toks)-1), toks[1:] )
    return [ "".join(pair) for pair in baseNumPairs ]

# attach parse actions to grammar elements
integerRange.setParseAction( expandIntRange )
wordRange.setParseAction( expandWordRange )

# run tests (last one an error)
testData = """
9,foo7-9,2-4,xxx
9,foo_7- 9, 2-4, xxx
9 , foo07-09,2 - 4, bar6, xxx
9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
9,foo7-9,2-4,xxx,foo_099-101
9,f07-09-12,xxx
"""

for t in testData.split("\n")[1:-1]:
    try:
        print t
        print waferList.parseString(t)
    except ParseException, pe:
        print t
        print (" "*pe.loc) + "^"
        print pe.msg
    print

=====================
output:
9,foo7-9,2-4,xxx
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx']

9,foo_7- 9, 2-4, xxx
['9', 'foo_7', 'foo_8', 'foo_9', '2', '3', '4', 'xxx']

9 , foo07-09,2 - 4, bar6, xxx
['9', 'foo07', 'foo08', 'foo09', '2', '3', '4', 'bar6', 'xxx']

9,foo7-9,2-4,xxx,5, 6, 7, 8, 9, bar, foo_6, foo_10-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9', 'bar', 'foo_6', 'foo_10', 'foo_11']

9,foo7-9,2-4,xxx,5- 9, bar, foo_06, foo_010-11
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', '5', '6', '7', '8', '9', 'bar', 'foo_06', 'foo_010', 'foo_011']

9,foo7-9,2-4,xxx,foo_099-101
['9', 'foo7', 'foo8', 'foo9', '2', '3', '4', 'xxx', 'foo_099', 'foo_100', 'foo_101']

9,f07-09-12,xxx
9,f07-09-12,xxx
        ^
Expected end of text
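
For comparison, roughly the same expansions can be had from the standard re
module alone.  This is only a sketch in Python 3 syntax, not the pyparsing
approach above; the names ITEM, expand_item and expand_wafer_list are made up
for illustration, and rejecting inverse ranges with ValueError is an
assumption based on the original question.

```python
import re

# Optional word of letters/underscores, then an optional integer,
# optionally followed by "-" and a second integer.
ITEM = re.compile(r"([A-Za-z_]*)\s*(?:(\d+)\s*(?:-\s*(\d+))?)?")


def expand_item(item):
    """Expand one comma-separated item into a list of wafer names."""
    m = ITEM.fullmatch(item.strip())
    if m is None or not (m.group(1) or m.group(2)):
        raise ValueError("invalid item: %r" % item)
    base, start, end = m.groups()
    if start is None:          # plain word, e.g. "xxx"
        return [base]
    if end is None:            # single number or word+number, e.g. "bar6"
        return [base + start]
    if int(start) > int(end):  # assumption: inverse ranges are invalid
        raise ValueError("inverse range: %r" % item)
    # Keep leading zeroes only if the start value was written with one.
    width = len(start) if start.startswith("0") else 0
    return [base + str(n).zfill(width)
            for n in range(int(start), int(end) + 1)]


def expand_wafer_list(text):
    """Expand a whole comma-separated input string."""
    names = []
    for item in text.split(","):
        names.extend(expand_item(item))
    return names


print(expand_wafer_list("9,foo7-9,2-4,xxx"))
```

An input such as "9,f07-09-12,xxx" raises ValueError here, though without
the column position that pyparsing reports.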
