Problem using Optional pyparsing

Peter Otten __peter__ at web.de
Thu Aug 16 03:57:02 EDT 2007


Nathan Harmston wrote:

> I know this isn't the pyparsing list, but it doesn't seem like there is
> one. I'm trying to use pyparsing to parse a file; however, I can't get
> the Optional keyword to work. My file generally looks like this:
> 
> ALIGNMENT  1020  YS2-10a02.q1k chr09     1295       42    141045
> 142297   C    1254 95.06 1295 reject_bad_break 0
> 
> or this:
> 
> ALIGNMENT  36    YS2-10a08.q1k chrm      208      165     10745
> 10788   C      44 95.45 593 reject_low 10,14
> 
> and my grammar works well for these lines. However, sometimes the row looks
> like:
> ALIGNMENT  53    YS2-10b03.p1k chr12      180      125   1067465
> 1067520   C      56 98.21 532|5,2 reject_low 25
> 
> So I try to parse the 532 using
> 
> from pyparsing import *
> 
> integer = Word( nums )
> float = Word( nums+".")
> identifier = Word( alphanums+"-_." )
> 
> alignment = Literal("ALIGNMENT ").suppress()
> row_1 = integer.setResultsName("row_1")#.setParseAction(make_int)
> src_id = identifier.setResultsName("src_id")
> dest_id = identifier.setResultsName("dest_id")
> src_start = integer.setResultsName("src_start")#.setParseAction(make_int)
> src_stop = integer.setResultsName("src_stop")#.setParseAction(make_int)
> dest_start = integer.setResultsName("dest_start")#.setParseAction(make_int)
> dest_stop = integer.setResultsName("dest_stop")#.setParseAction(make_int)
> row_8 = oneOf("F C").setResultsName("row_8")
> length = integer.setResultsName("length")#.setParseAction(make_int)
> percent_id = float.setResultsName("percent_id")#.setParseAction(make_float)
> row_11 = integer + Optional(Literal("|") + commaSeparatedList)#.setResultsName("row_11")#.setParseAction(make_int)
> result = Word(alphas+"_").setResultsName("result")
> row_13 = commaSeparatedList.setResultsName("row_13")
> 
> def make_alilines_status_parser():
>     return (alignment + row_1 + src_id + dest_id + src_start + src_stop
>             + dest_start + dest_stop + row_8 + length + percent_id + row_11
>             + result + row_13)
> 
> def parse_alilines_status(ifile):
>     alilines = make_alilines_status_parser()
>     for l in ifile:
>         yield alilines.parseString( l )
> 
> However, my parser always fails on lines of type 3. Does anyone know
> why the Optional part is not working?

The items of commaSeparatedList may contain whitespace, so it swallows the
rest of the line into its last item and leaves nothing for the fields that
follow:

>>> commaSeparatedList.parseString("a,b    c")
(['a', 'b    c'], {})

You can fix this by defining your own delimitedList that doesn't accept
whitespace, e.g.:

>>> delimitedList(Word(alphanums)).parseString("a,b c")
(['a', 'b'], {})
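
Applied to the row_11 field from the question, this could look roughly like
the sketch below. It is only an illustration under a few assumptions: the
values after "|" are taken to be plain integers, the names extras and
row_11_extra are made up here, and Group and Suppress are additions that are
not in the original grammar.

```python
from pyparsing import Group, Optional, Suppress, Word, delimitedList, nums

integer = Word(nums)

# Each list item is Word(nums), which cannot contain whitespace, so the
# list ends at the first blank instead of running to the end of the line.
# Group keeps the extra values together in one nested list.
extras = Group(delimitedList(integer))

# "row_11_extra" is a hypothetical results name chosen for this sketch.
row_11 = integer.setResultsName("row_11") + Optional(
    Suppress("|") + extras.setResultsName("row_11_extra")
)

# Tail ends of the second and third sample lines from the question.
plain = row_11.parseString("593 reject_low 10,14")
piped = row_11.parseString("532|5,2 reject_low 25")

print(plain.asList())
print(piped.asList())
```

The first call yields ['593'] and the second ['532', ['5', '2']]; in both
cases "reject_low" and what follows are left for the remaining fields.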

Peter



