Pyparsing troubles

Harry George harry.g.george at boeing.com
Mon Dec 11 09:57:33 EST 2006


poromenos at gmail.com writes:

> Hello,
> I have written a small pyparsing parser to recognize dates in the style
> "november 1st". I wrote something to the effect of:
> 
> expression = task + date
> 
> and tried to parse "Doctor's appointment on november 1st", hoping that
> task would be "Doctor's appointment" and date would be "on november
> 1st" (the parser does match "on november 1st" to "date"). I have set
> task as Regex(".*?"), ZeroOrMore(Word(alphas)), etc, but I can't get it
> to match, it matches everything to task and ignores date until it gets
> to the end of the string.
> 
> Can anyone help?
> 

As described, this is a Natural Language Processing (NLP) problem,
which means you will have a lot more trouble understanding what you
want to do than coding it.  Also, dates are notoriously tough to
parse because there are so many variants, so there are libraries to
do just that.

If you want to tackle it systematically:

1. Get a "corpus" of texts which illustrate the ways the users might
   state the date.  E.g., "2006-11-01", "1-Nov-06", "November 1",
   "Nov. first", "first of November", "10 days prior to Veterans Day",
   "next week", .....

2. If you can control the input, so much the better.  Either use a
   form which forces specific values for day, month, year, hour,
   minute, or require ISO 8601 format (yyyy-mm-ddThh:mm:ss), as
   profiled by IETF RFC 3339.

3. Determine the syntax rules for each example.  If possible, abstract
   these to general rules which work on more than one example.

4. At this point, you should know enough to decide if it is a:

a) Regular expression, parseable with a regexp engine

b) Context Free Grammar (CFG), parseable with an LL(1) or LALR(1) parser.

c) Context-Sensitive Grammar, parseable with an ad hoc parser with special rules.

d) Free text, not parseable in the normal sense, but perhaps
understandable with statistical NLP techniques.

e) Hodgepodge not amenable to machine analysis.

5. Then we could look at using pyparsing.  But we'd have to see
   the pyparsing code you tried.
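
As a rough illustration of steps 1, 3 and 4, here is a minimal sketch
that runs a small corpus against a few candidate rules and reports
which examples are not yet covered.  The corpus entries come from the
list in step 1; the rule names and patterns are made up for
illustration and only handle the regular-expression case (4a).

```python
import re

# Sample inputs from step 1; a real corpus would come from actual users.
CORPUS = ["2006-11-01", "1-Nov-06", "November 1", "Nov. first",
          "first of November", "next week"]

# Hypothetical candidate rules (step 3), each a plain regular expression.
RULES = {
    "iso": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "d-mon-yy": re.compile(r"\d{1,2}-[A-Za-z]{3}-\d{2}"),
    "month day": re.compile(r"[A-Za-z]+\.? \d{1,2}(st|nd|rd|th)?"),
}

def classify(text):
    """Return the name of the first rule that matches all of text, or None."""
    for name, pattern in RULES.items():
        if pattern.fullmatch(text):
            return name
    return None

# Map each corpus entry to the rule that covers it (None if no rule does).
coverage = {text: classify(text) for text in CORPUS}
uncovered = [text for text, rule in coverage.items() if rule is None]
```

Whatever ends up in "uncovered" tells you whether more regexps will
do, or whether you are really in case (b) or beyond.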

-- 
Harry George
PLM Engineering Architecture


