module to parse "pseudo natural" language?

Andrew E andrew at nospam.com
Sun Apr 17 07:38:09 EDT 2005


Hi all

I've written a python program that adds orders into our order routing
simulation system. It works well, and has a syntax along these lines:

  ./neworder --instrument NOKIA --size 23 --price MARKET --repeats 20

etc

However, I'd like to add a mode that will handle, say:

  ./neworder buy 23 NOKIA at MKT x 20

I could enter several orders either by running it multiple times or by
using a comma-separated approach, like:

  ./neworder buy 23 NOKIA at MKT on helsinki, sell 20 NOKIA at market on helsinki
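
As a rough sketch of the comma-separated idea (the helper name split_orders
is hypothetical, not part of any existing module): the shell hands the program
a flat list of words, and a comma may be glued to a word ("helsinki,") or
stand alone, so one simple option is to re-join the words and split on commas
before tokenising each order.

```python
def split_orders(args):
    """Split a flat argument list into one token list per order.

    Hypothetical helper: assumes commas only ever separate orders.
    A comma may be attached to a word ("helsinki,") or be its own
    argument (","), so the words are re-joined before splitting.
    """
    text = " ".join(args)
    # Drop empty chunks so a trailing comma does not yield an empty order.
    return [chunk.split() for chunk in text.split(",") if chunk.strip()]


if __name__ == "__main__":
    import sys
    for tokens in split_orders(sys.argv[1:]):
        print(tokens)
```

Each resulting token list could then be handed to the per-order parser.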

The thing about this is that it's a "tolerant" parser, so all of these
should also work:

  # omit words like "at", "on"
  ./neworder buy 23 NOKIA mkt helsinki

  # take any symbol for helsinki
  ./neworder buy 23 mkt helsinki

  # figure out that market=helsinki
  ./neworder buy 23 NOKIA at market price
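
One possible way to get this tolerance, sketched under assumptions: classify
each word by what it looks like rather than by where it appears. The word
lists below (FILLERS, MARKETS and so on), the function name parse_order and
the dict it returns are all made up for illustration; a real version would
need the actual vocabulary and whatever rules should fill in a missing
instrument or market.

```python
SIDES = {"buy", "sell"}
FILLERS = {"at", "on", "price"}      # words that carry no information
MARKET_PRICE = {"mkt", "market"}     # synonyms for a market-price order
MARKETS = {"helsinki"}               # assumed list of known exchanges


def parse_order(tokens):
    """Classify each token by its form; word order is mostly ignored."""
    order = {"side": None, "size": None, "instrument": None,
             "price": None, "market": None, "repeats": 1}
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in FILLERS:
            pass                                  # skip "at", "on", "price"
        elif word in SIDES:
            order["side"] = word
        elif word in MARKET_PRICE:
            order["price"] = "MARKET"
        elif word in MARKETS:
            order["market"] = word
        elif word == "x" and i + 1 < len(tokens) and tokens[i + 1].isdigit():
            order["repeats"] = int(tokens[i + 1])  # "x 20" means 20 repeats
            i += 1                                 # consume the count too
        elif word.isdigit():
            if order["size"] is not None:
                raise ValueError("unexpected number %r" % tokens[i])
            order["size"] = int(word)              # first bare number is size
        elif order["instrument"] is None:
            order["instrument"] = tokens[i].upper()  # first unknown word
        else:
            raise ValueError("cannot interpret %r" % tokens[i])
        i += 1
    return order
```

Fields that are never mentioned stay None, so a later step can decide how to
default them (for example, picking the market from the instrument).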


I've started writing a simple state-based parser, with usage like:

  class NaturalLanguageInstructionBuilder:

    def parse( self, arglist ):
      """Given a sequence of args, return an Instruction object"""
      ...
      return Instruction( instrument, size, price, ... )


  class Instruction:
    """encapsulate a single instruction to buy, sell, etc"""

    def __init__( self, instrument, size, price, ... ):
      ...


This doesn't work yet, but I know with time I'll get there.

Question is - is there a module out there that will already handle this
approach?

Thanks for any suggestions :)

Andrew


