module to parse "pseudo natural" language?

John Roth newsgroups at jhrothjr.com
Sun Apr 17 12:25:45 EDT 2005


"Andrew E" <andrew at nospam.com> wrote in message 
news:d3thql$4rh$1 at news.hispeed.ch...
> Hi all
>
> I've written a python program that adds orders into our order routing
> simulation system. It works well, and has a syntax along these lines:
>
>  ./neworder --instrument NOKIA --size 23 --price MARKET --repeats 20
>
> etc
>
> However, I'd like to add a mode that will handle, say:
>
>  ./neworder buy 23 NOKIA at MKT x 20
>
> I could enter several orders either by running multiple times, or use a
> comma-separated approach, like:
>
>  ./neworder buy 23 NOKIA at MKT on helsinki, sell 20 NOKIA at market on helsinki
>
> The thing about this is that it's a "tolerant" parser, so all of these
> should also work:
>
>  # omit words like "at", "on"
>  ./neworder buy 23 NOKIA mkt helsinki
>
>  # take any symbol for helsinki
>  ./neworder buy 23 mkt helsinki
>
>  # figure out that market=helsinki
>  ./neworder buy 23 NOKIA at market price
>
>
> I've started writing a simple state-based parser, usage like:
>
>  class NaturalLanguageInstructionBuilder:
>
>    def parse( self, arglist ):
>      """Given a sequence of args, return an Instruction object"""
>      ...
>      return Instruction( instrument, size, price, ... )
>
>
>  class Instruction:
>    """encapsulate a single instruction to buy, sell, etc"""
>
>    def __init__( self, instrument, size, price, ... ):
>      ...
>
>
> This doesn't work yet, but I know with time I'll get there.
>
> Question is - is there a module out there that will already handle this
> approach?
>
> Thanks for any suggestions :)

There's NLTK (on Sourceforge), which has already been mentioned.
However, it's a teaching tool, not a production-quality natural
language parser.

I'd suggest you step back from the problem and take a wider
view. Parsing natural language, in all its variations, is an unsolved
research problem that is part of what has given Artificial Intelligence
somewhat of a black eye.

Your problem is, however, much simpler than the general one:
you've got a limited number of commands which pretty much
all follow the VO (verb operands) pattern.
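Because every command follows the verb-operands pattern, the outer layer of the parse is almost trivial: rejoin the shell arguments, split on commas, and check that each clause begins with a verb. A minimal sketch, assuming a hypothetical verb set of just `buy` and `sell` and that commas only ever separate orders:

```python
# Hypothetical verb vocabulary, taken from the examples in the question.
VERBS = {"buy", "sell"}


def split_clauses(argv):
    """Split shell arguments such as
    ['buy', '23', 'NOKIA', 'on', 'helsinki,', 'sell', '20', 'NOKIA']
    into (verb, operands) pairs, one per comma-separated clause."""
    text = " ".join(argv)          # the shell may leave commas glued to words
    clauses = []
    for chunk in text.split(","):
        tokens = chunk.split()
        if not tokens:
            continue               # tolerate stray or trailing commas
        verb = tokens[0].lower()
        if verb not in VERBS:
            raise ValueError(f"clause {chunk.strip()!r} does not start with a verb")
        clauses.append((verb, tokens[1:]))
    return clauses
```

In the script itself this would be called as `split_clauses(sys.argv[1:])`, and each operand list handed to a per-clause parser.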

You've also got a lot of words from small, disjoint
vocabularies that can be used to drive the parse. In your example,
every clause must start with either 'buy' or 'sell',
MKT is one of maybe half a dozen qualifiers that supply other
required information, there are a limited number of exchanges,
and the number of shares seems to be the only number present.
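A vocabulary-driven parse of a single clause can be sketched as one pass over its tokens that classifies each word by the set it belongs to, rather than by its position. This is a minimal sketch, not a library: the word lists, the `Instruction` fields, and the `HOME_EXCHANGE` table used to infer a missing exchange are all hypothetical, drawn only from the examples in the question.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies for illustration only; a real system would
# load these from its own instrument and exchange reference data.
VERBS = {"buy", "sell"}
MARKET_WORDS = {"mkt", "market"}        # price qualifiers meaning "at market"
EXCHANGES = {"helsinki", "stockholm"}
NOISE = {"at", "on", "price"}           # filler words the parser may skip
HOME_EXCHANGE = {"NOKIA": "helsinki"}   # used to infer a missing exchange


@dataclass
class Instruction:
    side: str
    size: int
    instrument: Optional[str] = None    # None: caller may pick any symbol
    price: Optional[str] = None         # "MARKET" when a market word is seen
    exchange: Optional[str] = None
    repeats: int = 1


def parse_clause(tokens):
    """Build an Instruction from one clause, e.g. 'buy 23 NOKIA at MKT x 20'."""
    words = [t.lower() for t in tokens]
    if not words or words[0] not in VERBS:
        raise ValueError(f"clause must start with one of {sorted(VERBS)}")
    size = instrument = price = exchange = None
    repeats = 1
    i = 1
    while i < len(words):
        word, raw = words[i], tokens[i]
        if word == "x" and i + 1 < len(words) and words[i + 1].isdigit():
            repeats = int(words[i + 1])  # "x 20": submit the order 20 times
            i += 2
            continue
        if word in NOISE:
            pass
        elif word in MARKET_WORDS:
            price = "MARKET"
        elif word in EXCHANGES:
            exchange = word
        elif word.isdigit():
            if size is not None:
                raise ValueError(f"unexpected second number {raw!r}")
            size = int(word)
        elif instrument is None:
            instrument = raw.upper()     # first unrecognized word is the symbol
        else:
            raise ValueError(f"cannot place word {raw!r}")
        i += 1
    if size is None:
        raise ValueError("no order size given")
    if exchange is None and instrument is not None:
        exchange = HOME_EXCHANGE.get(instrument)
    return Instruction(words[0], size, instrument, price, exchange, repeats)
```

Because words are recognized by vocabulary, the filler words can be dropped: `parse_clause("buy 23 NOKIA mkt helsinki".split())` yields the same fields as the version with "at" and "on", and "buy 23 NOKIA at market price" gets its exchange from the lookup table.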

I'd also take a bit of advice from the XP community: don't
write the library first. Wait until you've got at least three
working examples, so you know which services the
library really needs to support.

John Roth
>
> Andrew 



