module to parse "pseudo natural" language?

bytecolor bytecolor at yahoo.com
Sun Apr 17 17:54:43 EDT 2005


Andrew E wrote:
> Hi all
>
> I've written a python program that adds orders into our order routing
> simulation system. It works well, and has a syntax along these lines:
>
>   ./neworder --instrument NOKIA --size 23 --price MARKET --repeats 20
>
> etc
>
> However, I'd like to add a mode that will handle, say:
>
>   ./neworder buy 23 NOKIA at MKT x 20
>
> I could enter several orders either by running multiple times, or use a
> comma-separated approach, like:
>
>   ./neworder buy 23 NOKIA at MKT on helsinki, sell 20 NOKIA at market on helsinki

You could add a string option instead:
$ neworder -c 'buy 23 NOKIA at MKT on helsinki, sell 20 NOKIA at market on helsinki'

This would leave your current option parsing intact. Then just
split the string on the comma to get the individual orders.
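
A rough sketch of that idea, using the standard argparse module. The
long option names mirror the flags from your post; "--command" and the
split_orders() helper are names I made up for illustration:

```python
import argparse


def build_parser():
    # The existing flags stay as they are; -c is the only addition.
    parser = argparse.ArgumentParser(prog="neworder")
    parser.add_argument("--instrument")
    parser.add_argument("--size", type=int)
    parser.add_argument("--price")
    parser.add_argument("--repeats", type=int, default=1)
    # Hypothetical long name; the suggestion only calls for a string option.
    parser.add_argument("-c", "--command",
                        help="comma-separated orders as free-form text")
    return parser


def split_orders(command):
    """Split a -c string on commas, then tokenize each order on whitespace."""
    return [part.split() for part in command.split(",") if part.strip()]


args = build_parser().parse_args(
    ["-c", "buy 23 NOKIA at MKT on helsinki, sell 20 NOKIA at market on helsinki"]
)
orders = split_orders(args.command)
for tokens in orders:
    print(tokens)
```

Each token list could then be handed to your parse() method, one
order at a time.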

Another suggestion would be to drop into an interactive mode if
no arguments are passed:
$ neworder
->? buy 23 NOKIA at MKT on helsinki
->? sell 20 NOKIA at market on helsinki
->? ^d
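
Something like the following might do for the interactive loop. It is
only a sketch: read_orders() is a hypothetical helper, and the streams
are parameters so the loop can be exercised without a terminal:

```python
import io
import sys


def read_orders(stream=sys.stdin, prompt="->? ", out=sys.stdout):
    """Yield one token list per non-empty line until EOF (Ctrl-D)."""
    while True:
        out.write(prompt)
        out.flush()
        line = stream.readline()
        if not line:          # readline() returns '' only at end of file
            out.write("\n")
            break
        tokens = line.split()
        if tokens:            # skip blank lines instead of failing on them
            yield tokens


# Simulate a short session; in the real program you would call
# read_orders() with no arguments when len(sys.argv) == 1.
fake_input = io.StringIO(
    "buy 23 NOKIA at MKT on helsinki\n"
    "\n"
    "sell 20 NOKIA at market on helsinki\n"
)
collected = list(read_orders(fake_input, out=io.StringIO()))
```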

> The thing about this is that it's a "tolerant" parser, so all of these
> should also work:
>
>   # omit words like "at", "on"
>   ./neworder buy 23 NOKIA mkt helsinki
>
>   # take any symbol for helsinki
>   ./neworder buy 23 mkt helsinki
>
>   # figure out that market=helsinki
>   ./neworder buy 23 NOKIA at market price
>
>
> I've started writing a simple state-based parser, usage like:
>
>   class NaturalLanguageInsructionBuilder:
>
>     def parse( self, arglist ):
>       """Given a sequence of args, return an Instruction object"""
>       ...
>       return Instruction( instrument, size, price, ... )
>
>
>   class Instruction:
>     """encapsulate a single instruction to buy, sell, etc"""
>
>     def __init__( self, instrument, size, price, ... ):
>       ...
>
>
> This doesn't work yet, but I know with time I'll get there.
>
> Question is - is there a module out there that will already handle this
> approach?
>
> Thanks for any suggestions :)
>
> Andrew

If I were in your situation, I'd probably write a BNF grammar for the
tiny language. That would help me wrap my brain around the problem,
and it would also show what kind of regular expression you are
looking at creating.
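
To make that concrete, here is one possible grammar and a regular
expression derived from it. Both are my guess at the language based on
your examples, not a spec: the filler words, the MARKET constant, and
the parse_order() helper are all assumptions.

```python
import re

# Hypothetical BNF for a single order (square brackets = optional):
#   order ::= side size [instrument] ["at"] price ["price"]
#             ["on"] [market] ["x" repeats]
#   side  ::= "buy" | "sell"
#   price ::= "mkt" | "market" | number
ORDER_RE = re.compile(
    r"""
    ^\s*(?P<side>buy|sell)
    \s+(?P<size>\d+)
    (?:\s+(?P<instrument>(?!at\b|mkt\b|market\b)[A-Za-z]\w*))?
    (?:\s+at)?
    \s+(?P<price>mkt|market|\d+(?:\.\d+)?)
    (?:\s+price)?
    (?:\s+on)?
    (?:\s+(?P<market>(?!x\b)[A-Za-z]\w*))?
    (?:\s+x\s+(?P<repeats>\d+))?
    \s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_order(text):
    """Return a dict of order fields, or raise ValueError if no match."""
    match = ORDER_RE.match(text)
    if match is None:
        raise ValueError("cannot parse order: %r" % text)
    fields = match.groupdict()
    price = fields["price"].lower()
    return {
        "side": fields["side"].lower(),
        "size": int(fields["size"]),
        "instrument": fields["instrument"],   # None when omitted
        "price": "MARKET" if price in ("mkt", "market") else float(price),
        "market": fields["market"],           # None when omitted
        "repeats": int(fields["repeats"] or 1),
    }


for example in ("buy 23 NOKIA at MKT x 20",
                "buy 23 NOKIA mkt helsinki",
                "buy 23 mkt helsinki",
                "buy 23 NOKIA at market price"):
    print(parse_order(example))
```

Fields that come back as None (a missing instrument or market) are
where your defaulting logic would go. A regex like this gets hard to
maintain once the grammar grows, at which point the BNF is still
useful as the starting point for a real parser.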

--
bytecolor



