Python parser

Robert Kern robert.kern at gmail.com
Mon Mar 2 17:53:59 EST 2009


On 2009-03-02 16:14, Clarendon wrote:
> Thank you, Lie and Andrew for your help.
>
> I have studied NLTK quite closely but its parsers seem to be only for
> demo. It has a very limited grammar set, and even a parser that is
> supposed to be "large" does not have enough grammar to cover common
> words like "I".
>
> I need to parse a large amount of texts collected from the web (around
> a couple hundred sentences at a time) very quickly, so I need a parser
> with a broad scope of grammar, enough to cover all these texts. This
> is what I mean by 'random'.
>
> An advanced programmer has advised me that Python is rather slow in
> processing large data, and so there are not many parsers written in
> Python. He recommends that I use Jython to use parsers written in
> Java. What are your views about this?

Let me clarify your request: you are asking for a parser of the English 
language, yes? Not just parsers in general? Not many English-language parsers 
are written in *any* language.

AFAIK, there is no English-language parser written in Python beyond those 
available in NLTK. There are probably none (in any language) which will robustly 
parse all of the grammatically correct English texts you will encounter by 
scraping the web, much less all of the incorrect English you will encounter.

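Python can be rather slow for certain kinds of processing of large volumes (and 
really quite speedy for others). In this case, it's neither here nor there; the 
parsing algorithms themselves are expensive, so they are slow in any language. A 
chart parser for a context-free grammar, for example, takes time roughly cubic 
in the sentence length.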
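To make the point about algorithmic cost concrete, here is a minimal sketch of 
the CKY algorithm, one standard chart-parsing technique for context-free grammars 
in Chomsky normal form. It is only an illustration, not a description of how 
NLTK, link-grammar or the Stanford Parser work internally, and the grammar and 
lexicon below are hypothetical toy examples. The nested loops over span length, 
start position and split point are what make the cost grow with the cube of the 
sentence length whatever the implementation language. A broad-coverage grammar 
also multiplies the work done in every cell, and a word missing from the lexicon 
(like the "I" mentioned above) simply makes the sentence unparseable.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form, for illustration only.
# Binary rules map a (left child, right child) pair to possible parent symbols.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}

# Hypothetical lexicon: each known word maps to its possible preterminals.
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start` under the toy grammar."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds every symbol that can span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        # An unknown word gets no symbols, so no parse can cover it.
        table[i][i + 1] = set(LEXICON.get(word, ()))
    # Span length, start position and split point: O(n^3) cells-times-splits,
    # further multiplied by the number of symbol pairs tried at each split.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n]
```

With this toy grammar, `cky_recognize("the dog saw the cat".split())` returns 
`True`, while `cky_recognize("I saw the cat".split())` returns `False` because 
"I" is not in the lexicon.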

You may try your luck with link-grammar, which is implemented in C:

   http://www.abisource.com/projects/link-grammar/

Or the Stanford Parser, implemented in Java:

   http://nlp.stanford.edu/software/lex-parser.shtml

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco




More information about the Python-list mailing list