[Tutor] Learning natural language processing and Python? [learning about "stemming" words]

Stephen Harris cyberdiction@hotmail.com
Wed, 18 Sep 2002 13:49:26 -0700


----- Original Message -----
From: "Danny Yoo" <dyoo@hkn.eecs.berkeley.edu>
To: "Tutor" <tutor@python.org>
Sent: Tuesday, September 17, 2002 10:52 PM
Subject: [Tutor] Learning natural language processing and Python? [learning
about "stemming" words]


> Hi everyone,
>
> I'm starting to learn "Natural Language Processing", which tries to use a
> computer to tease meaning out of natural language.  I was wondering if
> anyone was interested in this sort of thing?
>

I have been interested in Artificial Intelligence (AI). Natural language
translation is almost the same challenge as writing an AI program that
passes the Turing test. The Cyc project took about 18 years to get to its
present level of success.

http://www.cyc.com/nl.html
The Cyc NL Subsystem
The Cyc-NL system can be described in terms of its three primary components,
although in truth there are many other utilities that contribute to its
success. The three main components are the lexicon, the syntactic parser,
and the semantic interpreter.

There is speech dictation software. Let us suppose we want to check some
text we have dictated for spelling and grammar errors. There are words which
sound alike but are spelled differently, and these will not be highlighted
by the spelling and grammar checkers. This is where an AI-type meaning
checker with associated correct spelling enters the picture. These work
fairly well for concrete levels of thinking; they require apprehending the
meaning but not necessarily creating the sense of identity or personhood
required of a program that passes the Turing test.
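
To make the sound-alike problem concrete, here is a minimal sketch of a
dictionary-based spelling check. The word list and the names KNOWN_WORDS and
misspelled are made up for illustration; a real checker would load a full
dictionary. Because "their" is a correctly spelled word, a check like this
has no reason to flag it, even though the dictated sentence needed "there".

```python
# Hypothetical toy word list, for illustration only.
KNOWN_WORDS = {
    "i", "left", "my", "keys", "over", "their", "there",
    "on", "the", "table",
}

def misspelled(text):
    """Return the words in text that are not found in KNOWN_WORDS."""
    return [
        word
        for word in text.lower().split()
        if word.strip(".,!?") not in KNOWN_WORDS
    ]

# "their" should be "there", but both are valid dictionary words,
# so a pure lookup cannot tell them apart.
dictated = "I left my keys over their on the table."
print(misspelled(dictated))
```

Catching that error takes some model of what the sentence means, which is
the harder problem described above.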

As meaning becomes more abstract, there is a corresponding increase in
ambiguity, which means the filter tree of inference rules can become
confused. So poetry that is translated from English to Russian and then back
is pretty garbled. I think it would be worse from English to Russian to
French. The metaphors become quite complex, and the common-sense rules of
analogy require a lot of relational sophistication working on a very large
set of facts. I'm speaking about why some translations of The Iliad are
recognized as "gifted" while other translators are too literal or don't
quite grasp the meaning of the author. For instance, when Homer writes about
some wisdom that old age brings, that idea will be better translated by the
translator who has experienced that realization. So how does one write rules
that represent a mature point of view in varying contexts?!

Phrases like "which way does the wind blow" have layers of meaning depending
on context. There are some phrases in every natural language which do not
have a literal translation into another language. So the meaning has to be
captured before it can be rendered appropriately. AltaVista translation, for
example, works just well enough that you can usually fill in the %$#$ words.

Anyway, I wrote because you seemed to have a mildly dabbling attitude about
a project that I think would take a lot of time to create anything useful.
The Cyc webpage has more on theory.

Best regards,
Stephen