[Tutor] Evaluating Swahili Part of Speech Tagging. How can I write a Python script for that?

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Tue Mar 24 17:57:40 CET 2009


2009/3/24 Emad Nawfal (عماد نوفل) <emadnawfal at gmail.com>

>
>
> 2009/3/24 Eduardo Vieira <eduardo.susan at gmail.com>
>
> 2009/3/24 Emad Nawfal (عماد نوفل) <emadnawfal at gmail.com>:
>> > Evaluating Swahili Part of Speech Tagging. How can I write a Python
>> > script for that?
>> > # The information provided herein about Swahili may not be accurate
>> > # it is just intended to illustrate the problem
>> >
>> Hello, Mr. Emad! Have you checked NLTK (Natural Language Toolkit -
>> http://www.nltk.org ), a Python package for linguistics applications?
>> Maybe they have something already implemented. I really liked their
>> tutorials about Python and about using Python for linguistics. Very
>> good explanations.
>>
>
>
> I have checked the NLTK, and it does not seem to have anything like this.
> Thanks for the suggestion, though.
>
> --
> لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
> الغزالي
> "No victim has ever been more repressed and alienated than the truth"
>
> Emad Soliman Nawfal
> Indiana University, Bloomington
> --------------------------------------------------------
>


Thanks James,
I'm using the TnT POS tagger and treating it as a black box; otherwise I
would have to write my own, which is a huge task.
The segmenter I use is home-grown, and it is supposedly the best available.
I used to evaluate on whole words, and this was easy: after segmentation
and tagging, I recombined the segments of each word, which eliminated the
discrepancy in alignment. For example, I would have output like this:

the+man           Det+Noun    the+man          Det+Noun
who+came+to+us    <tag>       whocame+to+us    <wrongTag>
This is straightforward if you use a WORD_END_DELIMITER, but it is very
tedious, and the segment accuracy then has to be recalculated separately.
I'm looking for something smarter than this.
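
One possible direction, as a rough sketch only: rather than relying on a
delimiter, each segment can be mapped to its character offsets inside the
unsegmented word, so that gold and predicted segments line up even when the
segmentations differ. The sketch assumes each word is stored as a pair
(segmented_word, tag_string), both joined with "+", and that the gold and
predicted lists contain the same words in the same order. The function names
spans and evaluate and the tags in the sample data are made up for
illustration; this is not the TnT file format.

```python
def spans(segmented, tags, sep="+"):
    """Return the set of (start, end, tag) triples for one word.

    Offsets are counted in the unsegmented word, so two segmentations of
    the same word can be compared directly.
    Assumption: there is one tag per segment; zip() silently drops extras.
    """
    result = set()
    start = 0
    for segment, tag in zip(segmented.split(sep), tags.split(sep)):
        end = start + len(segment)
        result.add((start, end, tag))
        start = end
    return result


def evaluate(gold, pred, sep="+"):
    """Compute word accuracy and segment-level precision, recall and F1.

    gold, pred: lists of (segmented_word, tag_string), one entry per word.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of words")

    words_correct = 0
    n_gold = n_pred = n_match = 0
    for (g_seg, g_tag), (p_seg, p_tag) in zip(gold, pred):
        # A word counts as correct only if segmentation and tags both match.
        if g_seg == p_seg and g_tag == p_tag:
            words_correct += 1
        g_spans = spans(g_seg, g_tag, sep)
        p_spans = spans(p_seg, p_tag, sep)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
        # A segment counts as correct if its boundaries and its tag match.
        n_match += len(g_spans & p_spans)

    precision = n_match / n_pred if n_pred else 0.0
    recall = n_match / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "word_accuracy": words_correct / len(gold) if gold else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data modelled on the example above; the tags are invented.
gold = [("the+man", "Det+Noun"), ("who+came+to+us", "Rel+Verb+Prep+Pron")]
pred = [("the+man", "Det+Noun"), ("whocame+to+us", "Verb+Prep+Pron")]
print(evaluate(gold, pred))
```

With this sample data the second word is counted as wrong at the word level,
but its "to" and "us" segments still get credit at the segment level, so both
scores come out of a single pass and nothing has to be recalculated.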
