[Tutor] Newbie
Nicole Seitz
Nicole.Seitz@urz.uni-hd.de
Fri, 22 Feb 2002 22:02:31 +0100
>
>
>Hi Nicole,
>
>
>At the moment, the AddWord() function appends a new lexicon entry into the
>file, so all new entries are "tacked on" in the back. One approach to
>keep the lexicon sorted is to load the lexicon into memory, sort the
>lexicon and then write the whole lexicon back to the file.
>
>Python lists have a 'sort()' method that you can use to make this work.
>Here's an interpreter session that may help:
>
>
>###
> >>> names = ['fish', 'sleep', 'their', 'the', 'an', 'big', 'good', 'paris']
> >>> names.sort()
> >>> names
>['an', 'big', 'fish', 'good', 'paris', 'sleep', 'the', 'their']
>###
>
>If we want to have the function sort in a different way, we can pass the
>sort() method an optional "comparison function" that tells it how two
>elements compare to each other.
>
>For example, let's say that we'd like to sort these strings by length. We
>can write this comparison function:
>
>###
> >>> def cmp_by_length(word1, word2):
>...     return cmp(len(word1), len(word2))
>...
> >>> names.sort(cmp_by_length)
> >>> names
>['an', 'big', 'the', 'fish', 'good', 'paris', 'sleep', 'their']
>###
This was quite helpful, so thanks a lot.
I wrote these functions, which worked well:
def ignoreCase(word1,word2):
    import string
    word1,word2 = string.lower(word1),string.lower(word2)
    return cmp(word1,word2)
def sort_my_list(mylist):
    mylist.sort(ignoreCase)
    return mylist
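
For comparison, here is a minimal sketch of the same case-insensitive sort in current Python 3, where cmp() and the comparison-function argument to sort() no longer exist and a key function is used instead. The sample entries are made up for illustration.

```python
def sort_my_list(mylist):
    # Sort in place, comparing a lowercased copy of each entry.
    mylist.sort(key=str.lower)
    return mylist


words = ["ocean n", "Heidelberg n", "fish n"]
print(sort_my_list(words))
```

With a plain sort(), "Heidelberg n" would come first, because uppercase letters sort before lowercase ones.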
Now I have great difficulties writing the sorted lexicon to file.
This is one of the horrible outputs I got:
----------------------------------------------------------------------
fish nHeidelberg nocean nstudents ntourists n
fish visleep vi
travel vtvisit vt
a detan detthe det
big adjclever adj
in prepto prep
-----------------------------------------------------------------------------------------------------------
The input sentence was: Stupid students study in Heidelberg.
New words: "stupid", "study" and "Heidelberg".
Only "Heidelberg" was added.
Why did the first two new entries get lost?
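
One thing that may explain the run-together lines in the output above: writelines() writes each string exactly as given and does not add line ends. A small sketch in current Python 3, using an in-memory io.StringIO in place of the real file (an assumption made only to keep the example self-contained):

```python
import io

entries = ["fish n", "Heidelberg n", "ocean n"]

# writelines() adds no separators, so the entries run together.
joined = io.StringIO()
joined.writelines(entries)
print(repr(joined.getvalue()))

# Appending "\n" to each entry puts one entry on each line.
separated = io.StringIO()
separated.writelines(entry + "\n" for entry in entries)
print(repr(separated.getvalue()))
```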
For testing, I chose to create a new file to store the sorted lexicon, so my
old lexicon is still there in case something goes wrong.
And indeed, many things went wrong. I hope my code is at least a bit
readable for you (sorry, there are hardly any comments). I got really confused
by my own code! There must be a lot of mistakes. The important files are
sort_lexicon.py and maybe lexphase2.py. The remaining file is only there to
help you understand the whole program.
I hope someone has the time to read all this stuff! Thank you!
Ok, here we go:
veryNewLex.txt
----------------------------------------------------------------------------------------------------
fish n
ocean n
students n
tourists n

fish vi
sleep vi

travel vt
visit vt

a det
an det
the det

big adj
clever adj

in prep
to prep
-----------------------------------------------------------------------------------------------------------------------
sort_lexicon.py
-----------------------------------------------------------------------
def ignoreCase(word1,word2):
    import string
    word1,word2 = string.lower(word1),string.lower(word2)
    return cmp(word1,word2)
def sort_my_list(mylist):
    mylist.sort(ignoreCase)
    return mylist
def keepLexSorted(new_entry, partOfSpeech):
    nested_list =[]
    file = open("veryNewLex.txt","r")
    text = file.read()
    import string
    paragraphs = string.split(text, '\n\n')
    for paragraph in paragraphs:
        new_lists = string.split(paragraph, '\n')
        if (partOfSpeech == new_lists[0][-1]or partOfSpeech ==
            new_lists[0][-3:] or partOfSpeech == new_lists[0][-4:]) : #this looks strange
            new_lists.append(new_entry)
            #is there a better way?
        sorted_list = sort_my_list(new_lists)
        sorted_list.append('\n\n')
        nested_list.append(sorted_list)
    new_file = open("new_test_lex.txt","w")
    for list in nested_list:
        new_file.writelines(list)
    new_file.close
-----------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
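
One possible reading of the lost entries: keepLexSorted() always reads veryNewLex.txt and rewrites new_test_lex.txt from scratch, so each call would discard whatever the previous call added, and only the last new word would remain. The sketch below, in current Python 3, works under that assumption. The function name add_entry, the shortened sample lexicon and the categories given to the new words are all hypothetical. It works on a string rather than on files so that the result of one call can be fed into the next, and it compares the last field of an entry instead of slicing, which may be one answer to "is there a better way?".

```python
def add_entry(lexicon_text, new_entry):
    """Return lexicon_text with new_entry added to the paragraph that
    shares its part of speech; every paragraph is sorted ignoring case."""
    part_of_speech = new_entry.split()[-1]
    paragraphs = []
    for paragraph in lexicon_text.strip().split("\n\n"):
        entries = paragraph.split("\n")
        # Compare the last field of the first entry instead of slicing.
        if entries[0].split()[-1] == part_of_speech:
            entries.append(new_entry)
        entries.sort(key=str.lower)
        # Join with "\n" so that each entry ends up on its own line.
        paragraphs.append("\n".join(entries))
    return "\n\n".join(paragraphs) + "\n"


lexicon = "fish n\nocean n\n\nfish vi\nsleep vi\n\nbig adj\nclever adj\n"
# Feed each result back in, so earlier additions are not lost.
for entry in ["stupid adj", "study vi", "Heidelberg n"]:
    lexicon = add_entry(lexicon, entry)
print(lexicon)
```

Writing the returned string to the file in a single write() call, followed by close() with parentheses, would then replace the writelines() loop.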
lexphase2.py
--------------------------------------------------------------------------------
def punctuation (list):
    lastWord = list[-1]
    if (lastWord[-1] == '!' or lastWord[-1]== '.') :
        lastWord = lastWord[:-1]
    list[-1] = lastWord
    return list
def lookupWordInLex(word):
    file = open('veryNewLex.txt','r')
    import string
    while 1:
        line = file.readline()
        if not line: break
        tmp = string.split(line)
        if tmp:
            if (word == tmp[0]):
                cat = tmp[1]
                return cat
    file.close()
def WordNotFound(unknown_word) :
    import string
    print "Couldn't find %s in lexicon!" %string.upper(unknown_word)
    print "If you want to continue, please add it to lexicon: "
    addWord()
    return partOfSpeech
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
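
lookupWordInLex() reopens the file for every word and returns the first match, so a word listed twice, such as "fish", always comes back as "n". As a hedged alternative, the lexicon could be read once into a dictionary that maps each word to all of its categories. This is a minimal sketch in current Python 3; load_lexicon and the sample text are hypothetical and not part of the program above.

```python
def load_lexicon(text):
    """Map each word to the list of categories found for it."""
    lexicon = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            word, category = fields
            lexicon.setdefault(word, []).append(category)
    return lexicon


sample = "fish n\nocean n\n\nfish vi\nsleep vi\n"
lexicon = load_lexicon(sample)
print(lexicon.get("fish"))   # both categories recorded for "fish"
print(lexicon.get("study"))  # None, because "study" is not in the sample
```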
testfile.py
----------------------------------------------------------------------------------------------------
import string
import lexphase2
# Get sentence from user
sentence = raw_input("Please enter a sentence: ")
print
#split input string into tokens
wordlist = string.split(sentence)
#empty list to store categories
wordCat = []
# category for each word :
for x in wordlist:
    Cat = lexphase2.lookupWordInLex(x)
    if Cat: wordCat.append(Cat)
    else:
        NewCat = lexphase2.WordNotFound(x)
        wordCat.append(NewCat)
print lexphase2.punctuation(wordlist)
print wordCat
# display input table
print "-----------------------------------------------------------------------------------"
count = 0
print "INPUT ","|","\t",
while(count < len(wordlist)):
    print wordlist[count],"|","\t",
    count = count + 1
print
print "-----------------------------------------------------------------------------------"
count = 0
print "LEXICON","|","\t",
while(count < len(wordCat)):
    print wordCat[count],"|","\t","\t",
    count = count + 1
print
print "-----------------------------------------------------------------------------------"
count = 0
print "POSITION","|","\t",
while(count < len(wordlist)):
    print count+1 ,"|","\t","\t",
    count = count + 1
print
>I'm sorry I'm rushing things; I'm getting hungry and must get big, fishy
>french food. mmm... fooo...good...
>
>
>Please feel free to ask more questions. Good luck to you!