[Tutor] Newbie

Fri, 22 Feb 2002 22:02:31 +0100

>
>
>Hi Nicole,
>
>
>At the moment, the AddWord() function appends a new lexicon entry into the
>file, so all new entries are "tacked on" in the back.  One approach to
>keep the lexicon sorted is to load the lexicon into memory, sort the
>lexicon and then write the whole lexicon back to the file.
>
>Python lists have a 'sort()' method that you can use to make this work.
>Here's an interpreter session that may help:
>
>
>###
> >>> names = ['fish', 'sleep', 'their', 'the', 'an', 'big', 'good',
>...          'paris']
> >>> names.sort()
> >>> names
>['an', 'big', 'fish', 'good', 'paris', 'sleep', 'the', 'their']
>###
>
>If we want to sort in a different way, we can pass the sort() method an
>optional "comparison function" that tells it how two elements compare to
>each other.
>
>For example, let's say that we'd like to sort these strings by length.  We
>can write this comparison function:
>
>###
> >>> def cmp_by_length(word1, word2):
>...     return cmp(len(word1), len(word2))
>...
> >>> names.sort(cmp_by_length)
> >>> names
>['an', 'big', 'the', 'fish', 'good', 'paris', 'sleep', 'their']
>###

This was quite helpful, so thanks a lot.
I wrote these functions, which worked well:

def ignoreCase(word1,word2):
     import string
     word1,word2 = string.lower(word1),string.lower(word2)
     return cmp(word1,word2)

def sort_my_list(mylist):
     mylist.sort(ignoreCase)
     return mylist
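
A note for anyone trying this on a current Python (3.x): cmp(), the
comparison-function argument to sort(), and string.lower() are no longer
available there. Below is a minimal sketch of the same case-insensitive
sort, assuming Python 3 and using the key argument of sort() instead. The
sample list is made up for illustration.

```python
def sort_my_list(mylist):
    # Sort in place, comparing entries by their lowercased form,
    # then return the same list for convenience.
    mylist.sort(key=str.lower)
    return mylist

entries = ['ocean n', 'Heidelberg n', 'fish n', 'students n']
print(sort_my_list(entries))
```

With a key function each entry is lowercased once, rather than once per
comparison, so this also tends to be cheaper on long lists.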

Now I have great difficulties writing the sorted lexicon to file.

This is one of the horrible outputs I got:
----------------------------------------------------------------------
fish nHeidelberg nocean nstudents ntourists n

  fish visleep vi

  travel vtvisit vt

a detan detthe det

  big adjclever adj

  in prepto prep
-----------------------------------------------------------------------------------------------------------
The input sentence was: Stupid students study in Heidelberg.

New words : "stupid", "study" and "Heidelberg"

Only  "Heidelberg" was added.
Why did the first two new entries get lost?
For testing I chose to create a new file to store the sorted lexicon, so my
old lexicon is still there in case something goes wrong.
And indeed, many things went wrong. I hope my code is at least a bit
readable for you (sorry, there are hardly any comments). I got really
confused by my own code! There must be a lot of mistakes. The important
files are sort_lexicon.py and maybe lexphase2.py. The remaining file is
only there to help understand the whole program.
I hope someone has the time to read all this stuff! Thank you!
Ok, here we go:

veryNewLex.txt

----------------------------------------------------------------------------------------------------
fish n
ocean n
students n
tourists n

fish vi
sleep vi

travel vt
visit vt

a det
an det
the det

big adj
clever adj

in prep
to prep
-----------------------------------------------------------------------------------------------------------------------
sort_lexicon.py
-----------------------------------------------------------------------
def ignoreCase(word1,word2):
     import string
     word1,word2 = string.lower(word1),string.lower(word2)
     return cmp(word1,word2)

def sort_my_list(mylist):
     mylist.sort(ignoreCase)
     return mylist

def keepLexSorted(new_entry, partOfSpeech):
   nested_list =[]
   file = open("veryNewLex.txt","r")
   text = file.read()
   import string
   paragraphs = string.split(text, '\n\n')

   for paragraph in paragraphs:

        new_lists = string.split(paragraph, '\n')
        if (partOfSpeech == new_lists[0][-1]
            or partOfSpeech == new_lists[0][-3:]
            or partOfSpeech == new_lists[0][-4:]):  # this looks strange
             new_lists.append(new_entry)  # is there a better way?

        sorted_list = sort_my_list(new_lists)
        sorted_list.append('\n\n')
        nested_list.append(sorted_list)

   new_file = open("new_test_lex.txt","w")
   for list in nested_list:
             new_file.writelines(list)

   new_file.close()

-----------------------------
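
A possible reading of what goes wrong in keepLexSorted(), offered as a guess
rather than a diagnosis: splitting on '\n' strips the newlines and
writelines() does not put them back, so the entries run together. Every call
reads veryNewLex.txt but rewrites new_test_lex.txt from scratch, so only the
entry from the last call survives. The slices [-1], [-3:] and [-4:] can also
never equal a two-letter tag such as "vi". Below is a minimal sketch in
Python 3, assuming one "word tag" entry per line and a blank line between
groups. The name add_entry_sorted and its path parameters are hypothetical.

```python
def add_entry_sorted(new_entry, part_of_speech, in_path, out_path):
    """Add new_entry to the group whose entries carry part_of_speech."""
    with open(in_path) as f:
        text = f.read()

    paragraphs = []
    for paragraph in text.strip().split('\n\n'):
        # Drop empty strings left over from stray blank lines.
        entries = [line.strip() for line in paragraph.split('\n') if line.strip()]
        if not entries:
            continue
        # Compare against the last whitespace-separated field
        # instead of slicing a fixed number of characters.
        if entries[0].split()[-1] == part_of_speech and new_entry not in entries:
            entries.append(new_entry)
        entries.sort(key=str.lower)
        # Put the newlines back explicitly when rebuilding the group.
        paragraphs.append('\n'.join(entries))

    with open(out_path, 'w') as f:
        f.write('\n\n'.join(paragraphs) + '\n')
```

Passing the same file name for both paths (once testing is over) means each
call sees the entries added by earlier calls, so several new words in one
sentence are all kept.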

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lexphase2.py
--------------------------------------------------------------------------------
def punctuation (list):
     lastWord = list[-1]
     if (lastWord[-1] == '!' or lastWord[-1]== '.') :
        lastWord = lastWord[:-1]
        list[-1] = lastWord
     return list

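def lookupWordInLex(word):
    import string
    file = open('veryNewLex.txt','r')
    cat = None
    while 1:
        line = file.readline()
        if not line: break
        tmp = string.split(line)
        # first field is the word, second field is its category
        if tmp and word == tmp[0]:
            cat = tmp[1]
            break
    file.close()    # reached whether or not the word was found
    return cat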

def WordNotFound(unknown_word) :
    import string
    print "Couldn't find %s in lexicon!" %string.upper(unknown_word)
    print "If you want to continue, please add it to lexicon: "
    addWord()
    return partOfSpeech
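
Since lookupWordInLex() reopens and rescans the file for every word, one
alternative is to read the lexicon once into a dictionary. Below is a
minimal sketch in Python 3, assuming the "word category" line format shown
in veryNewLex.txt. The names load_lexicon and lookup_word are hypothetical.

```python
def load_lexicon(path):
    """Read 'word category' lines into a dict, keeping the first category seen."""
    lexicon = {}
    with open(path) as f:           # the file is closed when the block ends
        for line in f:
            fields = line.split()
            if len(fields) >= 2:    # skip blank or malformed lines
                lexicon.setdefault(fields[0], fields[1])
    return lexicon

def lookup_word(lexicon, word):
    # Returns None for unknown words, like the original function.
    return lexicon.get(word)
```

Like the original, this returns only the first category for a word listed
twice (for example "fish" as both n and vi).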

---------------------------------------------------------------------------------------------------------------------------------------------------------------------

testfile.py
----------------------------------------------------------------------------------------------------
import string
import lexphase2

# Get sentence from user

sentence = raw_input("Please enter a sentence: ")
print

#split input string into tokens

wordlist = string.split(sentence)

#empty list to store categories
wordCat = []

# category for each  word :

for x in wordlist:
     Cat = lexphase2.lookupWordInLex(x)
     if Cat: wordCat.append(Cat)
     else:
           NewCat = lexphase2.WordNotFound(x)
           wordCat.append(NewCat)

print lexphase2.punctuation(wordlist)
print wordCat
# display input table

print "-----------------------------------------------------------------------------------"
count = 0
print "INPUT  ","|","\t",
while(count < len(wordlist)):
    print wordlist[count],"|","\t",
    count = count + 1
print
print "-----------------------------------------------------------------------------------"
count = 0
print "LEXICON","|","\t",
while(count < len(wordCat)):
    print wordCat[count],"|","\t","\t",
    count = count + 1
print
print "-----------------------------------------------------------------------------------"
count = 0
print "POSITION","|","\t",
while(count < len(wordlist)):
     print count+1 ,"|","\t","\t",
     count = count + 1

print
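
The three while loops in testfile.py all build a row the same way, so they
could share one helper. Below is a minimal sketch in Python 3. The name
format_row is hypothetical, the sample sentence is made up, and its
categories are taken from the lexicon listing above.

```python
def format_row(label, cells):
    # One table row: the label, then every cell, separated by "|" and a tab.
    return " |\t".join([label] + [str(cell) for cell in cells])

wordlist = ['tourists', 'visit', 'the', 'ocean']   # sample input, made up
wordCat = ['n', 'vt', 'det', 'n']                  # categories from the lexicon
rule = "-" * 60

for label, cells in [("INPUT", wordlist),
                     ("LEXICON", wordCat),
                     ("POSITION", range(1, len(wordlist) + 1))]:
    print(rule)
    print(format_row(label, cells))
```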

>I'm sorry I'm rushing things; I'm getting hungry and must get big, fishy
>french food.  mmm... fooo...good...
>
>
>Please feel free to ask more questions.  Good luck to you!