[Tutor] Fwd: Help/suggestion requested

Tue Apr 12 03:19:18 EDT 2022

Forwarding to list

Please always use "Reply List" or "Reply All" when responding to the
tutor list; otherwise you reply only to the individual.

-------- Forwarded Message --------

Hi Alan,

Thanks a lot for taking the time to read my email. I really
appreciate your suggestions and help. Please excuse the messy
email earlier. After following your advice and rewriting my
code, I have managed to get the desired result. Initially, in the bottom
part of the code, where I was trying to filter out the less desired words
by removing them from the original list, I was getting a 'list out
of bound' error.

Please see the amended code below and if there is room for improvement,
please do let me know.

***************************************************************************

#Initialise text as string
text = '''"I told you already," the curator stammered, kneeling
defenseless on the floor of the gallery. "I
have no idea what you are talking about!"
"You are lying." The man stared at him, perfectly immobile except for
the glint in his ghostly eyes.
"You and your brethren possess something that is not yours."
The curator felt a surge of adrenaline. How could he possibly know this?
"Tonight the rightful guardians will be restored. Tell me where it is
hidden, and you will live." The
man leveled his gun at the curator's head. "Is it a secret you will die
for?"
Saunière could not breathe.'''

#create a dictionary to store the words and their frequencies as
#key:value pairs
word_dictionary = {}

#split and store the sample text in a list
text_list = text.split()
print(text_list)

#new list to store words without the common words
new_list = []

#define unwanted characters as string
special_characters = '''.,/?@:;{[]_ }'"-+=!£$%^&*()~<>¬`'''

#define less desired or common words as a list
less_desired_words = ['the', 'a', 'they', 'are', 'i', 'me', 'you', 'we',
'there', 'their', 'can', 'our', 'is', 'not', 
'be','for', 'in', 'on', 'no', 'have', 'he', 'she', 'and', 'your', 'him',
'her', 'at', 'of', 'that','his', 'what', 'it','where', 'will']

#iterate through text_list, remove the punctuation, convert to
#lower case and add to new_list
for word in text_list:
    for character in special_characters:
        word = word.lower()
        word = word.replace(character, "")
    new_list.append(word)

print(new_list)    

#iterate through new list and remove the common words
for word in new_list:
    for common_word in less_desired_words:
        if common_word in new_list:
            new_list.remove(common_word)       

print(new_list)

#count the words in the list and add to dictionary with their frequency
#as key:value pair
for word in new_list:
    if word in word_dictionary:
        frequency = word_dictionary[word]
        word_dictionary[word] = frequency + 1

    else:
        word_dictionary[word] = 1

print(word_dictionary)

**************************************************************************

Yours faithfully,

Nadeem Nan

On Monday, 11 April 2022, 09:31:53 BST, Alan Gauld via Tutor
<tutor at python.org> wrote:

On 11/04/2022 02:11, nadeem nan via Tutor wrote:
> Hello experts!
> My name is Nadeem and I am a novice learning Python through an online
> course. As part of the end of first module, we have been assigned a
> small project to take a form of text or paragraph of any size with
> punctuations included. The goal of the project is
> 1. to write a script which will take the given text and add each word
> from it to a dictionary along with its frequency (no. of times repeated).
> 2. Punctuations are to be removed before adding the words to the
> dictionary.
> 3. remove common words like 'the, they, are, not, be, me, it, is, in'
> etc. from the dictionary.

On the last point I'd suggest it would be easier to never add them to
the dictionary in the first place. Just check before inserting...
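
A minimal sketch of that idea, assuming a list of already cleaned,
lower-case words. The names words, stop_words and counts are made up
for illustration and are not from the original script:

```python
# Hypothetical sample data standing in for the cleaned word list.
words = ["the", "curator", "felt", "a", "surge", "of", "adrenaline",
         "the", "curator"]
stop_words = {"the", "a", "of"}  # a set keeps the 'in' test cheap

counts = {}
for word in words:
    # Skip common words (and empty strings) before they reach the dict.
    if not word or word in stop_words:
        continue
    # get() returns 0 for a word that has not been seen yet.
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'curator': 2, 'felt': 1, 'surge': 1, 'adrenaline': 1}
```

With this shape there is no second pass and nothing is ever removed
from a list or dictionary while it is being looped over.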

Unfortunately you need to post code in plain text format,
otherwise the layout gets mangled and unreadable, as you see below...

> #Initialise text as stringtext = '''"I told you already," the curator
stammered, kneeling defenseless on the floor of the gallery. "Ihave no
idea what you are talking about!""You are lying." The man stared at him,
perfectly immobile except for the glint in his ghostly eyes."You and
your brethren possess something that is not yours."The curator felt a
surge of adrenaline. How could he possibly know this?"Tonight the
rightful guardians will be restored. Tell me where it is hidden, and you
will live." Theman leveled his gun at the curator's head. "Is it a
secret you will die for?"Saunière could not breathe.'''
> #create a dictionary to store the words and their frequencies as key :
value pair.word_dictionary = {}#create a dictinary to store the words
without common words.final_dictionary = {}#split and store the sample
text in a listtext_list = text.split()print(text_list)#define unwanted
characters as stringunwanted_characters = '''.,/?@:;{}[]_
'"-+=!£$%^&*()~<>¬`'''
> #define less desired or common words as a listless_desired_words =
['the', 'a', 'they', 'are', 'i', 'me', 'you', 'we', 'there', 'their',
'can', 'our', 'is', 'not', 'for', 'in', 'on', 'no', 'have', 'he', 'she',
'and', 'your', 'him', 'her']
> #iterate through text_list and remove the punctuations and convert to
lower case words        for word in text_list:    for character in
unwanted_characters:        word = word.replace(character, "")       
word = word.lower()
> #count the words in the list and add to dictionary with their frequecy
as key:value pair                if word in word_dictionary:       
frequency = word_dictionary[word]        word_dictionary[word] =
frequency + 1            else:        word_dictionary[word] = 1

There are some bits in that which look like they could
be optimised but...

This looks like the relevant bit. I'll guess at formatting:

> print(word_dictionary)
> #remove the less desired or common words and add the remaining words
> to the final dictionary.

for word, frequent in word_dictionary.items():
    for notword in less_desired_words:
        if word != notword:
            final_dictionary[word] = frequent

Rather than looping over less_desired_words
you could use an 'in' test:

if word not in less_desired_words:
    final_dictionary....
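
Filled out, the 'in' version might look like the sketch below. The
sample dictionary and word list are invented here just to make it
runnable; the real script builds them from the text:

```python
# Hypothetical sample data, not taken from the original text.
word_dictionary = {"the": 3, "curator": 2, "is": 1, "gallery": 1}
less_desired_words = ["the", "a", "is", "not"]

final_dictionary = {}
for word, frequent in word_dictionary.items():
    # One membership test replaces the whole inner loop.
    if word not in less_desired_words:
        final_dictionary[word] = frequent

print(final_dictionary)  # {'curator': 2, 'gallery': 1}
```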

Secondly, you are adding to the final dict every time the word is not
notword. That's wasteful, so you probably want a 'break' statement in
there. Alternatively, use the else clause of the second for loop
to add the word.
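
A sketch of the else-clause option, again with invented sample data.
One thing to watch: a bare != test adds the word as soon as it differs
from any single entry in the list, so here the test is turned around
to == with a 'break', and the word is only added when the inner loop
runs to the end without breaking:

```python
# Hypothetical sample data, not taken from the original text.
word_dictionary = {"the": 3, "curator": 2, "is": 1, "gallery": 1}
less_desired_words = ["the", "a", "is", "not"]

final_dictionary = {}
for word, frequent in word_dictionary.items():
    for notword in less_desired_words:
        if word == notword:
            break  # common word found: stop looking, do not add it
    else:
        # Runs only when the inner loop finished without a 'break'.
        final_dictionary[word] = frequent

print(final_dictionary)  # {'curator': 2, 'gallery': 1}
```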

Another approach would just be to del() the word from the original
dictionary if it is in less_desired...
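
Roughly like this, assuming the same kind of invented sample data.
Note that the loop runs over less_desired_words rather than over the
dictionary itself, because deleting keys from a dictionary while
iterating over it raises a RuntimeError:

```python
# Hypothetical sample data, not taken from the original text.
word_dictionary = {"the": 3, "curator": 2, "is": 1, "gallery": 1}
less_desired_words = ["the", "a", "is", "not"]

for notword in less_desired_words:
    # Only delete words that were actually counted.
    if notword in word_dictionary:
        del word_dictionary[notword]

print(word_dictionary)  # {'curator': 2, 'gallery': 1}
```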

However, it does look like it should work.
What are you getting?

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
