[Tutor] Help/suggestion requested

Alan Gauld alan.gauld at yahoo.co.uk
Mon Apr 11 04:30:20 EDT 2022


On 11/04/2022 02:11, nadeem nan via Tutor wrote:
> Hello experts !
> My name is Nadeem and I am a novice learning Python through an online course. At the end of the first module, we have been assigned a small project: take a piece of text or a paragraph of any size, with punctuation included. The goal of the project is:
> 1. to write a script that takes the given text and adds each word from it to a dictionary along with its frequency (the number of times it is repeated).
> 2. to remove punctuation before adding the words to the dictionary.
> 3. to remove common words like 'the, they, are, not, be, me, it, is, in' etc. from the dictionary.

On the last point I'd suggest it would be easier to never add them to
the dictionary in the first place. Just check before inserting...
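A minimal sketch of checking before inserting, assuming the text has already been cleaned and split into lowercase words. The names `words`, `common_words` and `word_counts` are hypothetical stand-ins, not from the original code:

```python
# Hypothetical inputs: already-cleaned, lowercase words and a set of common words.
words = ["the", "curator", "felt", "the", "surge", "of", "it", "curator"]
common_words = {"the", "they", "are", "not", "be", "me", "it", "is", "in"}

word_counts = {}
for word in words:
    # Skip common words here, so they never reach the dictionary.
    if word in common_words:
        continue
    # dict.get returns 0 for a word we haven't seen yet.
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)  # {'curator': 2, 'felt': 1, 'surge': 1, 'of': 1}
```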

Unfortunately, you need to post code in plain text format; otherwise
the layout gets mangled and unreadable, as you can see below...

> #Initialise text as stringtext = '''"I told you already," the curator stammered, kneeling defenseless on the floor of the gallery. "Ihave no idea what you are talking about!""You are lying." The man stared at him, perfectly immobile except for the glint in his ghostly eyes."You and your brethren possess something that is not yours."The curator felt a surge of adrenaline. How could he possibly know this?"Tonight the rightful guardians will be restored. Tell me where it is hidden, and you will live." Theman leveled his gun at the curator's head. "Is it a secret you will die for?"Saunière could not breathe.'''
> #create a dictionary to store the words and their frequencies as key : value pair.word_dictionary = {}#create a dictinary to store the words without common words.final_dictionary = {}#split and store the sample text in a listtext_list = text.split()print(text_list)#define unwanted characters as stringunwanted_characters = '''.,/?@:;{}[]_ '"-+=!£$%^&*()~<>¬`'''
> #define less desired or common words as a listless_desired_words = ['the', 'a', 'they', 'are', 'i', 'me', 'you', 'we', 'there', 'their', 'can', 'our', 'is', 'not', 'for', 'in', 'on', 'no', 'have', 'he', 'she', 'and', 'your', 'him', 'her']
> #iterate through text_list and remove the punctuations and convert to lower case words        for word in text_list:    for character in unwanted_characters:        word = word.replace(character, "")        word = word.lower()
> #count the words in the list and add to dictionary with their frequecy as key:value pair                if word in word_dictionary:        frequency = word_dictionary[word]        word_dictionary[word] = frequency + 1            else:        word_dictionary[word] = 1

There are some bits of that which look like they could
be optimised, but...
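As one possible reading of "optimised", here is a sketch that strips punctuation with str.translate in a single pass (instead of one replace() call per character), lowercases each word once, and counts with collections.Counter. The function name count_words and the sample string are my own illustrations. Note that string.punctuation only covers ASCII punctuation, so characters like '£' and '¬' from the original list are added by hand:

```python
import string
from collections import Counter

def count_words(text):
    """Return a Counter mapping each lowercase word to its frequency."""
    # Translation table that deletes ASCII punctuation plus a couple of extras.
    strip_punct = str.maketrans("", "", string.punctuation + "£¬")
    words = []
    for raw_word in text.split():
        word = raw_word.translate(strip_punct).lower()
        if word:  # skip tokens that were nothing but punctuation
            words.append(word)
    return Counter(words)

sample = '"You are lying." The man stared at him. "You and your brethren!"'
counts = count_words(sample)
print(counts["you"])    # 2
print(counts["lying"])  # 1
```

A Counter behaves like a normal dictionary, so the later filtering step works on it unchanged.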

This looks like the relevant bit. I'll guess at formatting:

> print(word_dictionary)
> #remove the less desired or common words and add the remaining words to the final dictionary.

    for word, frequent in word_dictionary.items():
        for notword in less_desired_words:
            if word != notword:
                final_dictionary[word] = frequent

Rather than looping over less_desired_words
you could use an 'in' test:

if word not in less_desired_words:
     final_dictionary....
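Filled out, the 'in' test version of the filtering loop might look like the following. Making less_desired_words a set is my own addition: a set membership test is constant time on average, whereas a list is scanned item by item. The sample dictionary is made up for illustration:

```python
# Made-up counts standing in for word_dictionary from the original post.
word_dictionary = {"the": 3, "curator": 2, "you": 4, "secret": 1}
less_desired_words = {"the", "a", "they", "are", "you", "is", "not"}

final_dictionary = {}
for word, frequency in word_dictionary.items():
    # One membership test replaces the whole inner loop.
    if word not in less_desired_words:
        final_dictionary[word] = frequency

print(final_dictionary)  # {'curator': 2, 'secret': 1}
```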

Secondly, you are adding to the final dict every time the word differs
from notword. That's wasteful, so you probably want a 'break' statement
in there. Alternatively, use the else clause of the inner for loop
to add the word.
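If you want to keep the explicit inner loop, here is a sketch of the break plus for...else pattern. The else block runs only when the loop finishes without hitting break, i.e. when the word matched none of the common words. The sample data is made up:

```python
word_dictionary = {"the": 3, "curator": 2, "you": 4, "secret": 1}
less_desired_words = ["the", "a", "they", "are", "you", "is", "not"]

final_dictionary = {}
for word, frequency in word_dictionary.items():
    for notword in less_desired_words:
        if word == notword:
            break  # it's a common word: stop checking and skip it
    else:
        # Only reached if the inner loop never hit break.
        final_dictionary[word] = frequency

print(final_dictionary)  # {'curator': 2, 'secret': 1}
```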

Another approach would just be to del() the word from the original
dictionary if it is in less_desired...
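A sketch of the del approach. One catch: deleting keys from a dictionary while looping over that same dictionary raises a RuntimeError ("dictionary changed size during iteration"), so this loops over the list of common words instead and deletes only those that are present. The sample data is made up:

```python
word_dictionary = {"the": 3, "curator": 2, "you": 4, "secret": 1}
less_desired_words = ["the", "a", "they", "are", "you", "is", "not"]

# Loop over the common words, not over the dictionary being modified.
for notword in less_desired_words:
    if notword in word_dictionary:
        del word_dictionary[notword]

print(word_dictionary)  # {'curator': 2, 'secret': 1}
```

word_dictionary.pop(notword, None) does the same in one call without the membership test.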

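As written, though, I don't think it will filter anything: a common
word such as 'the' is still different from 'a', 'they' and the rest
of less_desired_words, so it gets added anyway. Is that what you are getting?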


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



