[Tutor] List and dictionary comprehensions
Alan Gauld
alan.gauld at btinternet.com
Mon Sep 29 13:51:20 CEST 2014
On 28/09/14 03:36, Armindo Rodrigues wrote:
> have noted the beginning and end of the quotes list so you can easily skip
> and go straight to the code section. ***
It would probably have been better to just delete all but a few of the
quotes. We don't need all of them to evaluate your code.
> import re
> from datetime import datetime
> import time
>
>
> ################### DATA LIST STARTS HERE
>
> data_list=["And now here is my secret, a very simple secret: It is only
> with the heart that one can see rightly; what is essential is invisible to
> the eye.",
> "All grown-ups were once children... but only few of them remember it.",
...
> "If you love a flower that lives on a star, then it's good at night, to
> look up at the sky. All the stars are blossoming."]
>
>
> ################## CODE STARTS HERE
>
> #Create a list of words taken from each individual word in the datalist
> word_list = []
> for item in data_list:
> for word in item.split(" "):
> word = re.sub('^[^a-zA-z]*|[^a-zA-Z]*$','', word)
word.strip() would be better here. You can specify a string of chars to
be stripped if it's not only whitespace. Consider regular expressions as
a weapon of last resort.
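As a rough sketch of what I mean (clean_word is just a name I made up,
and I'm assuming that stripping punctuation, digits and whitespace is
close enough to what your regex was trying to do):

```python
import string

# Characters to remove from both ends of a word (an assumption about
# what counts as "not a letter" in your data).
STRIP_CHARS = string.punctuation + string.digits + string.whitespace

def clean_word(word):
    """Strip unwanted characters from the start and end of word only."""
    return word.strip(STRIP_CHARS)

print(clean_word('"secret:'))     # secret
print(clean_word("children..."))  # children
print(clean_word("it's"))         # it's (the inner apostrophe is kept)
```

Note that strip() only touches the ends of the string, so characters in
the middle of a word are left alone.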
> word_list.append(word)
> word_list = sorted(list(set(word_list))) #Remove repeated words
You don't need to convert the set into a list. sorted() works
with sets too.
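For example, using a small made-up word list in place of yours:

```python
# Stand-in data, not taken from the original quotes.
words = ["the", "heart", "the", "eye", "heart"]

# sorted() accepts any iterable, including a set, and returns a list.
unique_sorted = sorted(set(words))
print(unique_sorted)  # ['eye', 'heart', 'the']
```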
> quotesDict = {}
> for word in word_list:
> quotesDict.setdefault(word,[]) #Create a dictionary with keys based on
> each word in the word list
By putting the words in the dictionary you lose the sorting you did
above. So the sorting was a waste of time.
> for key, value in quotesDict.items():
> indexofquote = 0
> for quote in data_list:
You should use enumerate for this. It will automatically give you the
index and quote and be less error prone than maintaining the index yourself.
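Something along these lines, shown with a few made-up quotes and keys
standing in for your real data:

```python
# Stand-in data for illustration only.
data_list = ["the heart sees rightly", "the eye is blind", "a heart of gold"]
quotesDict = {"heart": [], "eye": []}

for key in quotesDict:
    # enumerate() yields (index, item) pairs, so no manual counter is needed.
    for index, quote in enumerate(data_list):
        if key in quote:  # substring test, as in the original code
            quotesDict[key].append(index)

print(quotesDict)  # {'heart': [0, 2], 'eye': [1]}
```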
> if key in quote:
> quotesDict[key].append(indexofquote) #Append the index of the
> found quotes to the dictionary key
> indexofquote+=1
>
> query=input("query: ")
> query = query.strip(" ").split(" ")
> query = list(set(query))
>
I don't think you need the conversion to list here either.
You can just use the set.
> start_time = time.time()
>
> FoundQuotes = []
>
> # Right now the OR search just prints out the index of the found quotes.
> if ("or" in query) and ("and" not in query):
The logic here can be simplified by testing for 'and' first:
    if 'and' in query:
        remove 'or'
        process 'and'
    elif 'or' in query:
        process 'or'
    else:
        process simple query
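In real Python that structure might look something like the sketch
below. The function name classify_query and the "simple" mode label are
my own inventions for illustration, and I've chosen to drop both
keywords from the search terms up front:

```python
def classify_query(words):
    """Return (mode, terms) for a collection of query words.

    mode is 'and', 'or' or 'simple'; terms is the query with the
    keywords 'and' and 'or' removed.
    """
    terms = [w for w in words if w not in ("and", "or")]
    if "and" in words:
        # 'and' takes priority, even if 'or' is also present.
        return "and", terms
    elif "or" in words:
        return "or", terms
    else:
        return "simple", terms

print(classify_query(["heart", "or", "eye"]))   # ('or', ['heart', 'eye'])
print(classify_query(["heart", "and", "eye"]))  # ('and', ['heart', 'eye'])
print(classify_query(["heart"]))                # ('simple', ['heart'])
```

The caller can then dispatch on the mode, which keeps the three cases
clearly separated.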
> query.remove("or")
> print("Performing OR search for: ", query)
> for item in query:
> if (item in quotesDict):
> print("FOUND ",len(quotesDict[item]), " ", item, "QUOTES: ",
> quotesDict.get(item))
> print("\n--- Execution ---\n", (time.time() - start_time) * 1000,
> "microseconds\n")
>
> else:
> if "and" in query:
> query.remove("and")
> if "or" in query:
> query.remove("or")
> print("Performing AND search for: ", query)
This looks wrong. What about the case where neither and/or are in the query?
> for item in query:
> if (item in quotesDict):
> FoundQuotes = FoundQuotes + (quotesDict.get(item))
> FoundQuotes = list(set([x for x in FoundQuotes if FoundQuotes.count(x)
> > 1]))
This doesn't look right either.
FoundQuotes is a list of indexes. The comprehension builds a list of all
the indexes that appear more than once - what about a quote that was
only found once?
It then eliminates all the duplicates (set()) and converts the result
back to a list (why not leave it as a set?).
I'd have expected a simple conversion of FoundQuotes to a set would be
what you wanted.
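To see the difference, here is a small comparison using made-up index
values (the names repeated and unique are mine):

```python
# Made-up indexes: quote 2 was found twice, quotes 0 and 5 once each.
FoundQuotes = [0, 2, 2, 5]

# The original comprehension keeps only indexes that occur more than once.
repeated = list(set([x for x in FoundQuotes if FoundQuotes.count(x) > 1]))

# A plain set conversion keeps every index, with duplicates removed.
unique = set(FoundQuotes)

print(repeated)        # [2]
print(sorted(unique))  # [0, 2, 5]
```

Which of the two you want depends on what the search is supposed to
return, but the first one silently drops quotes 0 and 5.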
> for x in FoundQuotes:
> print(data_list[x])
> print("\n--- Execution ---\n", (time.time() - start_time) * 1000,
> "microseconds\n")
The other problem is that you are searching the dictionary
several times, thus losing some of the speed advantage of
using a dictionary.
You would get more benefit from the dictionary if you adopt a try/except
approach and just access the key directly. So, instead of:
> for item in query:
> if (item in quotesDict):
> FoundQuotes = FoundQuotes + (quotesDict.get(item))
you could write:
    for item in query:
        try: FoundQuotes = FoundQuotes + quotesDict[item]
        except KeyError: pass
Or better still use the default value of get:
    for item in query:
        FoundQuotes = FoundQuotes + quotesDict.get(item,[])
There are a few other things that could be tidied up but that should
give you something to get started with.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos