Review Request of Python Code

Wed Mar 9 07:33:02 EST 2016

On 9 March 2016 at 12:06, Matt Wheeler <m at funkyhat.org> wrote:
> But we can still do better. A list is a poor choice for this kind of
> lookup, as Python has no way to find elements other than by checking
> them one after another. (given (one of the) name(s) you've given it
> sounds a bit like "dictionary" I assume it contains rather a lot of
> items)

Sorry, I've just read your original code properly and see that you're
looking up the next item in the list. This means a set is not
suitable, as it doesn't preserve order. (However, your original code is
open to an IndexError if the last element in your list is matched.)

If you could provide a sample of the NewTotalTag.txt file data that
would be helpful, but working with the information I've got we can
still get a comparable speedup, by constructing a dict upfront mapping
each word to the next one[1]:

dict_word = dict_read.split()
# Assuming that 'N/A' is a reasonable output if the last word in your
# list is matched.
# This works around the IndexError your current code is exposed to.
dict_word.append('N/A')
# The slice ([:-1]) means we don't try to add the last item (the 'N/A'
# we just appended) as a key in the new a4 dict.
a4 = {}
for index, word in enumerate(dict_word[:-1]):
    a4[word] = dict_word[index+1]

This creates a dict where each key maps to the corresponding next
word, which you can use later in your lookup instead of fetching by
index. i.e. a4[word] instead of a4[windex+1].
This means you're saving yet *another* scan through the entire list
(`a4.index(word)` has to scan yet again) for the positive matches.
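
For illustration, here is a minimal self-contained sketch of the same
idea, so the lookup side is visible too. The sample string is made up
(I haven't seen the real NewTotalTag.txt), and the names next_word and
lookup are just placeholders of mine, not anything from your code. It
uses zip() to pair each word with its successor, and setdefault() so
that if a word appears more than once the first occurrence wins, which
is what list.index() would have found:

```python
# Hypothetical sample data standing in for the contents of NewTotalTag.txt.
dict_read = "the DT cat NN sat VBD"

dict_word = dict_read.split()
dict_word.append('N/A')  # sentinel so the last real word still has a "next" entry

# Pair each word with the one that follows it.
next_word = {}
for word, following in zip(dict_word, dict_word[1:]):
    # setdefault keeps the first occurrence of a repeated word,
    # matching what list.index() would have returned.
    next_word.setdefault(word, following)


def lookup(word):
    """Return the word following `word`, or None if `word` is absent."""
    return next_word.get(word)


print(lookup("cat"))  # NN
print(lookup("VBD"))  # N/A
print(lookup("dog"))  # None
```

Each lookup is then a single dict access rather than an `in` test plus
an index() call, both of which walk the list.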

[1] though I suspect if we get to see a sample of your data file there
may be a better way

-- 
Matt Wheeler
http://funkyh.at