pattern

Sat Jun 16 14:59:53 EDT 2018

Dear Cameron,

This is so kind of you. Thanks for taking the time to explain the code.
It helped a lot. I went back and brushed up on lists & dictionaries.

At this point, I think I need to go back and brush up on Python from the start.
So, I will do that first.

On Friday, 15 June 2018 09:12:22 UTC+5:30, Cameron Simpson  wrote:
> On 14Jun2018 20:01, Sharan Basappa <sharan.basappa at gmail.com> wrote:
> >> >Can anyone explain to me the purpose of "pattern" in the line below:
> >> >
> >> >documents.append((w, pattern['class']))
> >> >
> >> >documents is declared as a list as follows:
> >> >documents = []
> >>
> >> Not without a lot more context. Where did you find this code?
> >
> >I am sorry that partial info was not sufficient.
> >I am actually trying to implement my first text classification code and I am referring to the below URL for that:
> >
> >https://machinelearnings.co/text-classification-using-neural-networks-f5cd7b8765c6
> 
> Ah, ok. It helps to include some cut/paste of the relevant code, though the URL 
> is a big help.
> 
> The wider context of the code you recite looks like this:
> 
>   words = []
>   classes = []
>   documents = []
>   ignore_words = ['?']
>   # loop through each sentence in our training data
>   for pattern in training_data:
>       # tokenize each word in the sentence
>       w = nltk.word_tokenize(pattern['sentence'])
>       # add to our words list
>       words.extend(w)
>       # add to documents in our corpus
>       documents.append((w, pattern['class']))
> 
> and the training_data is defined like this:
> 
>   training_data = []
>   training_data.append({"class":"greeting", "sentence":"how are you?"})
>   training_data.append({"class":"greeting", "sentence":"how is your day?"})
>   ... lots more ...
> 
> So training_data is a list of dicts, each dict holding a "class" and a "sentence" 
> key. The "for pattern in training_data" loop iterates over each item of 
> training_data. It calls nltk.word_tokenize on the "sentence" part of the 
> training item, presumably getting a list of "word" strings. The documents list 
> gets this tuple:
> 
>   (w, pattern['class'])
> 
> added to it.
> 
> In this way the documents list ends up with tuples of (words, classification), 
> with the words coming from the sentence via nltk and the classification coming 
> straight from the training item's "class" value.
> 
> So at the end of the loop the documents list will look like:
> 
>   documents = [
>     ( ['how', 'are', 'you'], 'greeting' ),
>     ( ['how', 'is', 'your', 'day'], 'greeting' ),
>   ]
> 
> and so forth.
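
A minimal, self-contained sketch of the loop described above, so it can be run
without installing NLTK. The simple_tokenize function is a hypothetical
stand-in for nltk.word_tokenize, not the real thing: it is assumed here only
to split a sentence into words and punctuation marks. Note that this stand-in
keeps "?" as its own token, so the printed lists include it.

```python
import re

# Hypothetical stand-in for nltk.word_tokenize (an assumption, used so the
# sketch runs with the standard library only). It returns runs of word
# characters and individual punctuation marks as separate tokens.
def simple_tokenize(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)

# A list of dicts, each with a "class" key and a "sentence" key.
training_data = [
    {"class": "greeting", "sentence": "how are you?"},
    {"class": "greeting", "sentence": "how is your day?"},
]

words = []
documents = []
for pattern in training_data:
    # "pattern" is just the loop variable: one dict from training_data.
    w = simple_tokenize(pattern["sentence"])
    # Collect every token in one flat list.
    words.extend(w)
    # Pair this sentence's tokens with its classification.
    documents.append((w, pattern["class"]))

for tokens, label in documents:
    print(label, tokens)
```

The name "pattern" has no special meaning to Python; any other variable name
in the for statement would work the same way.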
> 
> Cheers,
> Cameron Simpson <cs at cskk.id.au>