pattern

Thu Jun 14 23:42:03 EDT 2018

On 14Jun2018 20:01, Sharan Basappa <sharan.basappa at gmail.com> wrote:
>> >Can anyone explain to me the purpose of "pattern" in the line below:
>> >
>> >documents.append((w, pattern['class']))
>> >
>> >documents is declared as a list as follows:
>> >documents.append((w, pattern['class']))
>>
>> Not without a lot more context. Where did you find this code?
>
>I am sorry that partial info was not sufficient.
>I am actually trying to implement my first text classification code and I am referring to the below URL for that:
>
>https://machinelearnings.co/text-classification-using-neural-networks-f5cd7b8765c6

Ah, ok. It helps to include some cut/paste of the relevant code, though the URL 
is a big help.

The wider context of the code you quote looks like this:

  words = []
  classes = []
  documents = []
  ignore_words = ['?']
  # loop through each sentence in our training data
  for pattern in training_data:
      # tokenize each word in the sentence
      w = nltk.word_tokenize(pattern['sentence'])
      # add to our words list
      words.extend(w)
      # add to documents in our corpus
      documents.append((w, pattern['class']))

and the training_data is defined like this:

  training_data = []
  training_data.append({"class":"greeting", "sentence":"how are you?"})
  training_data.append({"class":"greeting", "sentence":"how is your day?"})
  ... lots more ...

So training_data is a list of dicts, each dict holding a "class" and a 
"sentence" key. The "for pattern in training_data" loop iterates over each item 
of training_data, binding the name "pattern" to each dict in turn. It calls 
nltk.word_tokenize on the "sentence" part of the training item, presumably 
getting a list of "word" strings. The documents list gets this tuple:

  (w, pattern['class'])

added to it.
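
To make the role of "pattern" concrete, here is a small sketch you can paste 
into the interpreter. It uses only the two training items shown above; the 
"seen" list is my own addition, purely for illustration, and is not in the 
article's code:

```python
# Two of the training items from the article, written as a list literal.
training_data = [
    {"class": "greeting", "sentence": "how are you?"},
    {"class": "greeting", "sentence": "how is your day?"},
]

seen = []  # illustrative only: records what "pattern" was on each pass
for pattern in training_data:
    # "pattern" is just a name bound to the current dict from the list
    seen.append((type(pattern).__name__, pattern["class"], pattern["sentence"]))
    print(pattern["class"], "->", pattern["sentence"])
```

So "pattern" is not special in any way: it is an ordinary loop variable, and 
pattern['class'] is an ordinary dict lookup on whichever item the loop is 
currently visiting.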

In this way the documents list ends up with tuples of (words, classification), 
with the words coming from the sentence via nltk and the classification coming 
straight from the training item's "class" value.

So at the end of the loop the documents list will look like:

  documents = [
    ( ['how', 'are', 'you'], 'greeting' ),
    ( ['how', 'is', 'your', 'day'], 'greeting' ),
  ]

and so forth.
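
If you want to watch the list being built without installing nltk, here is a 
rough sketch of the same loop. Note the assumption: simple_tokenize is a 
stand-in I made up, not nltk's tokenizer. It keeps only runs of word 
characters, whereas the real nltk.word_tokenize will likely also hand back 
punctuation such as '?' as its own token (which is presumably why the article 
has that ignore_words list).

```python
import re

def simple_tokenize(sentence):
    # Hypothetical stand-in for nltk.word_tokenize: returns runs of word
    # characters and silently drops punctuation.
    return re.findall(r"\w+", sentence)

training_data = [
    {"class": "greeting", "sentence": "how are you?"},
    {"class": "greeting", "sentence": "how is your day?"},
]

words = []
documents = []
for pattern in training_data:
    # split the sentence into a list of word strings
    w = simple_tokenize(pattern["sentence"])
    # collect every word seen so far
    words.extend(w)
    # pair this sentence's words with its classification
    documents.append((w, pattern["class"]))

for doc in documents:
    print(doc)
```

Printing each entry like this is a handy way to check what the loop really 
produced before you feed it to the rest of the article's code.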

Cheers,
Cameron Simpson <cs at cskk.id.au>