Word Order Simple.

Sun Mar 13 09:14:42 EDT 2016

Rodrick Brown wrote:

> You are given n words. Some words may repeat. For each word, output its
> number of occurrences. The output order should correspond with the input
> order of appearance of the word. See the sample input/output for
> clarification.
> 
> *Note:* Each input line ends with a *"\n"* character.
> 
> *Constraints:*
> 1 ≤ n ≤ 10^5
> The sum of the lengths of all the words do not exceed 10^6
> All the words are composed of lowercase English letters only.
> 
> *Input Format*
> 
> The first line contains the integer, n.
> The next n lines each contain a word.
> 
> *Output Format*
> 
> Output 2 lines.
> On the first line, output the number of distinct words from the input.
> On the second line, output the number of occurrences for each distinct
> word according to their appearance in the input.
> 
> *Sample Input*
> 
> 4
> bcdef
> abcdefg
> bcde
> bcdef
> 
> *Sample Output*
> 
> 3
> 2 1 1
> 
> *Explanation*
> 
> There are 3 distinct words. Here, *"bcdef"* appears twice in the input at
> the first and last positions. The other words appear once each. The order
> of the first appearances are *"bcdef"*,*"abcdefg"* and *"bcde"* which
> corresponds to the output.
> 
> Here is my attempt I can't seem to past all test cases and not sure why?

Keys in a dictionary are not stored in the order that they are entered. The
order may even change between different runs of the same script with the
same input (string hashes are randomized per process to make a class of
denial-of-service attacks on web applications harder). For example:

$ cat wordcount.py # your script with an extra print() at the end
#!/usr/bin/env python3

from collections import defaultdict
from collections import Counter

if __name__ == '__main__':

  words = defaultdict(list)
  for i,word in enumerate(input() for x in range(int(input()))):
    words[word].append([i+1])

  count = Counter()
  print(len(words.keys()))

  for k in words:
    if len(words[k]) > 1:
      print(len(words[k]),end=' ')
    else:
      count[k] += 1

  for c in count.values(): print(c,end=' ')
  print()
$ cat words.txt
6
foo
bar
bar
baz
baz
baz
$ python3 wordcount.py < words.txt
3
3 2 1 
$ python3 wordcount.py < words.txt
3
3 2 1 
$ python3 wordcount.py < words.txt
3
2 3 1 

To fix this you can use a collections.OrderedDict, but I recommend that you
try to write a script that uses one standard dict to map words to number of
occurrences and one list to record each word when it first occurs.
Note that you do not need to keep track of all occurrences of a word; in your
script below you store more information than necessary.

With the example input above the list should contain

['foo', 'bar', 'baz']

and the dict should contain

{'foo': 1, 'bar': 2, 'baz': 3} # remember that the keys are not necessarily
                               # in that order

For debugging purposes you can print the dict and the list, and once your 
script fills them correctly print the second line required by the task by 
iterating over the list and looking up the frequencies in the dict.
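
The dict-plus-list approach described above could be sketched roughly as
follows. This is only a minimal illustration, not the one correct solution:
the function names (count_in_order, count_in_order_ordered, format_answer)
are made up for this example, and the words are passed in as a list rather
than read from stdin so that the sketch is easy to try out. The second
function shows the collections.OrderedDict variant for comparison.

```python
from collections import OrderedDict


def count_in_order(words):
    """Return (first_seen, counts) for an iterable of words.

    first_seen is a list of distinct words in order of first appearance,
    counts is a plain dict mapping each word to its number of occurrences.
    """
    counts = {}      # word -> occurrences; its key order is never relied upon
    first_seen = []  # remembers the order in which new words show up
    for word in words:
        if word not in counts:
            counts[word] = 0
            first_seen.append(word)
        counts[word] += 1
    return first_seen, counts


def count_in_order_ordered(words):
    """Same counts, using an OrderedDict to remember insertion order."""
    counts = OrderedDict()
    for word in words:
        # Updating an existing key does not change its position.
        counts[word] = counts.get(word, 0) + 1
    return counts


def format_answer(words):
    """Build the two output lines required by the task as one string."""
    first_seen, counts = count_in_order(words)
    second_line = " ".join(str(counts[w]) for w in first_seen)
    return "{}\n{}".format(len(first_seen), second_line)


# The example input from words.txt, without the leading count line.
sample = ["foo", "bar", "bar", "baz", "baz", "baz"]
print(format_answer(sample))
# prints:
# 3
# 1 2 3
```

To use this for the actual task you would build the word list from the
input, for example with [input() for _ in range(int(input()))], and print
the result of format_answer().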

> The explanation for line how to get 1 1 seems weird maybe I'm not reading
> it correctly.
> 
> #!/usr/bin/env python3
> 
> from collections import defaultdict
> from collections import Counter
> 
> if __name__ == '__main__':
> 
>   words = defaultdict(list)
>   for i,word in enumerate(input() for x in range(int(input()))):
>     words[word].append([i+1])
> 
>   count = Counter()
>   print(len(words.keys()))
> 
>   for k in words:
>     if len(words[k]) > 1:
>       print(len(words[k]),end=' ')
>     else:
>       count[k] += 1
> 
>   for c in count.values(): print(c,end=' ')
> 
> $  cat words.txt | ./wordcount.py
> 
> 3
> 2 1 1 ⏎