[Tutor] simple python script for collocation discovery

Kent Johnson kent37 at tds.net
Sun Aug 17 22:57:33 CEST 2008


On 8/16/08, Emad Nawfal (عماد نوفل) <emadnawfal at gmail.com> wrote:
> #!/usr/bin/python
> # Chi-squared collocation discovery
> # Important definitions first. Let's suppose that we
> # are trying to find whether "powerful computers" is a collocation
> # N = The number of all bigrams in the corpus
> # O11 = how many times the bigram "powerful computers" occurs in the corpus
> # O22 = the number of bigrams not having either word in our collocation
> # = N - O11
> # O12 = The number of bigrams whose second word is our second word
> # but whose first word is not "powerful"

This is just the number of occurrences of the second word minus O11, isn't it?

> # O21 = The number of bigrams whose first word is our first word, but whose
> # second word is different from our second word

This is the number of occurrences of the first word minus O11.

So one way to solve this would be to make two dictionaries - one which
counts bigrams and one which counts words. Then you would get the
numbers with just three dictionary lookups.
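
Here is a minimal sketch of the two-dictionary idea, not a tested
solution for your corpus. The function name collocation_counts, the
variable names and the sample sentence are all made up for
illustration, and it assumes the corpus is already a list of tokens.
It uses collections.Counter, which is a dict subclass available in
newer versions of Python.

```python
from collections import Counter


def collocation_counts(tokens, first, second):
    """Return (N, O11, O12, O21) for the bigram (first, second).

    Hypothetical helper: assumes `tokens` is the whole corpus as a
    list of words, in order.
    """
    # Dictionary 1: how often each bigram (adjacent word pair) occurs.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Dictionary 2: how often each word occurs.
    word_counts = Counter(tokens)

    # N is the total number of bigrams in the corpus.
    n = max(len(tokens) - 1, 0)

    # The three dictionary lookups.
    o11 = bigram_counts[(first, second)]
    # Caveat: a plain word count is off by one if `second` is the very
    # first token of the corpus, or `first` is the very last token,
    # because that occurrence is not in that position of any bigram.
    o12 = word_counts[second] - o11
    o21 = word_counts[first] - o11
    return n, o11, o12, o21


tokens = "powerful computers need powerful processors and fast computers".split()
print(collocation_counts(tokens, "powerful", "computers"))
```

For the sample sentence this prints (7, 1, 1, 1): seven bigrams, one
"powerful computers", one other bigram ending in "computers" and one
other bigram starting with "powerful". Building the two dictionaries
takes one pass over the corpus each, and after that every candidate
bigram costs only the three lookups.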

Kent

