How can I count word frequency in a web site?

Michiel Overtoom motoom at xs4all.nl
Mon Nov 30 02:56:32 EST 2015


> On 30 Nov 2015, at 03:54, ryguy7272 <ryanshuell at gmail.com> wrote:
> 
> Now, how can I count specific words like 'fraud' and 'lawsuit'?

- convert the page to plain text
- remove any punctuation
- split into words
- see what words occur
- enumerate all the words and increase a counter for each word
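
The first step, converting the page to plain text, could be sketched with the standard library alone. This is only a rough illustration, assuming Python 3 and a page that is already downloaded into a string (for instance with urllib.request.urlopen); the names TextExtractor and html_to_text are made up for this example, and a real site may need a more forgiving HTML library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    # Hypothetical helper: feed the markup through the parser and
    # join the collected text, collapsing runs of whitespace.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


page = ("<html><head><style>p {color: red}</style></head>"
        "<body><h1>Fraud case</h1><p>The lawsuit alleges fraud.</p></body></html>")
print(html_to_text(page))
```

The resulting string can then go through the remaining steps in the list.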

For the counting steps, something like this:

s = """Today we're rounding out our planetary tour with ice giants Uranus
and Neptune. Both have small rocky cores, thick mantles of ammonia, water,
and methane, and atmospheres that make them look greenish and blue. Uranus
has a truly weird rotation and relatively dull weather, while Neptune has
clouds and storms whipped by tremendous winds. Both have rings and moons,
with Neptune's Triton probably being a captured iceball that has active
geology."""

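import collections

# Lowercase, then strip the punctuation that occurs in this sample.
# The apostrophe becomes a space, so "neptune's" yields "neptune" and "s".
cleaned = s.lower().replace("\n", " ").replace(".", "").replace(",", "").replace("'", " ")
count = collections.Counter(cleaned.split())
for interesting in ("neptune", "and"):
    print("The word '%s' occurs %d times" % (interesting, count[interesting]))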


# Outputs:

The word 'neptune' occurs 3 times
The word 'and' occurs 7 times
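
The chained replace() calls only cover the punctuation that happens to be in that sample. For arbitrary page text, one possible variation is to let a regular expression pick out the words, which drops question marks, brackets, semicolons and so on in one go. A minimal sketch, assuming ASCII-only English text; count_words is a hypothetical helper name, not something from the standard library:

```python
import collections
import re


def count_words(text, interesting):
    # Treat each run of the letters a-z as a word; anything else
    # (punctuation, digits, whitespace) acts as a separator.
    words = re.findall(r"[a-z]+", text.lower())
    count = collections.Counter(words)
    # A Counter returns 0 for words it has never seen.
    return {word: count[word] for word in interesting}


sample = ("Fraud? The lawsuit (filed Monday) alleges fraud; "
          "a second lawsuit is pending.")
wanted = ("fraud", "lawsuit", "neptune")
result = count_words(sample, wanted)
for word in wanted:
    print("The word '%s' occurs %d times" % (word, result[word]))
```

Like the version above, this splits "neptune's" into "neptune" and "s". Accented letters would need a wider pattern.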






