[Pythonmac-SIG] Cool Script, Not Sure What To Do With It

Jim Leff jimleff@chowhound.com
Sat, 2 Nov 2002 02:53:28 -0500


Hi guys

I'm not a programmer, and I'm sitting here with thousands of 
unfathomable MacPython components and no idea how to make them do what 
I want: run one very cool script.

It's a script of great interest to writers, who are afflicted by 
overused words. It's really easy to repeat a word once or more within 
a paragraph or two, which makes things read fuzzy and amateurish. 
It's notoriously difficult to detect such repetition even upon 
careful checking, and there are no tools I know of to help with this. 
But this ingenious script does it.

Can anyone tell me how to use MacPython (2.2.1) to run it on my text 
files? I'm not a subscriber to the list, so please 
copy me if you reply. The script is by Jeremy Osner, and it's 
freeware. Feel free to distribute/use as you like. If you write, 
checking for repetition will improve your output significantly, I 
promise (I'm a published author FWIW).


Instructions from Jeremy:

>To use it, just type at the command prompt (or whatever the Mac OS 
>equivalent of a command prompt is), "python <filename> <interval> 
><exclude> < <input>" where <filename> is the name you saved the 
>script file as, <interval> is the max. number of words between 
>repetitions to be flagged, <exclude> is the name of a file with a 
>list of words you want to have excluded from being flagged, and 
><input> is the file you want to parse. The command's output will be 
>the same as the input except all pertinent repetitions will be 
>enclosed in a pair of asterisks.


Here's the script. If anyone prefers it as a text file, let me know.


#! /usr/bin/env python
import sys
import string

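# Normalize a word for comparison: strip punctuation, then lowercase.
def simplify(word):
     return string.lower(string.translate(
         word, string.maketrans('', ''), string.punctuation))

# Split a string on whitespace and return the simplified words.
def words(str):
     result = string.split(str)
     for i in range(len(result)):
         result[i] = simplify(result[i])
     return result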

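# Sliding window of the most recently seen (simplified) words.
recentWords = []
excludeWords = []
queueSize = 10
# Optional arguments: argv[1] is the interval, argv[2] the exclude file.
if len(sys.argv) > 1:
     queueSize = int(sys.argv[1])
     if len(sys.argv) > 2:
         excludeWords = words(open(sys.argv[2]).read())
input = sys.stdin.readline()
while input != "":
     a = input.split()
     for word in a:
         simp = simplify(word)
         # Skip excluded words and tokens that are pure punctuation,
         # which simplify() reduces to an empty string.
         if simp != "" and excludeWords.count(simp) == 0:
             if recentWords.count(simp) > 0:
                 # Seen inside the window: flag it and refresh its slot.
                 recentWords.remove(simp)
                 word = "*" + word + "*"
             recentWords.append(simp)
             # Keep only the last queueSize words in the window.
             if len(recentWords) > queueSize:
                 recentWords.pop(0)
         sys.stdout.write(word + " ")
     sys.stdout.write("\n")
     input = sys.stdin.readline()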
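

The script above targets Python 2: it relies on string.lower, 
string.translate, string.maketrans and string.split, module-level 
functions that no longer exist in Python 3. What follows is a minimal 
sketch of the same sliding-window idea for a current interpreter, not 
Jeremy's code. The name flag_repeats and its parameters are 
assumptions made for illustration, and unlike the original it skips 
tokens that are pure punctuation and joins words with single spaces 
instead of writing a trailing space.

```python
import string

# Translation table that deletes ASCII punctuation; the Python 3
# counterpart of the original string.translate(...) call.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def simplify(word):
    """Strip ASCII punctuation and lowercase, so "Mat." matches "mat"."""
    return word.translate(_STRIP_PUNCT).lower()


def flag_repeats(lines, interval=10, exclude=()):
    """Wrap words that recur within the last `interval` words in asterisks.

    Hypothetical helper for this sketch: `lines` is any iterable of
    strings, `exclude` an iterable of words that are never flagged.
    """
    excluded = {simplify(w) for w in exclude}
    recent = []  # sliding window of recently seen simplified words
    flagged = []
    for line in lines:
        out = []
        for word in line.split():
            simp = simplify(word)
            # Ignore excluded words and tokens that were pure punctuation.
            if simp and simp not in excluded:
                if simp in recent:
                    recent.remove(simp)
                    word = "*" + word + "*"
                recent.append(simp)
                if len(recent) > interval:
                    recent.pop(0)
            out.append(word)
        flagged.append(" ".join(out))
    return flagged


sample = ["The cat sat on the mat.", "The mat was flat."]
for line in flag_repeats(sample, interval=10):
    print(line)
# Prints:
# The cat sat on *the* mat.
# *The* *mat* was flat.
```

Passing exclude=["the"] leaves both occurrences of "the" unmarked, 
which is the job the <exclude> file does in the original command line. 
To check a real file, one option is to pass an open file object as 
lines, since iterating over it yields one line at a time.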