Newbie: word count and Win32

Jim Dennis jimd at vega.starshine.org
Thu Mar 21 06:31:14 EST 2002


In article <kr8f9ukbg3n3i3oghrues7ir93rub2tuo2 at 4ax.com>, Joe Christl wrote:
>On 19 Mar 2002 20:26:34 GMT, bokr at oz.net (Bengt Richter) wrote:

>>From a quick look at that[1] script, it looks like it expects to read
>>from stdin only. I.e., it says

>>    # iterate over each line on stdin
>>    for line in sys.stdin.readlines() :
>>        ...

 ...

>>To make the script work like what you expected, it needs to look
>>at the command line arguments (sys.argv) for a file name and open that,
>>and use the resulting file object in place of stdin.


>He gave a diff output for it to accept a filename.  I am going to try that
>first and report back,

	import sys

	for name in sys.argv[1:]:
		try:
			with open(name, 'r') as infile:
				for line in infile:
					...
		except IOError:
			pass	# skip files that can't be opened or read

 ... that's about the simplest way to iterate over each line in 
 each file.  Within the last couple of days I posted a script
 that implements the GNU "wc" (word count) program in Python
 with full support for the GNU command line options.  That script
 shows an example of how to selectively use stdin (if no non-switch/option
 arguments are present) or to iterate over a list of files.  My 
 script is slightly UNIX-centric --- but it might work on Windows
 or Mac (if os.path.isdir() is sufficiently similar and getopt works).  
 I think os.path.isdir() is the only OS-specific function I used.
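
 A minimal sketch of that stdin-or-files pattern follows.  It is not
 the posted script: the name input_streams and the choice to skip
 directories with a message are illustrative assumptions, and "args"
 stands for whatever is left of the command line after option parsing.

```python
import os
import sys


def input_streams(args):
    """Yield (name, stream) pairs: stdin if args is empty, else each readable file."""
    if not args:
        # No file arguments at all: fall back to standard input.
        yield "-", sys.stdin
        return
    for name in args:
        if os.path.isdir(name):
            # A directory cannot be read line by line; report it and move on.
            sys.stderr.write("%s: is a directory\n" % name)
            continue
        try:
            stream = open(name, "r")
        except IOError:
            # Missing or unreadable file: skip it, as in the loop above.
            continue
        try:
            yield name, stream
        finally:
            stream.close()
```

 A caller would loop with "for name, stream in input_streams(sys.argv[1:]):"
 and read each stream inside the loop body, since a file is closed as
 soon as the loop moves on to the next one.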

 Of course the UNIX/GNU wc command only counts the total number of 
 "words" (whitespace separated things), lines, and characters in a 
 file or set of files.  It doesn't provide a count of individual words.
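
 As a rough illustration of those three totals, here is a sketch under
 stated assumptions: wc_counts is a hypothetical helper, it counts
 characters as Python sees them rather than bytes, and it counts every
 line it is handed, so the results can differ from GNU wc on a file
 whose last line has no trailing newline.

```python
def wc_counts(lines):
    """Return (lines, words, chars) totals for an iterable of text lines."""
    n_lines = n_words = n_chars = 0
    for line in lines:
        n_lines += 1
        # split() with no argument splits on runs of whitespace,
        # which matches the "whitespace separated things" definition.
        n_words += len(line.split())
        n_chars += len(line)
    return n_lines, n_words, n_chars
```

 Any iterable of lines works here, including an open file object or
 sys.stdin.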

 Tonight I wrote a "wordcount.py" program that counts "words" and
 "knownwords" (if you have a suitable list under /usr/share/dict/words
 --- as most UNIX and GNU/Linux systems will) and then dumps a sorted
 list of all non-unique "words" (sequences of letters, possibly 
 hyphenated or with apostrophes, as in "o'clock").  It keeps counts
 of the words that it processed, the number that it added to its
 dictionary, and the number that it found in the "knownwords" dictionary 
 and prints summaries and ratios of those.
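
 The general idea can be sketched as below.  This is not wordcount.py
 itself: the regular expression, the function names count_words and
 report, the lowercasing, and the reading of "non-unique" as "seen
 more than once" are all assumptions made for illustration.  The
 "known" set would be loaded from a word list such as
 /usr/share/dict/words.

```python
import re

# Letters, optionally joined by single hyphens or apostrophes (e.g. "o'clock").
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def count_words(lines, known=frozenset()):
    """Return (counts, total, known_hits) for an iterable of text lines."""
    counts = {}
    total = known_hits = 0
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            total += 1
            counts[word] = counts.get(word, 0) + 1
            if word in known:
                known_hits += 1
    return counts, total, known_hits


def report(counts):
    """Return report lines for words seen more than once, most frequent first."""
    repeated = [(n, w) for w, n in counts.items() if n > 1]
    # Sort by descending count, then alphabetically for a stable listing.
    repeated.sort(key=lambda pair: (-pair[0], pair[1]))
    return ["%6d: %s" % pair for pair in repeated]
```

 Here len(counts) is the number of distinct words added to the
 dictionary, and known_hits divided by total gives one of the ratios
 mentioned above (guard against total being zero for empty input).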

>>BTW, IMO you get a nicer output if you change the last line of the script to
>>     print '%6d: %s' % (words[word], word)

>I don't really care too much about the number of times a word shows up.  I'll
>try it his way and yours.  :)

	I missed what you're actually trying to do.

>>Regards,
>>Bengt Richter

>Thanks,
>Joe Christl


