Newbie: word count and Win32
Jim Dennis
jimd at vega.starshine.org
Thu Mar 21 06:31:14 EST 2002
In article <kr8f9ukbg3n3i3oghrues7ir93rub2tuo2 at 4ax.com>, Joe Christl wrote:
>On 19 Mar 2002 20:26:34 GMT, bokr at oz.net (Bengt Richter) wrote:
>>From a quick look at that[1] script, it looks like it expects to read
>>from stdin only. I.e., it says
>> # iterate over each line on stdin
>> for line in sys.stdin.readlines() :
>> ...
...
>>To make the script work like what you expected, it needs to look
>>at the command line arguments (sys.argv) for a file name and open that,
>>and use the resulting file object in place of stdin.
>He gave a diff output for it to accept a filename. I am going to try that
>first and report back,
for i in sys.argv[1:]:
    try:
        file = open(i,'r')
        for line in file:
            ...
    except IOError: pass
... that's about the simplest way to iterate over each line in
each file. Within the last couple of days I posted a script
that implements the GNU "wc" (word count) program in Python
with full support for the GNU command line options. That script
shows an example of how to selectively use stdin (if no non-switch/option
arguments are present) or to iterate over a list of files. My
script is slightly UNIX-centric --- but it might work on Windows
or Mac (if os.path.isdir() is sufficiently similar and getopt works).
I think os.path.isdir() is the only OS specific function I used.
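For illustration, here is a minimal sketch of the "stdin if no file arguments, otherwise each file" idea. It is not the script I posted; the name iter_lines and its stdin parameter are made up for this example, and option parsing (getopt) is left out entirely.

```python
import sys

def iter_lines(paths, stdin=None):
    """Yield lines from each named file, or from stdin if no files are given."""
    if not paths:
        # No file arguments: fall back to standard input.
        source = sys.stdin if stdin is None else stdin
        for line in source:
            yield line
        return
    for path in paths:
        try:
            with open(path, 'r') as handle:
                for line in handle:
                    yield line
        except IOError as err:
            # Unreadable file (or a directory): report it and carry on.
            sys.stderr.write('%s: %s\n' % (path, err))

if __name__ == '__main__':
    for line in iter_lines(sys.argv[1:]):
        sys.stdout.write(line)
```

Unlike the bare "except IOError: pass" above, this version reports the files it could not read on stderr, which is usually friendlier than skipping them silently.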
Of course the UNIX/GNU wc command only counts the total number of
"words" (whitespace separated things), lines, and characters in a
file or set of files. It doesn't provide a count of individual words.
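A rough sketch of those three totals, assuming the input is already an iterable of text lines. The function name wc_counts is hypothetical. Note that this counts decoded characters rather than bytes, and counts lines by newline characters, so a final line with no trailing newline is not counted as a line.

```python
def wc_counts(lines):
    """Return (lines, words, chars) totals for an iterable of text lines."""
    n_lines = n_words = n_chars = 0
    for line in lines:
        n_lines += line.count('\n')    # lines are counted by their newlines
        n_words += len(line.split())   # "words" are whitespace-separated runs
        n_chars += len(line)           # characters, newline included
    return n_lines, n_words, n_chars
```

Calling wc_counts(open(name)) gives the totals for one file; summing the tuples across files gives the grand total.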
Tonight I wrote a "wordcount.py" program that counts "words" and
"knownwords" (if you have a suitable list under /usr/share/dict/words
--- as most UNIX and GNU/Linux systems will) and then dumps a sorted
list of all non-unique "words" (sequences of letters, possibly
hyphenated or with apostrophes, as in "o'clock"). It keeps counts
of the words that it processed, the number that it added to its
dictionary, and the number that it found in the "knownwords" dictionary
and prints summaries and ratios of those.
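The counting part can be sketched roughly as follows. This is not the wordcount.py I wrote, only an approximation of the idea: the regular expression, the function names, and the reading of "non-unique" as "seen more than once" are all assumptions made for this example.

```python
import re

# Letters, optionally joined by single hyphens or apostrophes (e.g. "o'clock").
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def load_known(path='/usr/share/dict/words'):
    """Return the known-word list as a set, or an empty set if it is missing."""
    try:
        with open(path, 'r', errors='replace') as handle:
            return set(line.strip().lower() for line in handle)
    except IOError:
        return set()

def count_words(lines, known=frozenset()):
    """Return (counts, total, known_hits) for an iterable of text lines."""
    counts = {}
    total = known_hits = 0
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            total += 1
            counts[word] = counts.get(word, 0) + 1
            if word in known:
                known_hits += 1
    return counts, total, known_hits

def repeated_words(counts):
    """Return a sorted list of the words seen more than once."""
    return sorted(word for word, n in counts.items() if n > 1)
```

Here total is the number of words processed, len(counts) is the number added to the dictionary, and known_hits is the number found in the known-word set; the ratios follow from those three.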
>>BTW, IMO you get a nicer output if you change the last line of the script to
>> print '%6d: %s' % (words[word], word)
>I don't really care too much about the amount of times a word shows up. I'll
>try it his way and yours. :)
I missed what you're actually trying to do.
>>Regards,
>>Bengt Richter
>Thanks,
>Joe Christl