Homegrown wordcount

Keith P. Boruff kboruff at optonline.net
Sun Jun 13 07:24:03 EDT 2004


Grégoire Dooms wrote:


> What's the purpose of stripping the items in the list if you just count
> their number? Isn't this equivalent to
>           words += len(input.split(sep))
> 
>>         return words
>>     else:
>>         for item in input:
>>             wordcount(item)
>>
>>     return words
> 
> 
> Removing the global statement and sep param, you get:
> 
> def wordcount(input):
>     if isinstance(input, str):
>         return len(input.split())
>     else:
>         return sum([wordcount(item) for item in input])
> 
> -- 
> Grégoire Dooms

After reading this thread, I decided to embark on a word-counting
program of my own. One thing I like to do when learning a new programming
language is to try to emulate some of my favorite UNIX-type programs.

That said, to get the count of words in a string, I merely did the 
following:


# Beginning of program

import re
import sys

# Right now my simple wc program just reads piped data
if sys.stdin.isatty():
    sys.exit("usage: pipe some text into this program")
input_data = sys.stdin.read()

# A word is any run of non-whitespace characters
print("number of words: %d" % len(re.findall(r'\S+', input_data)))

# End of program

Though I've only run trivial tests on this so far, the word count from
this script seems to match that of the wc on my system (RH Linux WS). I
also ran some big RFC text files through it.
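
The match with wc is plausible, since the regular expression treats a word
as a maximal run of non-whitespace characters, which is also how
str.split() with no separator divides a string. Here is a minimal sketch of
that comparison, assuming Python 3 and ordinary ASCII whitespace; the names
count_regex and count_split are hypothetical helpers, not part of the
program above:

```python
import re

def count_regex(text):
    # A word is a maximal run of non-whitespace characters.
    return len(re.findall(r'\S+', text))

def count_split(text):
    # With no separator, split() breaks on runs of whitespace and
    # ignores leading and trailing whitespace.
    return len(text.split())

sample = "  The quick\tbrown fox\n\njumps over\r\nthe lazy dog.  "
print(count_regex(sample), count_split(sample))
```

Both counts come out the same for this sample. That does not prove the two
agree with wc on every input, since wc's idea of whitespace can depend on
the locale.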

There could be some flaws here; I don't know. I'll have to look at it
more closely when I get back from the gym. If anyone here finds a
problem, I'd be interested in hearing about it.
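
One limitation that is easy to check is memory: sys.stdin.read() loads the
whole input before any counting happens. Below is a sketch of a
line-at-a-time variant, assuming Python 3; count_words_stream is a
hypothetical name. Because a newline is itself whitespace, no word spans
two lines, so the per-line totals add up to the same count:

```python
import io
import re

WORD = re.compile(r'\S+')

def count_words_stream(lines):
    # Accepts any iterable of lines, e.g. sys.stdin or an open file,
    # and keeps only one line in memory at a time.
    total = 0
    for line in lines:
        total += len(WORD.findall(line))
    return total

text = "one two\nthree\n\nfour five six\n"
print(count_words_stream(io.StringIO(text)))
```

In a script, count_words_stream(sys.stdin) could take the place of the
read() call.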

Like I said, I love using these UNIX-type programs to learn a new
language. It helps me learn things like file I/O, command-line
arguments, string manipulation, etc.

Keith P. Boruff