cataloging words in text file (fwd)

Dr. David Mertz mertz at gnosis.cx
Sat Mar 3 02:20:51 EST 2001


Stephen Boulet <stepheb at comm.mot.com> wrote:
| I remember this homework assignment for my data structures (c++)
| class: read in a large file, and create a data structure containing
| every word in the file and the number of times it appears.

Funny you should ask.  I just finished a program called 'indexer.py'
that does EXACTLY this, and then uses the data structure (a dictionary)
to perform searches.  Find the program at:

  http://gnosis.cx/download/indexer.py
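
For anyone who only wants the gist of the homework problem, here is a
minimal sketch of the word-count-in-a-dictionary idea.  It is NOT the code
in indexer.py; the names count_words() and search(), and the regular
expression used to decide what counts as a "word", are assumptions made
up for illustration.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")

def count_words(lines):
    """Map each lowercased word to the number of times it appears.

    `lines` is any iterable of strings, so an open file object works.
    """
    counts = Counter()  # a dict subclass: word -> occurrences
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts

def search(counts, *terms):
    """Return {term: count} for the terms that occur at least once."""
    found = {}
    for term in terms:
        key = term.lower()
        if key in counts:
            found[key] = counts[key]
    return found

sample = ["The cat sat on the mat;", "the dog sat too."]
counts = count_words(sample)
print(counts.most_common(2))          # [('the', 3), ('sat', 2)]
print(search(counts, "cat", "bird"))  # {'cat': 1}
```

For a real file, something like `with open(path) as f: counts =
count_words(f)` reads it line by line, so the whole file never has to
sit in memory at once.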

I wrote this module as part of my _Charming Python_ series of articles,
and there is an article discussing the design of the module.  However, I
have been a bit naughty lately in putting up articles on my own site
(thanks to all my readers) before IBM gets around to publishing them
(they pay for them, so they should have some rights here[*]).  So I don't
actually have the accompanying article on my website now.  If anyone
begs me for it in email, I'll cough it up individually... otherwise,
wait a couple of weeks and read it at IBM developerWorks (an excellent
site for fine programming information).

That said, the above module itself is rather extensively documented...
and I very much welcome feedback.

Yours, David...

------------------------------------------------------------------------
[*] Actually...  I'm not sure they *should* have rights.  Information
wants to be free, and everything should be available to everyone all the
time, gratis.  After the revolution, writers, programmers, artists, and
the whole merry lot of us will be paid endowments, and all intellectual
creation will be part of the common lot of humankind... and so on...

But for now, IBM pays me money, and I agree to a variety of conditions
about publications, legal restrictions, and stuff like that.

P.S. A strained joke along these lines can be found at gnosis.cx/.





