cataloging words in text file

Sean 'Shaleh' Perry shaleh at valinux.com
Fri Mar 2 17:43:31 EST 2001


On 02-Mar-2001 Stephen Boulet wrote:
> I remember this homework assignment for my data structures (c++)
> class: read in a large file, and create a data structure containing
> every word in the file and the number of times it appears.
> 
> I was wondering how to do this in python. In c++ we had to do it
> with hash tables and b-trees.
> 
> Can you do it with dictionaries in python, with the key as the word
> and the data the number of times it appears? If there's any
> documentation in this problem domain that people could point out to
> me I would appreciate it.
> 


The algorithm is basically the same as the C++ version (and the same in any
language that supports hashes/dictionaries):

import sys

counts = {}
for line in sys.stdin:                 # count words read from standard input
    for piece in line.split():         # split each line into whitespace-separated words
        if piece in counts:
            counts[piece] = counts[piece] + 1
        else:
            counts[piece] = 1          # first sighting counts as one, not zero

The Python advantage is the easy string manipulation.
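For example, a word can be lowercased and stripped of surrounding punctuation in
one expression before it goes into the dictionary (the helper name below is just
for illustration):

    import string

    def normalize(piece):
        # lowercase the word and strip leading/trailing punctuation
        return piece.lower().strip(string.punctuation)

    print(normalize('"Hello,'))    # prints: hello

Running pieces through something like that before the dictionary lookup keeps
'Hello' and 'hello,' from being counted as different words.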

> What about if you had a bunch of objects, like say stars, with
> attributes like position (two coordinates), magnitude, color. If you
> had to sort a bunch of them at a time by attributes (say at most
> 10^3 of them), would using dictionaries be a good idea, or should
> you start looking at interfaces to, for example, mysql?
> 

Depends on the application, speed, storage requirements, etc.  If this star list
will continue to grow, a database sounds right.
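
For roughly 10^3 objects, though, sorting plain Python objects in memory is
trivial.  A minimal sketch, where the Star class and the attribute values are
just assumptions for illustration:

    from operator import attrgetter

    class Star:
        def __init__(self, name, x, y, magnitude, color):
            self.name = name
            self.x = x                  # position coordinates
            self.y = y
            self.magnitude = magnitude
            self.color = color

    stars = [
        Star('Sirius', 101.3, -16.7, -1.46, 'white'),
        Star('Vega', 279.2, 38.8, 0.03, 'blue-white'),
        Star('Betelgeuse', 88.8, 7.4, 0.42, 'red'),
    ]

    # sort in place by magnitude (most negative, i.e. brightest, first)
    stars.sort(key=attrgetter('magnitude'))
    for s in stars:
        print(s.name, s.magnitude)

A database only really pays off once the list no longer fits comfortably in
memory or you need persistence and ad-hoc queries across runs.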



