"Newbie" questions - "unique" sorting ?

Tue Jun 24 23:11:59 EDT 2003

>>>>> "John" == John Fitzsimons <xpm4senn001 at sneakemail.com> writes:

    John> (B) I am wanting to sort words (or is that strings ?) into a
    John> list from a clipboard and/or file input and/or....

    John> (C) To sort out the list of "unique" words/strings.

The classic idiom for getting a unique list is to use a dictionary; see
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560

If you have enough memory to do everything in memory, the following
should be quite efficient:

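  # Read the whole file and split it on whitespace.
  allWords = file('myfile.dat').read().split()
  # Dict keys are unique, so duplicate words collapse into one entry.
  uwords = dict([(w,1) for w in allWords]).keys()
  uwords.sort()
  print uwords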

By using list comprehensions to build the dict, as above, you avoid
some of the overhead of a manual loop approach.  
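
For anyone reading this on Python 3, where file() and the print
statement are gone and dict.keys() no longer returns a list, the same
idea might look roughly like the sketch below. Note that it swaps the
dict for the built-in set type, which does the same de-duplication job.
The function name and the sample string are made up for illustration;
they are not from the original snippet.

```python
def unique_sorted_words(text):
    """Return the distinct whitespace-separated words in text, sorted."""
    # set() drops duplicates; sorted() returns a new, ordered list.
    return sorted(set(text.split()))


# Hypothetical sample input standing in for the contents of a file.
sample = "the quick brown fox jumps over the lazy dog the end"
uwords = unique_sorted_words(sample)
print(uwords)
```

To run it on a real file, something like
unique_sorted_words(open('myfile.dat').read()) should work, assuming
the file fits in memory.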

Although this approach favors speed over memory, in my own
experience processing text files, it is the way to go.  Very large
text files (you mentioned 50MB) are extremely rare.  For example, the
entire King James bible, including html markup, is < 5MB.  The
complete works of Shakespeare, including html markup, are < 10MB.  So
I think it would be unusual for you to need to process a single text
file larger than 10MB.  Unless you have a specific example where you
need to process such extremely large files, I recommend doing as much
as possible in memory.
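
If you ever do hit a file too large to read in one go, a possible
fallback is to walk it line by line so that only the unique words are
held in memory. This is a rough Python 3 sketch of that alternative,
not something I have benchmarked; the function name is made up, and
io.StringIO stands in for a real file object.

```python
import io


def unique_sorted_words_from_lines(lines):
    """Collect distinct words from an iterable of lines, then sort them."""
    seen = set()
    for line in lines:
        # Only the set of distinct words grows; each line is discarded.
        seen.update(line.split())
    return sorted(seen)


# Hypothetical in-memory stand-in for an open text file.
fake_file = io.StringIO("to be or not\nto be that is\nthe question\n")
print(unique_sorted_words_from_lines(fake_file))
```

An open file iterates line by line in the same way, so passing the
result of open('myfile.dat') instead of fake_file should behave the same.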

John Hunter