A Mountain of Perl Books + Python Advocacy

Remco Gerlich scarblac-spamtrap at pino.selwerd.nl
Tue May 9 13:11:13 EDT 2000


tony summerfelt wrote in comp.lang.python:
> correct me if i'm wrong, but a duplicate of the last line is what's checked?
> 
> in the code i posted (typos and all), the duplicate items could have been
> anywhere in the file:
> 
> one
> one
> two
> three
> 
> or:
> 
> one
> two
> three
> one
> 
> perl's hashes make that possible

Python's dictionaries are similar to Perl's hashes. What he wrote is what
Unix uniq does, I think. I couldn't read your original version, but if you
want to put every line into a dictionary so that you can find the uniques,
that's easy. (Oh wait, I think you put them into a list as well, to preserve
order? Let's do that too.)

#!/usr/local/bin/python

import sys

try:
  file1 = open(sys.argv[1], 'r')
  file2 = open(sys.argv[2], 'w')
except (IndexError, IOError):
  # Either an argument was missing (IndexError) or a file
  # couldn't be opened (IOError).
  print "Usage: unique.py file1 file2"
  print
  print "An error occurred. You didn't give the right arguments, or the"
  print "files couldn't be opened."
  sys.exit(1)

seen = {}      # maps each line already seen to 1, for fast membership tests
uniques = []   # first occurrence of each line, in original order

for line in file1.readlines():
  if not seen.has_key(line):
    seen[line] = 1
    uniques.append(line)

file2.writelines(uniques)
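
For example, if you save the script as unique.py (the filenames below are
just placeholders):

  python unique.py input.txt output.txt

output.txt then contains each line of input.txt exactly once, in the order
of first appearance.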


This does slurp in the whole file at once. If you don't like that, replace
the 'for line in...' line with something like

while 1:
  line = file1.readline()
  if not line:
    break
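
The if/append body keeps its indentation, so after that replacement the
whole loop reads:

while 1:
  line = file1.readline()
  if not line:
    break
  if not seen.has_key(line):
    seen[line] = 1
    uniques.append(line)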
	
-- 
Remco Gerlich,  scarblac at pino.selwerd.nl
-rwxr-xr-x  1 root  5.89824e37 Oct 22  1990 /usr/bin/emacs [STR]


